Welcome to InfluxDB IOx: InfluxData’s New Storage Engine


influxdb_iox

14 1,803 9.9 Rust

Discontinued. Pronounced "influxdb eye-ox", short for iron oxide. This is the new core of InfluxDB, written in Rust on top of Apache Arrow.

Just want to say congratulations to the team!
2 years and 9,500+ commits is a hell of a feat.
https://github.com/influxdata/influxdb_iox

datafusion

55 5,020 9.9 Rust

Apache DataFusion SQL Query Engine

Just wanted to give a shout-out to Apache DataFusion [0], which IOx relies on heavily (and contributes to as well!).
It's a framework for writing query engines in Rust that takes care of a lot of the heavy lifting around parsing SQL, type casting, and constructing, transforming and optimizing query plans. It's pluggable, making it easy to write custom data sources, optimizer rules, query nodes, etc.
It has very good single-node performance (there's even a way to compile it with SIMD support), and Ballista [1] extends it into a distributed query engine.
Plenty of other projects use it besides IOx, including VegaFusion, ROAPI, and Cube.js's pre-aggregation store. We're heavily using it to build Seafowl [2], an analytical database that's optimized for running SQL queries directly from the user's browser (caching, CDNs, low latency, some WASM support, all that fun stuff).
[0] https://github.com/apache/arrow-datafusion
[1] https://github.com/apache/arrow-ballista
[2] https://github.com/splitgraph/seafowl

datafusion-ballista

12 1,275 8.4 Rust

Apache Arrow Ballista Distributed Query Engine


seafowl

11 353 9.3 Rust

Analytical database for data-driven Web applications 🪶


faunadb-js

89 702 4.6 JavaScript

Javascript driver for FaunaDB v4

Great question! With Seafowl, the idea is different from what the modern data stack addresses. It's trying to simplify public-facing Web-based visualizations: apps that need to run analytical queries on large datasets and can be accessed by users all around the world. This is why we made the query API easily cacheable by CDNs and Seafowl itself easy to deploy at the edge, e.g. with Fly.io.
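To illustrate why a query API can be made CDN-friendly, here is a minimal sketch of one common approach: deriving a GET URL from a hash of the query text, so that identical queries map to the same cacheable URL. This is an assumption about the general technique, not a description of Seafowl's actual API; the `/q/<hash>` path layout, the `X-Query` header name and the `cacheable_query_request` function are all hypothetical.

```python
import hashlib
from urllib.parse import quote


def cacheable_query_request(base_url: str, sql: str) -> tuple[str, dict[str, str]]:
    """Return a (url, headers) pair for a GET request keyed by the query's hash.

    Assumptions (not taken from Seafowl's documentation): the `/q/<hash>`
    path layout and the `X-Query` header name are hypothetical.
    """
    # Trim only the outer whitespace so string literals inside the SQL are untouched.
    query = sql.strip()
    # Identical query text always maps to the same URL, so a CDN can cache the response.
    digest = hashlib.sha256(query.encode("utf-8")).hexdigest()
    url = f"{base_url.rstrip('/')}/q/{digest}"
    # Percent-encode the query so it is a valid single-line header value.
    headers = {"X-Query": quote(query)}
    return url, headers


url, headers = cacheable_query_request("https://example.com", "SELECT 1 ")
print(url)
print(headers)
```

Because the URL depends only on the query text, two browsers issuing the same query hit the same cache entry, and a changed query naturally gets a new URL.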
It's a fairly different use case from DuckDB (query execution for Web applications vs. a fast embedded analytical database for notebooks) and from the rest of the modern data stack (which is mostly about analytics internal to a company). Just to clarify, we're not related to IOx directly (only via both of us using Apache DataFusion).
If we had to place Seafowl _inside_ of the modern data stack, it'd be mostly a warehouse, but one that is optimized for being queried from the Internet rather than by a limited set of internal users. Or, a potential use case could be extracting internal data from your warehouse to Seafowl in order to build public applications that use it.
We don't currently ship a Web front-end and so can't serve as a replacement for Superset: Seafowl is exposed to the developer as an HTTP API that can be queried directly from the end user's Web browser. But we have some ideas around a frontend component: some kind of middleware where the Web app can pre-declare, at build time, the queries it will need to run, and we can compute some pre-aggregations to speed those up at runtime. Currently we recommend querying it with Observable [0] for an end-to-end query + visualization experience (or using a different viz library like d3/Vega).
Re: the second question about Splitgraph for a data lake: the intention behind Splitgraph is to orchestrate all those tools, and there the use case is indeed the modern data stack in a box. It's similar to dbt Labs's Sinter [1], which was supposed to be the end-to-end data platform before they focused on dbt and dbt Cloud instead: it can run Airbyte ingestion and dbt transformations, act as a data warehouse (using PostgreSQL and a columnar store extension), and let users organize and discover data, all at the same time. There's a lot of baggage in Splitgraph though, as we moved through a few iterations of the product (first Git/Docker for data, then a platform for the modern data stack). Currently we're thinking about how best to integrate Splitgraph and Seafowl in order to build a managed pay-as-you-go Seafowl, kind of like Fauna [2] for analytics.
Hope this helps!
[0] https://observablehq.com/@seafowl/interactive-visualization-...
[1] https://www.getdbt.com/blog/whats-in-a-name/
[2] https://fauna.com/


NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.




This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
Tags: SQL, Arrow, Database, Big Data, Dataframe
Post date: 26 Oct 2022
