Welcome to InfluxDB IOx: InfluxData’s New Storage Engine

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • influxdb_iox

    Discontinued. Pronounced "influxdb eye-ox"; IOx is short for iron oxide. This is the new core of InfluxDB, written in Rust on top of Apache Arrow.

  • Just want to say congratulations to the team!

    2 years and 9,500+ commits is a hell of a feat.

    https://github.com/influxdata/influxdb_iox

  • datafusion

    Apache DataFusion SQL Query Engine

  • Just wanted to give a shout-out to Apache DataFusion [0], which IOx relies on a lot (and contributes to as well!).

    It's a framework for writing query engines in Rust that takes care of a lot of heavy lifting around parsing SQL, type casting, constructing and transforming query plans and optimizing them. It's pluggable, making it easy to write custom data sources, optimizer rules, query nodes etc.

    It has very good single-node performance (there's even a way to compile it with SIMD support), and Ballista [1] extends it into a distributed query engine.

    Plenty of other projects use it besides IOx, including VegaFusion, ROAPI, and Cube.js's pre-aggregation store. We're heavily using it to build Seafowl [2], an analytical database that's optimized for running SQL queries directly from the user's browser (caching, CDNs, low latency, some WASM support, all that fun stuff).

    [0] https://github.com/apache/arrow-datafusion

    [1] https://github.com/apache/arrow-ballista

    [2] https://github.com/splitgraph/seafowl

  • datafusion-ballista

    Apache Arrow Ballista Distributed Query Engine

  • seafowl

    Analytical database for data-driven Web applications 🪶

  • faunadb-js

    Javascript driver for FaunaDB v4

  • Great question! With Seafowl, the idea is different from what the modern data stack addresses. It's trying to simplify public-facing Web-based visualizations: apps that need to run analytical queries on large datasets and can be accessed by users all around the world. This is why we made the query API easily cacheable by CDNs and Seafowl itself easy to deploy at the edge, e.g. with Fly.io.

    It's a fairly different use case from DuckDB (query execution for Web applications vs fast embedded analytical database for notebooks) and the rest of the modern data stack (which mostly is about analytics internal to a company). Just to clarify, we're not related to IOx directly (only via us both using Apache DataFusion).

    If we had to place Seafowl _inside_ of the modern data stack, it'd be mostly a warehouse, but one that is optimized for being queried from the Internet, rather than by a limited set of internal users. Or, a potential use case could be extracting internal data from your warehouse to Seafowl in order to build public applications that use it.

    We don't currently ship a Web front-end and so can't serve as a replacement for Superset: Seafowl is exposed to the developer as an HTTP API that can be queried directly from the end user's Web browser. But we have some ideas around a frontend component: some kind of middleware, where the Web app can pre-declare the queries it will need to run at build time and we can compute some pre-aggregations to speed those up at runtime. Currently we recommend querying it with Observable [0] for an end-to-end query + visualization experience (or using a different viz library like d3/Vega).

    Re: the second question about Splitgraph for a data lake: the intention behind Splitgraph is to orchestrate all those tools, and there the use case is indeed the modern data stack in a box. It's similar to dbt Labs's Sinter [1], which was supposed to be the end-to-end data platform before they focused on dbt and dbt Cloud instead: it can run Airbyte ingestion and dbt transformations, act as a data warehouse (using PostgreSQL and a columnar store extension), and let users organize and discover data at the same time. There's a lot of baggage in Splitgraph though, as we moved through a few iterations of the product (first Git/Docker for data, then a platform for the modern data stack). Currently we're thinking about how best to integrate Splitgraph and Seafowl in order to build a managed pay-as-you-go Seafowl, kind of like Fauna [2] for analytics.

    Hope this helps!

    [0] https://observablehq.com/@seafowl/interactive-visualization-...

    [1] https://www.getdbt.com/blog/whats-in-a-name/

    [2] https://fauna.com/
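
The comment above describes a query API that is easy for CDNs to cache. As a rough illustration of that idea (not taken from Seafowl's code or documentation), one way to make read-only queries cacheable is to send them as GET requests whose URL is derived from a hash of the query text, so identical queries always hit the same URL. The `/q/` path prefix, the `X-Query` header, and both function names below are hypothetical.

```python
import hashlib
from urllib.parse import quote


def cacheable_query_url(base_url: str, sql: str) -> str:
    """Build a GET URL that depends only on the query text.

    Hypothetical scheme: identical SQL always maps to the same URL,
    so a CDN can cache the response under that URL.
    """
    # Only strip surrounding whitespace; collapsing inner whitespace
    # could change the meaning of string literals in the query.
    digest = hashlib.sha256(sql.strip().encode("utf-8")).hexdigest()
    return f"{base_url.rstrip('/')}/q/{digest}"


def query_headers(sql: str) -> dict[str, str]:
    """Carry the actual query text in a (hypothetical) request header.

    The text is percent-encoded so it is safe to place in a header value.
    """
    return {"X-Query": quote(sql.strip())}
```

A client would request the URL from `cacheable_query_url` with the headers from `query_headers`. A cache in front of the server can key on the URL alone, while the server reads the query text from the header on a cache miss.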
