Top 11 datafusion Open-Source Projects
-
LakeSoul
LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.
-
iox-community
Community InfluxDB 3.0 "IOx" static builds + containers + Examples for Developers & Integrators. Experiment with low-cost storage, unlimited cardinality and FlightSQL APIs
Python's Substrait seems like the biggest/most-used competitor-ish out there. I'd love some compare & contrast; my sense is that Substrait has a smaller ambition, and more wants to be a language for talking about execution rather than a full on execution engine. https://github.com/substrait-io/substrait
We can also see from the DataFusion discussion that they too see themselves as a bit of a Velox competitor. https://github.com/apache/arrow-datafusion/discussions/6441
Project mention: Show HN: Hashquery, a Python library for defining reusable analysis | news.ycombinator.com | 2024-04-23
I really don't understand the appeal of dbt vs a proper programming language. The templating approach leads to massive spaghetti. I look forward to trying out something like Ibis [0]
0: https://ibis-project.org/
Project mention: Full-fledged APIs for slowly moving datasets without writing code | news.ycombinator.com | 2023-10-25
Project mention: Apache Arrow DataFusion Comet Spark Accelerator | news.ycombinator.com | 2024-03-07
Project mention: Gcsfuse: A user-space file system for interacting with Google Cloud Storage | news.ycombinator.com | 2023-09-06
In case you're interested in scale-to-zero database hosting, a few months ago I paired gcsfuse with Seafowl [0][1], an early stage open source database written in Rust. Was a lot of fun balancing tradeoffs that are usually not possible with classical databases e.g. Postgres. Thank you gcsfuse contributors.
[0] https://seafowl.io
Project mention: InfluxDB 3.0 Infinite Observability with qryn-iox | news.ycombinator.com | 2023-09-17
Watch out for the AGPL minio <https://github.com/metrico/iox-community/blob/155a14bb5e8e32...> the almost certainly AGPL grafana <https://github.com/grafana/grafana/blob/v10.1.1/LICENSE> and always eye anyone who uses :latest images with healthy suspicion
That said, influx_iox itself appears to be Apache 2 (and/or MIT?) https://github.com/influxdata/influxdb_iox/blob/main/LICENSE...
datafusion related posts
-
Apache Arrow DataFusion Comet Spark Accelerator
-
Transforming Postgres into a Fast OLAP Database
-
Apache Arrow DataFusion
-
InfluxDB 3.0 Infinite Observability with qryn-iox
-
InfluxDB Cloud shuts down in Belgium; some weren't notified before data deletion
-
Show HN: Serverless OLAP with Seafowl and GCP
-
Polars: Computing a new column from multiple columns - there must be a better way
Index
What are some of the best open-source datafusion projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | datafusion | 5,020 |
2 | ibis | 4,208 |
3 | roapi | 3,080 |
4 | LakeSoul | 2,307 |
5 | datafusion-comet | 417 |
6 | seafowl | 355 |
7 | kamu-cli | 277 |
8 | datafusion-objectstore-s3 | 57 |
9 | seafowl-gcsfuse | 39 |
10 | iox-community | 35 |
11 | awesome-pandas-alternatives | 29 |