datafusion

Open-source projects categorized as datafusion

Top 11 datafusion Open-Source Projects

  • datafusion

    Apache DataFusion SQL Query Engine

  • Project mention: Velox: Meta's Unified Execution Engine [pdf] | news.ycombinator.com | 2024-03-25

    Python's Substrait seems like the biggest/most-used competitor-ish out there. I'd love some compare & contrast; my sense is that Substrait has a smaller ambition, and more wants to be a language for talking about execution rather than a full on execution engine. https://github.com/substrait-io/substrait

    We can also see from the DataFusion discussion that they too see themselves as a bit of a Velox competitor. https://github.com/apache/arrow-datafusion/discussions/6441

  • ibis

    the portable Python dataframe library

  • Project mention: Show HN: Hashquery, a Python library for defining reusable analysis | news.ycombinator.com | 2024-04-23

    I really don't understand the appeal of dbt vs a proper programming language. The templating approach leads to massive spaghetti. I look forward to trying out something like Ibis [0]

    0: https://ibis-project.org/

  • roapi

    Create full-fledged APIs for slowly moving datasets without writing a single line of code.

  • Project mention: Full-fledged APIs for slowly moving datasets without writing code | news.ycombinator.com | 2023-10-25
  • LakeSoul

    LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.

  • datafusion-comet

    Apache DataFusion Comet Spark Accelerator

  • Project mention: Apache Arrow DataFusion Comet Spark Accelerator | news.ycombinator.com | 2024-03-07
  • seafowl

    Analytical database for data-driven Web applications 🪶

  • Project mention: Gcsfuse: A user-space file system for interacting with Google Cloud Storage | news.ycombinator.com | 2023-09-06

    In case you're interested in scale-to-zero database hosting, a few months ago I paired gcsfuse with Seafowl [0][1], an early stage open source database written in Rust. Was a lot of fun balancing tradeoffs that are usually not possible with classical databases e.g. Postgres. Thank you gcsfuse contributors.

    [0] https://seafowl.io

  • kamu-cli

    New generation decentralized data lake and a streaming data pipeline

  • datafusion-objectstore-s3

    S3 as an ObjectStore for DataFusion

  • seafowl-gcsfuse

    Scale to zero Seafowl hosting with Cloud Run

  • Project mention: Show HN: Serverless OLAP with Seafowl and GCP | /r/hypeurls | 2023-06-06
  • iox-community

    Community InfluxDB 3.0 "IOx" static builds + containers + Examples for Developers & Integrators. Experiment with low-cost storage, unlimited cardinality and FlightSQL APIs

  • Project mention: InfluxDB 3.0 Infinite Observability with qryn-iox | news.ycombinator.com | 2023-09-17

    Watch out for the AGPL minio <https://github.com/metrico/iox-community/blob/155a14bb5e8e32...> the almost certainly AGPL grafana <https://github.com/grafana/grafana/blob/v10.1.1/LICENSE> and always eye anyone who uses :latest images with healthy suspicion

    That said, influx_iox itself appears to be Apache 2 (and/or MIT?) https://github.com/influxdata/influxdb_iox/blob/main/LICENSE...

  • awesome-pandas-alternatives

    Awesome list of alternative dataframe libraries in Python.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

datafusion related posts

  • Apache Arrow DataFusion Comet Spark Accelerator

    1 project | news.ycombinator.com | 7 Mar 2024
  • Transforming Postgres into a Fast OLAP Database

    3 projects | news.ycombinator.com | 7 Feb 2024
  • Apache Arrow DataFusion

    1 project | news.ycombinator.com | 1 Oct 2023
  • InfluxDB 3.0 Infinite Observability with qryn-iox

    3 projects | news.ycombinator.com | 17 Sep 2023
  • InfluxDB Cloud shuts down in Belgium; some weren't notified before data deletion

    1 project | news.ycombinator.com | 10 Jul 2023
  • Show HN: Serverless OLAP with Seafowl and GCP

    1 project | /r/hypeurls | 6 Jun 2023
  • Polars: Computing a new column from multiple columns - there must be a better way

    1 project | /r/rust | 4 May 2023

Index

What are some of the best open-source datafusion projects? This list will help you:

#   Project                      Stars
1   datafusion                   5,020
2   ibis                         4,208
3   roapi                        3,080
4   LakeSoul                     2,307
5   datafusion-comet               417
6   seafowl                        355
7   kamu-cli                       277
8   datafusion-objectstore-s3       57
9   seafowl-gcsfuse                 39
10  iox-community                   35
11  awesome-pandas-alternatives     29
