Top 15 delta-lake Open-Source Projects
-
Trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
-
starrocks
StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries. Winner of InfoWorld’s 2023 BOSSIE Award for best open source software.
-
delta
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive, and with APIs (by delta-io)
-
LearningSparkV2
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
-
incubator-xtable
Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
-
delta-buddy
Introducing Delta-Buddy: Your ultimate Delta Lake companion! 🚀 Streamline your data journey with an AI-powered chatbot. Ask Delta-Buddy anything about your Delta Lake.
Project mention: Variant in Apache Doris 2.1.0: a new data type 8 times faster than JSON for semi-structured data analysis | dev.to | 2024-03-27
As an open-source real-time data warehouse, Apache Doris provides semi-structured data processing capabilities, and the newly released version 2.1.0 takes a step forward in this direction. Before V2.1, Apache Doris stored semi-structured data as JSON files. However, parsing that JSON in real time during query execution leads to high CPU and I/O consumption as well as high query latency, especially when the dataset is large and complex. Moreover, the lack of a predefined schema leaves the query optimizer with nothing to work with.
Project mention: Trino: Fast distributed SQL query engine for big data analytics | news.ycombinator.com | 2024-03-19
Project mention: A MySQL compatible database engine written in pure Go | news.ycombinator.com | 2024-04-09
TiDB has been around for a while; it is distributed, written in Go and Rust, and MySQL compatible. https://github.com/pingcap/tidb
Somewhat relatedly, StarRocks is also MySQL compatible, written in Java and C++, but it's tackling OLAP use-cases. https://github.com/StarRocks/starrocks
Delta is pretty great; it lets you do upserts into tables in Databricks much more easily than you could without it.
I think the website is here: https://delta.io
Project mention: Full-fledged APIs for slowly moving datasets without writing code | news.ycombinator.com | 2023-10-25
Project mention: Delta-rs – a Rust-based implementation of deltalake | news.ycombinator.com | 2024-04-08
Project mention: Gcsfuse: A user-space file system for interacting with Google Cloud Storage | news.ycombinator.com | 2023-09-06
In case you're interested in scale-to-zero database hosting: a few months ago I paired gcsfuse with Seafowl [0][1], an early-stage open source database written in Rust. It was a lot of fun balancing tradeoffs that are usually not possible with classical databases, e.g. Postgres. Thank you, gcsfuse contributors.
[0] https://seafowl.io
Project mention: Debugging Python Code in Amazon SageMaker Locally Using Visual Studio Code and PyCharm: A Step-by-Step Guide | dev.to | 2023-11-15
git clone https://github.com/aws-samples/amazon-sagemaker-local-mode/
cd amazon-sagemaker-local-mode/general_pipeline_local_debug
python3 -m venv .venv
source .venv/bin/activate
pip install jupyter
jupyter lab
Project mention: A ChatBot with open source LLM to ask questions on your Delta Lake | news.ycombinator.com | 2023-06-18
delta-lake related posts
-
Delta-rs – a Rust-based implementation of deltalake
-
Delta Lake vs. Parquet: A Comparison
-
[D] Is there other better data format for LLM to generate structured data?
-
OneTable is now live | Table format interoperability is not a dream anymore
-
Delta vs Iceberg: make love not war
-
Azure data lake - Data Share
-
Databricks Strikes $1.3B Deal for Generative AI Startup MosaicML
Index
What are some of the best open-source delta-lake projects? This list will help you:
Rank | Project | Stars
---|---|---
1 | doris | 11,389
2 | Trino | 9,597
3 | starrocks | 7,789
4 | delta | 6,919
5 | roapi | 3,087
6 | delta-rs | 1,833
7 | LearningSparkV2 | 1,095
8 | delta-sharing | 676
9 | incubator-xtable | 692
10 | seafowl | 358
11 | amazon-sagemaker-local-mode | 230
12 | delta-sharing-rs | 70
13 | delta-go | 34
14 | delta-buddy | 9
15 | delta-fetch | 1