Top 6 Java Datalake Projects
- Trino: Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io).
- starrocks: StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries. Recipient of InfoWorld's 2023 BOSSIE Award for best open source software.
- LakeSoul: LakeSoul is an end-to-end, real-time, cloud-native lakehouse framework with fast data ingestion, concurrent updates, and incremental data analytics on cloud storage for both BI and AI applications.
Project mention: Trino: Fast distributed SQL query engine for big data analytics | news.ycombinator.com | 2024-03-19
Project mention: A MySQL compatible database engine written in pure Go | news.ycombinator.com | 2024-04-09
tidb has been around for a while, it is distributed, written in Go and Rust, and MySQL compatible. https://github.com/pingcap/tidb
Somewhat relatedly, StarRocks is also MySQL compatible, written in Java and C++, but it's tackling OLAP use-cases. https://github.com/StarRocks/starrocks
Project mention: Getting Started with Flink SQL, Apache Iceberg and DynamoDB Catalog | dev.to | 2023-12-18
Apache Iceberg is one of the three main lakehouse table formats; the other two are Apache Hudi and Delta Lake.
Project mention: Linkedin OpenHouse: Control Plane for Tables in Data Lakehouses | news.ycombinator.com | 2024-03-11
Java Datalake related posts
- Linkedin OpenHouse: Control Plane for Tables in Data Lakehouses
- For those of you with Lakehouse Architectures, how do you handle duplicate records?
- AWS ACID data lakehouse
- Data n00b looking for guidance on how to setup data lake/warehouse
- apache/hudi: Upserts, Deletes And Incremental Processing on Big Data.
- Big Data file formats
- What do you use for Data versioning?
Index
What are some of the best open-source Datalake projects in Java? This list will help you:
Rank | Project | Stars |
---|---|---|
1 | Trino | 9,552 |
2 | starrocks | 7,764 |
3 | hudi | 5,053 |
4 | LakeSoul | 2,301 |
5 | zingg | 877 |
6 | openhouse | 242 |