Top 5 Java delta-lake Projects
-
Trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
-
starrocks
StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries. InfoWorld’s 2023 BOSSIE Award for best open source software.
-
incubator-xtable
Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
-
WorkOS
The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.
Project mention: Variant in Apache Doris 2.1.0: a new data type 8 times faster than JSON for semi-structured data analysis | dev.to | 2024-03-27As an open-source real-time data warehouse, Apache Doris provides semi-structured data processing capabilities, and the newly-released version 2.1.0 makes a stride in this direction. Before V2.1, Apache Doris stores semi-structured data as JSON files. However, during query execution, the real-time parsing of JSON data leads to high CPU and I/O consumption in addition to high query latency, especially when the dataset is huge and complicated. Moreover, the lack of a pre-defined schema means there is no handle for query optimization.
Project mention: Trino: Fast distributed SQL query engine for big data analytics | news.ycombinator.com | 2024-03-19
Project mention: A MySQL compatible database engine written in pure Go | news.ycombinator.com | 2024-04-09tidb has been around for a while, it is distributed, written in Go and Rust, and MySQL compatible. https://github.com/pingcap/tidb
Somewhat relatedly, StarRocks is also MySQL compatible, written in Java and C++, but it's tackling OLAP use-cases. https://github.com/StarRocks/starrocks
Java delta-lake related posts
Index
What are some of the best open-source delta-lake projects in Java? This list will help you:
Project | Stars | |
---|---|---|
1 | doris | 11,314 |
2 | Trino | 9,552 |
3 | starrocks | 7,764 |
4 | incubator-xtable | 678 |
5 | delta-fetch | 1 |
Sponsored