Top 6 Java iceberg Projects
- Trino: Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
- starrocks: StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries. Winner of InfoWorld's 2023 BOSSIE Award for best open source software.
Project mention: Variant in Apache Doris 2.1.0: a new data type 8 times faster than JSON for semi-structured data analysis | dev.to | 2024-03-27
As an open-source real-time data warehouse, Apache Doris provides semi-structured data processing capabilities, and the newly released version 2.1.0 takes a further step in this direction. Before V2.1, Apache Doris stored semi-structured data as JSON. However, parsing that JSON in real time during query execution drives up CPU and I/O consumption as well as query latency, especially when the dataset is large and complex. Moreover, without a predefined schema, the query optimizer has nothing to work with.
Project mention: Trino: Fast distributed SQL query engine for big data analytics | news.ycombinator.com | 2024-03-19
Project mention: A MySQL compatible database engine written in pure Go | news.ycombinator.com | 2024-04-09
TiDB has been around for a while; it is distributed, written in Go and Rust, and MySQL compatible. https://github.com/pingcap/tidb
Somewhat relatedly, StarRocks is also MySQL compatible and written in Java and C++, but it targets OLAP use cases. https://github.com/StarRocks/starrocks
Project mention: Iceberg won the table format war: But not in the way you thought it might | /r/dataengineering | 2023-07-06
Project mention: A deep dive into the concept and world of Apache Iceberg Catalogs | dev.to | 2024-03-01
Nessie is an open-source catalog that goes beyond traditional catalog capabilities in the Apache Iceberg ecosystem by introducing git-like features to data management. It not only tracks table metadata but also lets users capture commits at a holistic level, enabling advanced operations such as multi-table transactions, rollbacks, branching, and tagging. These features add a layer of flexibility and control over data changes that resembles version control systems in software development.
Project mention: Linkedin OpenHouse: Control Plane for Tables in Data Lakehouses | news.ycombinator.com | 2024-03-11
Java iceberg related posts
- A deep dive into the concept and world of Apache Iceberg Catalogs
- Iceberg won the table format war: But not in the way you thought it might
- Why is Hive Metastore everywhere? (Especially Iceberg)
- Lakehouse using AWS Athena on Iceberg Concerns
- apache/iceberg: Apache Iceberg
- What are the main things I need to know to be hired as a Java developer?
- Have you used Athena Iceberg for small(-ish) data?
Index
What are some of the best open-source iceberg projects in Java? This list will help you:
Rank | Project | Stars
---|---|---
1 | doris | 11,314
2 | Trino | 9,552
3 | starrocks | 7,764
4 | iceberg | 5,508
5 | nessie | 831
6 | openhouse | 242