Java Datalake

Open-source Java projects categorized as Datalake

Top 6 Java Datalake Projects

  • Trino

    Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

    Project mention: Trino: Fast distributed SQL query engine for big data analytics | news.ycombinator.com | 2024-03-19
  • starrocks

    StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries. Winner of InfoWorld’s 2023 BOSSIE Award for best open source software.

    Project mention: A MySQL compatible database engine written in pure Go | news.ycombinator.com | 2024-04-09

    tidb has been around for a while, it is distributed, written in Go and Rust, and MySQL compatible. https://github.com/pingcap/tidb

    Somewhat relatedly, StarRocks is also MySQL compatible, written in Java and C++, but it's tackling OLAP use-cases. https://github.com/StarRocks/starrocks

  • hudi

    Upserts, Deletes And Incremental Processing on Big Data.

    Project mention: Getting Started with Flink SQL, Apache Iceberg and DynamoDB Catalog | dev.to | 2023-12-18

    Apache Iceberg is one of the three major lakehouse table formats; the other two are Apache Hudi and Delta Lake.

  • LakeSoul

    LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.

  • zingg

    Scalable identity resolution, entity resolution, data mastering and deduplication using ML

  • openhouse

    Open Control Plane for Tables in Data Lakehouse

    Project mention: Linkedin OpenHouse: Control Plane for Tables in Data Lakehouses | news.ycombinator.com | 2024-03-11
NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source Datalake projects in Java? This list will help you:

Rank  Project    Stars
1     Trino      9,552
2     starrocks  7,764
3     hudi       5,053
4     LakeSoul   2,301
5     zingg      877
6     openhouse  242
