Java hudi

Open-source Java projects categorized as hudi

Top 3 Java hudi Projects

  • doris

    Apache Doris is an easy-to-use, high performance and unified analytics database.

  • Project mention: Variant in Apache Doris 2.1.0: a new data type 8 times faster than JSON for semi-structured data analysis | dev.to | 2024-03-27

    As an open-source real-time data warehouse, Apache Doris provides semi-structured data processing capabilities, and the newly released version 2.1.0 makes a stride in this direction. Before V2.1, Apache Doris stored semi-structured data as JSON files. However, during query execution, the real-time parsing of JSON data leads to high CPU and I/O consumption in addition to high query latency, especially when the dataset is huge and complicated. Moreover, the lack of a pre-defined schema leaves the optimizer nothing to work with for query optimization.

  • starrocks

    StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries. Winner of InfoWorld’s 2023 BOSSIE Award for best open source software.

  • Project mention: A MySQL compatible database engine written in pure Go | news.ycombinator.com | 2024-04-09

    tidb has been around for a while, it is distributed, written in Go and Rust, and MySQL compatible. https://github.com/pingcap/tidb

    Somewhat relatedly, StarRocks is also MySQL compatible, written in Java and C++, but it's tackling OLAP use-cases. https://github.com/StarRocks/starrocks

  • hudi

    Upserts, Deletes And Incremental Processing on Big Data.

  • Project mention: Getting Started with Flink SQL, Apache Iceberg and DynamoDB Catalog | dev.to | 2023-12-18

    Apache Iceberg is one of the three types of lakehouse; the other two are Apache Hudi and Delta Lake.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Java hudi related posts

  • Getting Started with Flink SQL, Apache Iceberg and DynamoDB Catalog

    4 projects | dev.to | 18 Dec 2023
  • Log Analysis: Elasticsearch VS Apache Doris

    1 project | dev.to | 16 Oct 2023
  • For those of you with Lakehouse Architectures, how do you handle duplicate records?

    1 project | /r/dataengineering | 16 Apr 2023
  • AWS ACID data lakehouse

    1 project | /r/dataengineering | 30 Jan 2023
  • Data n00b looking for guidance on how to setup data lake/warehouse

    1 project | /r/dataengineering | 29 Oct 2022
  • apache/hudi: Upserts, Deletes And Incremental Processing on Big Data.

    1 project | /r/devopsish | 20 Oct 2022
  • Big Data file formats

    1 project | /r/apachespark | 13 Jun 2022

Index

What are some of the best open-source hudi projects in Java? This list will help you:

#  Project    Stars
1  doris      11,389
2  starrocks   7,789
3  hudi        5,085
