Java Bigdata

Open-source Java projects categorized as Bigdata

Top 5 Java Bigdata Projects

  • shardingsphere

    Building a Standard Layer & Ecosystem Above Heterogeneous Databases

    Project mention: Learn how to use ShardingSphere version 5.0 in a practical scenario case integrating data sharding, read/write splitting, and data encryption & decryption | 2021-12-23

  • hudi

    Upserts, Deletes And Incremental Processing on Big Data.

    Project mention: Apache Hudi and Glue Catalog | 2021-11-01

    Found this very similar discussion, differing only in that it's EMR, which is what Glue runs on in the background anyway. The logs attached to the discussion show an error thrown by Glue when Hudi tries to perform an ALTER TABLE CASCADE, because the Glue metastore doesn't support CASCADE. The Jira linked in the discussion claims that more recent EMR versions resolve this issue, but there are comments as of May this year claiming it's still unresolved. It's unclear to me which EMR version Glue 2.0 uses. I'll dig into this tomorrow and also try Glue 3.0, as it may use a more recent EMR version that resolves the issue.

  • Apache Avro

    Apache Avro is a data serialization system.

    Project mention: Serialization | 2022-01-18

    When serializing a value, we convert it to a different sequence of bytes. This sequence is often a human-readable string (all the bytes can be read and interpreted by humans as text), but not necessarily. The serialized format can be binary. Binary data (for example, an image) is still bytes, but it uses non-text characters, so it looks like gibberish in a text editor. Binary formats won't make sense unless deserialized by an appropriate program. An example of a human-readable serialization format is JSON. Examples of binary formats are Apache Avro and Protobuf.

  • OpenMetadata

    Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.

    Project mention: How to show recent GitHub activities on your profile readme | 2022-01-13

    # Recent Activity :zap:
    1. 🎉 Merged PR [#2197]( in [open-metadata/OpenMetadata](
    2. ❗️ Closed issue [#2040]( in [open-metadata/OpenMetadata](
    3. ❗️ Closed issue [#2028]( in [open-metadata/OpenMetadata](
    4. ❗️ Closed issue [#2156]( in [open-metadata/OpenMetadata](
    5. 🗣 Commented on [#2156]( in [open-metadata/OpenMetadata](
    6. 🎉 Merged PR [#2154]( in [open-metadata/OpenMetadata](
    7. ❗️ Closed issue [#2087]( in [open-metadata/OpenMetadata](
    8. ❗️ Opened issue [#2156]( in [open-metadata/OpenMetadata](
    9. ❗️ Opened issue [#2147]( in [open-metadata/OpenMetadata](
    10. ❗️ Closed issue [#1876]( in [open-metadata/OpenMetadata](

  • lambda-arch

    Applying Lambda Architecture with Spark, Kafka, and Cassandra.

    Project mention: Lambda Architecture: How to Build a Big Data Pipeline | 2021-03-05

    GitHub project
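The serialization excerpt quoted under Apache Avro contrasts human-readable and binary formats. The following is a minimal sketch of that contrast in plain Java, assuming Java 16+ (for records). It uses only java.io's DataOutputStream and DataInputStream, not the Avro or Protobuf APIs, and the Reading record and the toJson, toBinary, fromBinary, and hex helpers are hypothetical names chosen for illustration. Avro and Protobuf use their own schema-driven encodings, which this sketch does not reproduce.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class SerializationDemo {

    // Hypothetical record type, used only to illustrate serialization.
    record Reading(int sensorId, String unit, double value) {}

    // Human-readable form: a hand-built JSON string.
    // Assumption: unit contains no quotes or backslashes, so no escaping is done.
    static String toJson(Reading r) {
        return "{\"sensorId\":" + r.sensorId() + ",\"unit\":\"" + r.unit()
                + "\",\"value\":" + r.value() + "}";
    }

    // Binary form: 4-byte big-endian int, 2-byte length-prefixed (modified) UTF-8
    // string, then an 8-byte IEEE 754 double. Field names are not stored.
    static byte[] toBinary(Reading r) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(r.sensorId());
            out.writeUTF(r.unit());
            out.writeDouble(r.value());
        }
        return bytes.toByteArray();
    }

    // Deserialization must read the fields back in exactly the order they were written.
    static Reading fromBinary(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return new Reading(in.readInt(), in.readUTF(), in.readDouble());
        }
    }

    // Renders bytes as space-separated two-digit hex, since they are not readable text.
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append(String.format("%02x ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        Reading r = new Reading(42, "C", 21.5);
        String json = toJson(r);
        byte[] binary = toBinary(r);

        System.out.println(json);
        System.out.println(json.getBytes(StandardCharsets.UTF_8).length + " bytes as text");
        System.out.println(hex(binary));
        System.out.println(binary.length + " bytes as binary");
        System.out.println("round trip ok: " + fromBinary(binary).equals(r));
    }
}
```

Running main prints a 39-byte JSON string and a 15-byte binary encoding (00 00 00 2a 00 01 43 40 35 80 00 00 00 00 00) that is meaningless in a text editor yet round-trips to an equal Reading. The binary form is smaller partly because it omits field names; the reader has to know the field order in advance, which is the job a schema does in formats such as Avro.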

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2022-01-18.


What are some of the best open-source Bigdata projects in Java? This list will help you:

#  Project          Stars
1  shardingsphere  15,161
2  hudi             2,702
3  Apache Avro      2,033
4  OpenMetadata       662
5  lambda-arch        125