Java Bigdata

Open-source Java projects categorized as Bigdata

Top 8 Java Bigdata Projects

  • shardingsphere

    Distributed SQL transaction & query engine for data sharding, scaling, encryption, and more - on any database.

    Project mention: Managing Data Residency - the demo | 2023-05-25

    Contrary to what the documentation says, the full prefix is jdbc:shardingsphere:absolutepath. I've opened a PR to fix the documentation.
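
    As a rough illustration of the prefix mentioned in the post, the sketch below builds a JDBC URL string from it. The helper class and method names are hypothetical, and the colon-separated configuration path after the prefix is an assumption, not something the post states. Check the ShardingSphere documentation for the exact URL form.

```java
// Hypothetical helper for illustration only; it is not part of ShardingSphere.
class ShardingSphereUrlSketch {

    // Prefix exactly as reported in the post above.
    static final String ABSOLUTE_PATH_PREFIX = "jdbc:shardingsphere:absolutepath";

    // Assumption: the YAML configuration path follows the prefix after a colon.
    static String absolutePathUrl(String configPath) {
        if (configPath == null || configPath.isEmpty()) {
            throw new IllegalArgumentException("configPath must not be empty");
        }
        return ABSOLUTE_PATH_PREFIX + ":" + configPath;
    }

    public static void main(String[] args) {
        // Example path is made up for this sketch.
        System.out.println(absolutePathUrl("/etc/shardingsphere/config.yaml"));
        // prints jdbc:shardingsphere:absolutepath:/etc/shardingsphere/config.yaml
    }
}
```

    A string like this would presumably be passed to java.sql.DriverManager.getConnection, which only works with the ShardingSphere JDBC driver on the classpath.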

  • hudi

    Upserts, Deletes And Incremental Processing on Big Data.

    Project mention: The "Big Three's" Data Storage Offerings | /r/dataengineering | 2023-06-15

    Structured, semi-structured, and unstructured data can all be stored in a single lakehouse storage format such as Delta, Iceberg, or Hudi (assuming those workloads don't require low-latency SLAs such as sub-second).

  • Apache Avro

    Apache Avro is a data serialization system.

    Project mention: How do you update an existing avro schema using apache avro SchemaBuilder? | /r/codehunter | 2023-06-09

    I am testing a new schema registry which loads and retrieves different kinds of Avro schemas. In the process of testing, I need to create a bunch of different types of Avro schemas. As it involves a lot of permutations, I decided to create the schemas programmatically. I am using the Apache Avro SchemaBuilder to do so.

  • odd-platform

    First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.

    Project mention: OpenDataDiscovery 0.15 with Data Deprecation and Metadata Stale | 2023-08-04

  • dataCompare

    Big data comparison and data profiling platform: low code, data comparison, and data profiling.

    Project mention: Design and practice of open source big data comparison platform | /r/bigdata | 2022-12-14

  • big-data-pipeline-lambda-arch

    A full big data pipeline (Lambda Architecture) with Spark, Kafka, HDFS and Cassandra.

  • hadoopcryptoledger

    Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive

    Project mention: hadoopcryptoledger: NEW Data - star count:139.0 | /r/algoprojects | 2023-01-21

  • rapiddweller-benerator-ce

    BENERATOR is a leading software solution to generate, obfuscate, pseudonymize and migrate data for development, testing, and training purposes with a model-driven approach.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2023-08-04.

What are some of the best open-source Bigdata projects in Java? This list will help you:

#  Project                        Stars
1  shardingsphere                 18,805
2  hudi                            4,504
3  Apache Avro                     2,578
4  odd-platform                      945
5  dataCompare                       205
6  big-data-pipeline-lambda-arch     153
7  hadoopcryptoledger                140
8  rapiddweller-benerator-ce         114