Top 8 Java Bigdata Projects
-
shardingsphere
Distributed SQL transaction & query engine for data sharding, scaling, encryption, and more - on any database.
Contrary to what the documentation says, the full prefix is jdbc:shardingsphere:absolutepath. I've opened a PR to fix the documentation.
-
hudi
Structured, semi-structured, and unstructured data can all be stored in a single lakehouse storage format such as Delta, Iceberg, or Hudi (assuming the workloads don't require low-latency SLAs, such as sub-second responses).
-
Apache Avro
Project mention: How do you update an existing avro schema using apache avro SchemaBuilder? | /r/codehunter | 2023-06-09
I am testing a new schema registry that loads and retrieves different kinds of Avro schemas. As part of the testing, I need to create many different types of Avro schemas. Because this involves a lot of permutations, I decided to create the schemas programmatically. I am using the Apache Avro SchemaBuilder to do so.
-
odd-platform
The first open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
Project mention: OpenDataDiscovery 0.15 with Data Deprecation and Metadata Stale | news.ycombinator.com | 2023-08-04
-
dataCompare
Big data comparison and data profiling platform: low code, data comparison, and data profiling
Project mention: Design and practice of open source big data comparison platform | /r/bigdata | 2022-12-14
-
big-data-pipeline-lambda-arch
A full big data pipeline (Lambda Architecture) with Spark, Kafka, HDFS and Cassandra.
-
hadoopcryptoledger
Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive
-
rapiddweller-benerator-ce
BENERATOR is a leading software solution to generate, obfuscate, pseudonymize and migrate data for development, testing, and training purposes with a model-driven approach.
Java Bigdata related posts
- For those of you with Lakehouse Architectures, how do you handle duplicate records?
- AWS ACID data lakehouse
- hadoopcryptoledger: NEW Data - star count:139.0
Index
What are some of the best open-source Bigdata projects in Java? This list will help you:
# | Project | Stars |
---|---|---|
1 | shardingsphere | 18,805 |
2 | hudi | 4,504 |
3 | Apache Avro | 2,578 |
4 | odd-platform | 945 |
5 | dataCompare | 205 |
6 | big-data-pipeline-lambda-arch | 153 |
7 | hadoopcryptoledger | 140 |
8 | rapiddweller-benerator-ce | 114 |