Java Hadoop

Open-source Java projects categorized as Hadoop

Top 17 Java Hadoop Projects

  1. APIJSON

    🏆 Real-time, coding-free, full-featured, and secure ORM library 🚀 providing APIs and docs with no backend coding; the JSON data and structure returned by the API can be customized by frontend (client) users

    Project mention: Top 15 Open-Source Low-Code Projects with the Most GitHub Stars | dev.to | 2024-07-18

    GitHub: https://github.com/Tencent/APIJSON
    GitHub Stars: 16.9k
    Most Recent Update on GitHub: 2 days ago
    Open Source License: Apache 2.0
    Number of Active Contributors This Year: 6
    Acceptance of External PRs: Yes
    Official Website: http://apijson.cn/
    Documentation: https://apijsondocs.readthedocs.io/en/latest/

  2. SaaSHub

    SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives

  3. Presto

    The official home of the Presto distributed SQL query engine for big data

    Project mention: Using IRIS and Presto for high-performance and scalable SQL queries | dev.to | 2025-01-19

    The rise of Big Data projects, real-time self-service analytics, online query services, and social networks, among others, has created scenarios that demand massive, high-performance data queries. In response to this challenge, MPP (massively parallel processing) database technology was created, and it quickly established itself. Among the open-source MPP options, Presto (https://prestodb.io/) is the best known. It originated at Facebook, where it was used for data analytics, and was later open-sourced. Since Teradata joined the Presto community, it has also offered support.

  4. Apache Hadoop

    Apache Hadoop

    Project mention: Commit to Growth: My 2024 Reflection | dev.to | 2025-01-10

    During my time with Tublian, I learned a valuable lesson about focus. Instead of jumping between different repositories, I concentrated on making meaningful contributions to just a few, including Apache and two others. This approach wasn't random - it came from the amazing mentorship I received from the Open Sauced community. Huge shoutout to @Bekah, @Chrissy, @ayu, and @Jeffrey for teaching me that consistency beats quantity any day!

  5. Deeplearning4j

    Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch, a modular and tiny C++ library for running math code, and a Java-based math library on top of the core C++ library. Also includes SameDiff: a PyTorch/TensorFlow-like library for running deep learn...

    Project mention: Deeplearning4j Suite Overview | news.ycombinator.com | 2024-03-29
  6. doris

    Apache Doris is an easy-to-use, high performance and unified analytics database.

    Project mention: Apache Doris: open-source data warehouse for real time data analytics | news.ycombinator.com | 2024-10-26
  7. Trino

    Official repository of Trino, the distributed SQL query engine for big data, former

    Project mention: Trino: A fast distributed SQL query engine for big data analytics | news.ycombinator.com | 2024-07-09
  8. Alluxio (formerly Tachyon)

    Alluxio, data orchestration for analytics and machine learning in the cloud

  9. Apache Hive

    Apache Hive

    Project mention: Hive: An Open-Source Data Warehouse Built on Apache Hadoop | news.ycombinator.com | 2024-08-13
  10. Apache Ignite

    Apache Ignite (by apache)

    Project mention: API Caching: Techniques for Better Performance | dev.to | 2024-10-17

    Apache Ignite — Free and open-source, Apache Ignite is a horizontally scalable key-value cache store system with a robust multi-model database that powers APIs to compute distributed data. Ignite provides a security system that can authenticate users' credentials on the server. It can also be used for system workload acceleration, real-time data processing, analytics, and as a graph-centric programming model.

  11. Apache Calcite

    Apache Calcite

  12. Apache Nutch

    Apache Nutch is an extensible and scalable web crawler

    Project mention: 11 best open-source web crawlers and scrapers in 2024 | dev.to | 2024-10-29

    Language: Java | GitHub: 2.9K+ stars

  13. Apache Drill

    Apache Drill is a distributed MPP query layer for self describing data (by apache)

  14. kylo

    Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.

  15. ozone

    Scalable, reliable, distributed storage system optimized for data analytics and object store workloads.

    Project mention: Apache Ozone: Scalable, redundant, distributed object store for Apache Hadoop | news.ycombinator.com | 2024-12-04
  16. venice

    Venice, Derived Data Platform for Planet-Scale Workloads. (by linkedin)

  17. incubator-wayang

    Apache Wayang (incubating) is the first cross-platform data processing system.

    Project mention: Show HN: Apache Wayang supports now Kafka | news.ycombinator.com | 2024-11-04
  18. hadoopcryptoledger

    Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Java Hadoop related posts

  • Commit to Growth: My 2024 Reflection

    1 project | dev.to | 10 Jan 2025
  • Where is Java Used in Industry?

    1 project | dev.to | 18 Dec 2024
  • Apache Ozone: Scalable, redundant, distributed object store for Apache Hadoop

    1 project | news.ycombinator.com | 4 Dec 2024
  • Apache Doris: open-source data warehouse for real time data analytics

    1 project | news.ycombinator.com | 26 Oct 2024
  • Evolution of Data Sharding Towards Automation and Flexibility

    1 project | dev.to | 26 Aug 2024
  • Hadoop Installation and Deployment Guide

    1 project | dev.to | 21 Aug 2024
  • Steps to industry-leading query speed: evolution of the Apache Doris execution engine

    1 project | dev.to | 13 Aug 2024

Index

What are some of the best open-source Hadoop projects in Java? This list will help you:

#   Project                      Stars
1   APIJSON                      17,382
2   Presto                       16,158
3   Apache Hadoop                14,889
4   Deeplearning4j               13,745
5   doris                        13,052
6   Trino                        10,726
7   Alluxio (formerly Tachyon)    6,904
8   Apache Hive                   5,613
9   Apache Ignite                 4,855
10  Apache Calcite                4,694
11  Apache Nutch                  2,961
12  Apache Drill                  1,953
13  kylo                          1,111
14  ozone                           866
15  venice                          501
16  incubator-wayang                214
17  hadoopcryptoledger              141
