Java Hadoop

Open-source Java projects categorized as Hadoop
Related topics: #Java #Big Data #SQL #Database #Hive

Top 12 Java Hadoop Projects

  • Presto

    The official home of the Presto distributed SQL query engine for big data

    Project mention: What are y'all learning right now? | reddit.com/r/developersIndia | 2022-06-13

    more specifically, recently started learning about Presto [paper], and have been diving deeper into [source] code.

  • Apache Hadoop

    Apache Hadoop

    Project mention: How-to-Guide: Contributing to Open Source | reddit.com/r/dataengineering | 2022-06-11

    Apache Hadoop

  • Deeplearning4j

    Suite of tools for deploying and training deep learning models on the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch, a small modular C++ library for running math code, and a Java-based math library on top of the core C++ library. Also includes SameDiff, a PyTorch/TensorFlow-like library for running deep learning using automatic differentiation.

    Project mention: Data Science Competition | dev.to | 2022-03-25

    DL4J

  • Alluxio (formerly Tachyon)

    Alluxio, data orchestration for analytics and machine learning in the cloud

  • Trino

    Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

    Project mention: How-to-Guide: Contributing to Open Source | reddit.com/r/dataengineering | 2022-06-11

    Although Trino (formerly Presto) is in the awesome-for-beginners list, it’s also a really good DE project, as it is a distributed query engine that connects to most of the projects listed above. So depending on where you work in this project, you can gain a depth of knowledge on the query engine or breadth across all the connectors… or go hybrid.

  • Apache Hive

    Apache Hive

    Project mention: Visionary French entrepreneur, David Gurle, launches new venture – Hive | news.ycombinator.com | 2022-06-15

  • Apache Ignite

    Apache Ignite (by apache)

    Project mention: Ask HN: P2P Databases? | news.ycombinator.com | 2022-03-01

    Ignite works as you describe:

    https://ignite.apache.org/

    I wouldn't really recommend this approach, I would think more in terms of subscriptions and topics and less of a 'database'.

  • Apache Calcite

    Apache Calcite

    Project mention: CITIC Industrial Cloud — Apache ShardingSphere Enterprise Applications | dev.to | 2022-04-14

    The SQL Federation engine contains processes such as SQL Parser, SQL Binder, SQL Optimizer, Data Fetcher and Operator Calculator, suitable for dealing with correlated queries and subqueries across multiple database instances. At the underlying layer, it uses Calcite to implement RBO (Rule Based Optimizer) and CBO (Cost Based Optimizer) based on relational algebra, and queries the results through the optimal execution plan.

  • Apache Nutch

    Apache Nutch is an extensible and scalable web crawler

  • Apache Drill

    Apache Drill is a distributed MPP query layer for self-describing data

    Project mention: DeWitt Clause, or Can You Benchmark %DATABASE% and Get Away With It | dev.to | 2022-06-02

    Apache Drill, Druid, Flink, Hive, Kafka, Spark

  • ozone

    Scalable, redundant, and distributed object store for Apache Hadoop

  • trino-storage

    Storage connector for Trino

    Project mention: Query a Rest API via SQL? | reddit.com/r/dataengineering | 2021-10-04

    Presto also has a similar capability with the presto-flex connector, but again…not intended for production use.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2022-06-15.

Index

What are some of the best open-source Hadoop projects in Java? This list will help you:

 #   Project                      Stars
 1   Presto                       13,608
 2   Apache Hadoop                12,719
 3   Deeplearning4j               12,509
 4   Alluxio (formerly Tachyon)    5,700
 5   Trino                         5,622
 6   Apache Hive                   4,346
 7   Apache Ignite                 4,185
 8   Apache Calcite                3,170
 9   Apache Nutch                  2,390
10   Apache Drill                  1,688
11   ozone                           518
12   trino-storage                     1