Top 23 Hadoop Open-Source Projects
-
data-science-ipython-notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
-
luigi
Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
-
APIJSON
🏆 Real-time, coding-free, powerful and secure ORM 🚀 Provides APIs and docs with no backend coding, and lets frontend (client) users customize the data and structure of the returned JSON
Project mention: Top 15 Open-Source Low-Code Projects with the Most GitHub Stars | dev.to | 2024-07-18
GitHub: https://github.com/Tencent/APIJSON
GitHub Stars: 16.9k
Most Recent Update on GitHub: 2 days ago
Open Source License: Apache 2.0
Number of Active Contributors This Year: 6
Acceptance of External PRs: Yes
Official Website: http://apijson.cn/
Documentation: https://apijsondocs.readthedocs.io/en/latest/
-
Presto
Project mention: Using IRIS and Presto for high-performance and scalable SQL queries | dev.to | 2025-01-19
The rise of Big Data projects, real-time self-service analytics, online query services, and social networks, among others, has created demand for massive, high-performance data queries. MPP (massively parallel processing) database technology was created in response to this challenge and quickly established itself. Among the open-source MPP options, Presto (https://prestodb.io/) is the best known. It originated at Facebook, where it was used for data analytics, and was later open-sourced. Teradata has since joined the Presto community and now offers support.
-
Apache Hadoop
Project mention: Unveiling the Apache License 2.0: A Deep Dive into Open Source Freedom | dev.to | 2025-03-11
One of the key attributes of Apache License 2.0 is its flexibility. Because it permits use in both proprietary and open source environments, it has become the go-to choice for projects ranging from the Apache HTTP Server to large-scale initiatives like Apache Spark and Hadoop. This flexibility is not solely legal; it is also philosophical. The license is designed to encourage transparency and maintain a healthy balance between freedom and accountability, making it easier for developers to adapt and contribute without restrictive legal barriers. The article also discusses dual licensing, which can offer an attractive route to additional commercial exploitation while still upholding open source principles. However, as the article cautions, dual licensing involves legal intricacy and demands rigor in managing Contributor License Agreements (CLAs), a challenge the open source community continues to debate. Developers looking to understand similar approaches to licensing can find further information at License Token.
-
Deeplearning4j
Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch, a modular and tiny C++ library for running math code, and a Java-based math library on top of the core C++ library. Also includes SameDiff: a PyTorch/TensorFlow-like library for running deep learn...
-
doris
Project mention: Apache Doris: open-source data warehouse for real time data analytics | news.ycombinator.com | 2024-10-26
-
school-of-sre
At LinkedIn, we use this curriculum to onboard our entry-level hires into the SRE role.
-
H2O
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
-
Alluxio (formerly Tachyon)
Alluxio, data orchestration for analytics and machine learning in the cloud
-
Apache Hive
Project mention: Hive: An Open-Source Data Warehouse Built on Apache Hadoop | news.ycombinator.com | 2024-08-13
-
Apache Ignite
Apache Ignite is a free, open-source, horizontally scalable key-value cache store with a robust multi-model database and APIs for computing over distributed data. It provides a security system that can authenticate users' credentials on the server. It can also be used for system workload acceleration, real-time data processing, and analytics, and as a graph-centric programming model.
-
Apache Nutch
Language: Java | GitHub: 2.9K+ stars
-
kyuubi
Apache Kyuubi is a distributed, multi-tenant gateway that provides serverless SQL on data warehouses and lakehouses.
-
nagios-plugins
450+ AWS, Hadoop, Cloud, Kafka, Docker, Elasticsearch, RabbitMQ, Redis, HBase, Solr, Cassandra, ZooKeeper, HDFS, Yarn, Hive, Presto, Drill, Impala, Consul, Spark, Jenkins, Travis CI, Git, MySQL, Linux, DNS, Whois, SSL Certs, Yum Security Updates, Kubernetes, Cloudera etc...
-
kylo
Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.
-
ozone
Scalable, reliable, distributed storage system optimized for data analytics and object store workloads.
Project mention: Apache Ozone: Scalable, redundant, distributed object store for Apache Hadoop | news.ycombinator.com | 2024-12-04
Hadoop related posts
-
Unveiling the Apache License 2.0: A Deep Dive into Open Source Freedom
-
Apache Hadoop: Pioneering Open Source Innovation in Big Data
-
Commit to Growth: My 2024 Reflection
-
Where is Java Used in Industry?
-
How to Install PySpark on Your Local Machine
-
Apache Ozone: Scalable, redundant, distributed object store for Apache Hadoop
-
Apache Doris: open-source data warehouse for real time data analytics
-
Index
What are some of the best open-source Hadoop projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | data-science-ipython-notebooks | 27,993 |
2 | luigi | 18,154 |
3 | APIJSON | 17,514 |
4 | Presto | 16,247 |
5 | Apache Hadoop | 14,975 |
6 | Deeplearning4j | 13,861 |
7 | doris | 13,322 |
8 | Trino | 10,996 |
9 | school-of-sre | 7,922 |
10 | H2O | 7,066 |
11 | Alluxio (formerly Tachyon) | 6,953 |
12 | Apache Hive | 5,652 |
13 | Apache Ignite | 4,893 |
14 | Apache Calcite | 4,759 |
15 | Apache Nutch | 2,987 |
16 | docker-hadoop | 2,229 |
17 | kyuubi | 2,159 |
18 | winutils | 2,007 |
19 | Apache Drill | 1,960 |
20 | nagios-plugins | 1,140 |
21 | kylo | 1,111 |
22 | ozone | 892 |
23 | WeDataSphere | 662 |