Open-source projects categorized as MapReduce
Top 10 MapReduce Open-Source Projects

  • GitHub repo Apache Spark

    Apache Spark - A unified analytics engine for large-scale data processing

    Project mention: 5 Best Big Data Frameworks You Can Learn in 2021 | dev.to | 2021-06-18

    Both Fortune 500 and small companies are looking for competent people who can derive useful insight from their huge piles of data, and that's where big data frameworks like Apache Hadoop, Apache Spark, Flink, Storm, and Hive can help.

  • GitHub repo data-science-ipython-notebooks

    Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

    Project mention: Beginner in Python for Data Science | reddit.com/r/learnpython | 2020-12-27

    data science ipython notebooks

  • GitHub repo Redisson

    Redisson - Redis Java client with features of In-Memory Data Grid. Over 50 Redis based Java objects and services: Set, Multimap, SortedSet, Map, List, Queue, Deque, Semaphore, Lock, AtomicLong, Map Reduce, Publish / Subscribe, Bloom filter, Spring Cache, Tomcat, Scheduler, JCache API, Hibernate, MyBatis, RPC, local cache ...

  • GitHub repo PowerJob

    Enterprise job scheduling middleware with distributed computing ability.

    Project mention: PowerJob V3.4.3 has been released. Check to see the work. Suggestions are welcomed. | reddit.com/r/java | 2021-01-17

    Oh yes! You can see the registered users under Known users. They are companies in China, as we haven't promoted the project abroad. Cisco, JD.com, and OPPO are all big companies in China.

  • GitHub repo dpark

    Python clone of Spark, a MapReduce-like framework in Python

  • GitHub repo mrjob

    Run MapReduce jobs on Hadoop or Amazon Web Services

  • GitHub repo dumbo

    Python module that allows one to easily write and run Hadoop programs.

  • GitHub repo Mobius: C# API for Spark

    C# and F# language binding and extensions to Apache Spark (by microsoft)

  • GitHub repo tdigest

    t-Digest data structure in Python. Useful for percentiles and quantiles, including distributed environments like PySpark (by CamDavidsonPilon)

  • GitHub repo goterator

    Lazy iterator implementation for Golang

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2021-06-18.


What are some of the best open-source MapReduce projects? This list will help you:

Rank  Project                          Stars
1     Apache Spark                    30,129
2     data-science-ipython-notebooks  21,217
3     Redisson                        16,734
4     PowerJob                         2,836
5     dpark                            2,663
6     mrjob                            2,546
7     dumbo                            1,044
8     Mobius: C# API for Spark           930
9     tdigest                            286
10    goterator                            3