Weka vs Apache Hadoop

Compare Weka and Apache Hadoop and see how they differ.

                 Weka                  Apache Hadoop
Mentions         0                     10
Stars            302                   12,257
Growth           -                     1.2%
Activity         0.0                   9.8
Last commit      almost 3 years ago    4 days ago
Language         PostScript            Java
License          -                     Apache License 2.0
Mentions: the total number of mentions we have tracked plus the number of user-suggested alternatives.
Stars: the number of stars a project has on GitHub. Growth: month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

Weka

Posts with mentions or reviews of Weka. We have used some of these posts to build our list of alternatives and similar projects.

We haven't tracked posts mentioning Weka yet.
Tracking mentions began in Dec 2020.

Apache Hadoop

Posts with mentions or reviews of Apache Hadoop. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2021-12-22.
  • How do I make multiple computers run as one?
    1 project | reddit.com/r/techsupport | 6 Jan 2022
    The computers that you have appear to use an x86 architecture. Therefore, you could most likely install a Linux distro on each one. Then, you could use something like Apache Hadoop to execute some sort of distributed process across each computer.
  • Spark for beginners - and you
    3 projects | dev.to | 22 Dec 2021
    Hadoop is an ecosystem of tools for big data storage and data analysis. It is older than Spark and writes intermediate results to disk, whereas Spark tries to keep data in memory whenever possible, which makes Spark faster in many use cases.
  • Dreaming and Breaking Molds – Establishing Best Practices with Scott Haines
    3 projects | dev.to | 8 Dec 2021
    So Yahoo bought that. I think it was 2013 or 2014. Timelines are hard. But I wanted to go join the Games team and start things back up. But that was also my first kind of experience in actually building recommendation engines or working with lots of data. And I think for me, like that was, I guess...at the time, we were using something called Apache Storm. We had Hadoop, which had been around for a while. And it was like one of the biggest user groups was out of the Yahoo campus. It was called the HUG group, like the Hadoop Users Group. So they met for basically pizza and stories on Wednesdays once a month, which was really fun.
  • Setting up a single-node Hadoop cluster
    1 project | dev.to | 14 Nov 2021
    Hadoop: http://hadoop.apache.org/
  • Spark is lit once again
    6 projects | dev.to | 29 Oct 2021
    Here at Exacaster, Spark applications have been used extensively for years. We started using them on our Hadoop clusters with YARN as an application manager. However, with our recent product, we started moving towards a cloud-based solution and decided to use Kubernetes for our infrastructure needs.
  • The Data Engineer Roadmap 🗺
    11 projects | dev.to | 19 Oct 2021
    Apache Hadoop and HDFS
  • Whatever java can do there is a better alternative in job market?
    2 projects | reddit.com/r/learnjava | 16 Jul 2021
  • 5 Best Big Data Frameworks You Can Learn in 2021
    3 projects | dev.to | 18 Jun 2021
    Both Fortune 500 and small companies are looking for competent people who can derive useful insight from their huge piles of data, and that's where Big Data frameworks like Apache Hadoop, Apache Spark, Flink, Storm, and Hive can help.
  • The Data Engineering Interview Study Guide
    1 project | dev.to | 22 Apr 2021
    Some positions require Hadoop, others SQL. Some roles require understanding statistics, while still others require heavy amounts of system design.
  • Currently in Data Science. Should I make the move?
    1 project | reddit.com/r/dataengineering | 22 Mar 2021
    It'd be best to clarify exactly what we mean by "Hadoop", but if we define it as the suite described here then the only components I still see being used for greenfield are HDFS - or, to be more specific, HDFS-compatible filesystems (AWS EMR and Azure Data Lake Storage both offer HDFS compatibility) - and maybe (Spark) YARN.
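Several of the posts above refer to Hadoop's MapReduce model, in which map output is written to disk and sorted by key before the reduce step runs, unlike Spark's preference for keeping data in memory. The sketch below illustrates that map, shuffle/sort, reduce flow as a word count written in the style of Hadoop Streaming, where the mapper and reducer read lines and emit tab-separated key/value lines. It is a minimal local simulation under stated assumptions, not Hadoop's own API: the function names `mapper`, `reducer`, and `run_locally` are hypothetical, and a plain `sorted()` call stands in for the cluster's shuffle-and-sort phase.

```python
from itertools import groupby
from typing import Dict, Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'word<TAB>1' for every whitespace-separated word (Streaming-style output)."""
    for line in lines:
        for word in line.strip().split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts for each word.

    Assumes the input is already sorted by key, which is what Hadoop's
    shuffle-and-sort phase guarantees before a reducer sees its input.
    """
    pairs = (line.split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines: Iterable[str]) -> Dict[str, int]:
    """Simulate map -> shuffle/sort -> reduce in a single process (no cluster)."""
    mapped = list(mapper(lines))  # on a real cluster, map output is spilled to local disk
    shuffled = sorted(mapped)     # hypothetical stand-in for Hadoop's shuffle-and-sort
    result: Dict[str, int] = {}
    for line in reducer(shuffled):
        word, total = line.split("\t")
        result[word] = int(total)
    return result


if __name__ == "__main__":
    print(run_locally(["the quick brown fox", "The lazy dog and the fox"]))
```

Keeping the mapper and reducer as line-in, line-out functions mirrors how Streaming jobs are usually written, so the same logic could in principle be split into two stdin/stdout scripts; the exact job-submission command depends on the Hadoop installation and is not shown here.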

What are some alternatives?

When comparing Weka and Apache Hadoop you can also consider the following projects:

Go IPFS - IPFS implementation in Go

Deeplearning4j - Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learning using automatic differentiation.

Ceph - Ceph is a distributed object, block, and file storage platform

Seaweed File System - SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, cross-DC active-active replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, Erasure Coding.

MooseFS - MooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)

Apache Spark - Apache Spark - A unified analytics engine for large-scale data processing

GlusterFS - Web Content for gluster.org -- Deprecated as of September 2017

Smile - Statistical Machine Intelligence & Learning Engine

lizardfs - LizardFS is an Open Source Distributed File System licensed under GPLv3.

H2O - Sparkling Water provides H2O functionality inside Spark cluster

GlusterFS - Gluster Filesystem : Build your distributed storage in minutes

JSAT - Java Statistical Analysis Tool, a Java library for Machine Learning