Java Data Science

Open-source Java projects categorized as Data Science

Top 10 Java Data Science Projects

  • OpenRefine

    OpenRefine is a free, open source power tool for working with messy data and improving it

    Project mention: OpenRefine | /r/patient_hackernews | 2023-10-23

  • Trino

    Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL.

    Project mention: Game analytic power: how we process more than 1 billion events per day | | 2023-11-24

    We decided not to waste time reinventing the wheel and simply installed Trino on our servers. It’s a full-featured SQL query engine that works on your data. Now our analysts can use it to work with data from AppMetr and execute queries at different levels of complexity.

  • Smile

    Statistical Machine Intelligence & Learning Engine

    Project mention: Need statistic test library for Spark Scala | /r/scala | 2023-05-05

    Check out Smile too.

  • Tablesaw

    Java dataframe and visualization library

    Project mention: Tablesaw: Java Dataframe and Visualization Library | | 2023-02-06

  • DatumBox

    Datumbox is an open-source Machine Learning framework written in Java which allows the rapid development of Machine Learning and Statistical applications.

  • hopsworks

    Hopsworks - Data-Intensive AI platform with a Feature Store

    Project mention: Hopsworks: MLOps platform with Python-centric Feature Store | | 2022-12-02

  • odd-platform

    The first open-source data discovery and observability platform. It aims to make life easier for data practitioners so they can focus on their business.

    Project mention: OpenDataDiscovery 0.15 with Data Deprecation and Metadata Stale | | 2023-08-04

  • zingg

    Scalable identity resolution, entity resolution, data mastering and deduplication using ML

  • rumble

    ⛈️ RumbleDB 1.21.0 "Hawthorn blossom" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more (by RumbleDB)

  • Data-Engineering-Roadmap

    Roadmap for Data Engineering

    Project mention: Question about data engineering? | /r/programiranje | 2023-06-30

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2023-11-24.

What are some of the best open-source Data Science projects in Java? This list will help you:

#   Project                    Stars
1   OpenRefine                10,000
2   Trino                      8,864
3   Smile                      5,848
4   Tablesaw                   3,365
5   DatumBox                   1,089
6   hopsworks                  1,012
7   odd-platform                 999
8   zingg                        810
9   rumble                       200
10  Data-Engineering-Roadmap      68