Top 10 Java Data Science Projects
-
OpenRefine
OpenRefine is a free, open source power tool for working with messy data and improving it
-
Trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Project mention: Game analytic power: how we process more than 1 billion events per day | dev.to | 2023-11-24
We decided not to waste time reinventing the wheel and simply installed Trino on our servers. It’s a full featured SQL query engine that works on your data. Now our analysts can use it to work with data from AppMetr and execute queries at different levels of complexity.
-
Smile
Check out Smile too.
-
Tablesaw
Project mention: Tablesaw: Java Dataframe and Visualization Library | news.ycombinator.com | 2023-02-06
-
DatumBox
Datumbox is an open-source Machine Learning framework written in Java which allows the rapid development of Machine Learning and Statistical applications.
-
hopsworks
Project mention: Hopworks: MLOps platform with Python-centric Feature Store | news.ycombinator.com | 2022-12-02
-
odd-platform
The first open-source data discovery and observability platform. We make life easy for data practitioners so they can focus on their business.
Project mention: OpenDataDiscovery 0.15 with Data Deprecation and Metadata Stale | news.ycombinator.com | 2023-08-04
-
rumble
⛈️ RumbleDB 1.21.0 "Hawthorn blossom" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more (by RumbleDB)
-
Java Data Science related posts
- [OC] Gender diversity in Tech companies
- What’s your process for deploying a data pipeline from a notebook, running it, and managing it in production?
- I'm just going to say it - I prefer Spyder
- Airbyte and Meltano comparison
- Launch HN: Castled Data (YC W22) – Open-Source Reverse ETL
- Castled - an open source reverse ETL solution that helps you to periodically sync the data in your db/warehouse into sales, marketing, support or custom apps without any help from engineering teams
-
Index
What are some of the best open-source Data Science projects in Java? This list will help you:
Rank | Project | Stars
---|---|---
1 | OpenRefine | 10,000
2 | Trino | 8,864
3 | Smile | 5,848
4 | Tablesaw | 3,365
5 | DatumBox | 1,089
6 | hopsworks | 1,012
7 | odd-platform | 999
8 | zingg | 810
9 | rumble | 200
10 | Data-Engineering-Roadmap | 68