Top 8 Java Data Science Projects
-
OpenRefine
OpenRefine is a free, open source power tool for working with messy data and improving it
Project mention: Cannot create table from CSV file in BigQuery. | reddit.com/r/learnSQL | 2022-06-05
I'm not familiar with BigQuery, but could it be inconsistencies in the data? I mean missing commas or quotes, incorrect datetime formats, or something like that. You can use the CSV Lint plug-in in Notepad++ or install OpenRefine to check for those types of errors.
-
airbyte
Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
Project mention: Ask HN: How are you dealing with the M1/ARM migration? | news.ycombinator.com | 2022-06-10
-
Trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Project mention: How-to-Guide: Contributing to Open Source | reddit.com/r/dataengineering | 2022-06-11
Although Trino (formerly Presto) is in the awesome-for-beginners list, it's also a really good DE project, as it is a distributed query engine that connects to most of the projects listed above. So, depending on where you work in this project, you can gain a depth of knowledge on the query engine or breadth across all the connectors… or go hybrid.
-
Smile
Project mention: What libraries do you use for machine learning and data visualizing in scala? | reddit.com/r/scala | 2021-11-27
I use Smile (https://github.com/haifengl/smile) with Ammonite, and it feels pretty easy to work with. Of course, for purely looking at data and exploration, you're not going to beat Python.
-
Tablesaw
-
DatumBox
Datumbox is an open-source Machine Learning framework written in Java which allows the rapid development of Machine Learning and Statistical applications.
-
zingg
Project mention: is it possible to "fuzzy match" or dedupe columns in Redshift? | reddit.com/r/aws | 2022-04-30
If you are open to using a framework for this, check out Zingg at https://github.com/zinggAI/zingg. It connects to Redshift, Snowflake, and other warehouses and can handle multiple columns.
-
rumble
⛈️ RumbleDB 1.19.0 "Tipuana Tipu" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more (by RumbleDB)
Project mention: RumbleDB: Query with ease a lot of different nested, heterogeneous data formats | news.ycombinator.com | 2021-12-01
Java Data Science related posts
- I'm just going to say it - I prefer Spyder
- Airbyte and Meltano comparison
- Launch HN: Castled Data (YC W22) – Open-Source Reverse ETL
- Castled - an open source reverse ETL solution that helps you to periodically sync the data in your db/warehouse into sales, marketing, support or custom apps without any help from engineering teams
Index
What are some of the best open-source Data Science projects in Java? This list will help you:
Rank | Project | Stars |
---|---|---|
1 | OpenRefine | 8,870 |
2 | airbyte | 7,123 |
3 | Trino | 5,622 |
4 | Smile | 5,531 |
5 | Tablesaw | 2,932 |
6 | DatumBox | 1,077 |
7 | zingg | 537 |
8 | rumble | 173 |