Top 23 Data Science Open-Source Projects
-
Project mention: Learn Machine Learning with these GitHub repositories | news.ycombinator.com | 2025-01-15
*Learn Machine Learning with these amazing GitHub repositories!*
1⃣ [ML for Beginners](https://github.com/microsoft/ML-For-Beginners) by Microsoft
-
First, I cloned the Apache Superset repository.
-
Project mention: Building a Sarcasm Detection System with LSTM and GloVe: A Complete Guide | dev.to | 2025-01-02
Keras API reference
-
Project mention: Must-Know 2025 Developer’s Roadmap and Key Programming Trends | dev.to | 2025-02-05
Python’s Growth in Data Work and AI: Python continues to lead because of its easy-to-read style and the huge number of libraries available for tasks from data work to artificial intelligence. Tools like TensorFlow and PyTorch make it a must-have. Whether you’re experienced or just starting, Python’s clear style makes it a good choice for diving into machine learning. Actionable Tip: If you’re new to Python, try projects that combine data with everyday problems. For example, build a simple recommendation system using Pandas and scikit-learn.
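The tip above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production recommender: the toy ratings table, the column names, and the `similar_items` helper are all hypothetical and do not come from the quoted article. It uses pandas to pivot ratings into an item-by-user matrix and scikit-learn's `cosine_similarity` to compare items.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy ratings (user, item, rating); invented for illustration.
ratings = pd.DataFrame({
    "user": ["ann", "ann", "ann", "bob", "bob", "cat", "cat", "cat"],
    "item": ["kettle", "teapot", "mug", "kettle", "teapot", "mug", "lamp", "desk"],
    "rating": [5, 4, 3, 5, 5, 2, 5, 4],
})

# Rows are items, columns are users; unrated pairs are filled with 0.
item_user = ratings.pivot_table(
    index="item", columns="user", values="rating", fill_value=0
)

# Pairwise cosine similarity between the item rating vectors.
sim = pd.DataFrame(
    cosine_similarity(item_user),
    index=item_user.index,
    columns=item_user.index,
)


def similar_items(item, k=2):
    """Return the k items most similar to `item`, excluding the item itself."""
    return sim[item].drop(item).sort_values(ascending=False).head(k).index.tolist()


print(similar_items("kettle", k=1))
```

Treating a missing rating as 0 is a simplification that keeps the sketch short; a real system would usually handle unrated items more carefully, for example by mean-centering each user's ratings first.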
-
Pandas
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
Project mention: Must-Know 2025 Developer’s Roadmap and Key Programming Trends | dev.to | 2025-02-05
-
Project mention: 10 Must-Know Open Source Platform Engineering Tools for AI/ML Workflows | dev.to | 2025-02-06
Apache Airflow offers simplicity when it comes to scheduling, authoring, and monitoring ML workflows using Python. The tool's greatest advantage is its compatibility with any system or process you are running. This also eliminates manual intervention and increases team productivity, which aligns with the principles of Platform Engineering tools.
-
Project mention: Building a Sarcasm Detection System with LSTM and GloVe: A Complete Guide | dev.to | 2025-01-02
Streamlit
-
Project mention: Show HN: I made a website to semantically search ArXiv papers | news.ycombinator.com | 2024-12-24
-
Ray
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
I'm guessing this comment is some kind of "if you know, you know." Likely starting from https://docs.ray.io/en/latest/cluster/vms/user-guides/launch... and then trawling through one of these I guess https://github.com/ray-project/ray/issues?q=is%3Aissue+prem+...
-
Project mention: SpaCy – Industrial-Strength Natural Language Processing in Python | news.ycombinator.com | 2025-02-09
-
-
pytorch-lightning
Pretrain, finetune ANY AI model of ANY size on multiple GPUs, TPUs with zero code changes.
Project mention: SB-1047 will stifle open-source AI and decrease safety | news.ycombinator.com | 2024-04-29
It's very easy to get started, right in your Terminal, no fees! No credit card at all.
And there are cloud providers like https://replicate.com/ and https://lightning.ai/ that will let you use your LLM via an API key just like you did with OpenAI if you need that.
You don't need OpenAI - nobody does.
-
Get started with Data Science in the Data Science for Beginners curricula.
-
data-science-ipython-notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
-
applied-ml
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
-
Probabilistic-Programming-and-Bayesian-Methods-for-Hackers
aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)
-
awesome-datascience
📝 An awesome Data Science repository to learn and apply for real world problems.
Awesome Data Science - An awesome Data Science repository.
-
d2l-en
Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.
-
ML-From-Scratch
Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.
Project mention: Learn Machine Learning with these GitHub repositories | news.ycombinator.com | 2025-01-15
3⃣ [ML From Scratch](https://github.com/eriklindernoren/ML-From-Scratch) by Erik Linder-Noren
-
-
500-AI-Machine-learning-Deep-learning-Computer-vision-NLP-Projects-with-code
500 AI Machine learning Deep learning Computer vision NLP Projects with code
500 AI machine learning NLP programming projects
-
Data Science discussion
Data Science related posts
-
SpaCy – Industrial-Strength Natural Language Processing in Python
-
Using VSCode to track and visualize AI experiments
-
35+ Newly Launched GitHub Projects Every Developer
-
10 Must-Know Open Source Platform Engineering Tools for AI/ML Workflows
-
Must-Know 2025 Developer’s Roadmap and Key Programming Trends
-
Colors with Rio's oklab color space
-
A note from our sponsor - SaaSHub
www.saashub.com | 10 Feb 2025
Index
What are some of the best open-source Data Science projects? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | ML-For-Beginners | 70,968 |
| 2 | superset | 64,308 |
| 3 | Keras | 62,474 |
| 4 | scikit-learn | 61,000 |
| 5 | Pandas | 44,513 |
| 6 | Airflow | 38,610 |
| 7 | Made-With-ML | 38,126 |
| 8 | streamlit | 37,234 |
| 9 | gradio | 35,758 |
| 10 | Ray | 35,238 |
| 11 | spaCy | 30,849 |
| 12 | AI-Expert-Roadmap | 29,365 |
| 13 | pytorch-lightning | 28,921 |
| 14 | Data-Science-For-Beginners | 28,753 |
| 15 | data-science-ipython-notebooks | 27,812 |
| 16 | applied-ml | 27,669 |
| 17 | Probabilistic-Programming-and-Bayesian-Methods-for-Hackers | 27,171 |
| 18 | awesome-datascience | 25,681 |
| 19 | d2l-en | 24,864 |
| 20 | ML-From-Scratch | 24,156 |
| 21 | fastbook | 22,464 |
| 22 | 500-AI-Machine-learning-Deep-learning-Computer-vision-NLP-Projects-with-code | 21,974 |
| 23 | dash | 21,908 |