Top 23 Python Data Science Projects
-
The unchallenged leader in AI development is still Python. [...] and Keras, and robust community support.
-
7. Scikit-learn - Machine Learning
-
Pandas
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
Libraries for data science and deep learning that are always changing
-
Apache Airflow
-
The only thing left to do then was to build something that could showcase the power of code ingestion within a vector database, and it immediately clicked in my mind: "Why don't I ingest my entire codebase of solved Go exercises from Exercism?" That's how I created Code-RAGent, your friendly coding assistant based on your personal codebases and grounded in web search. It is built on top of GPT-4.1, powered by OpenAI, LinkUp, LlamaIndex, Qdrant, FastAPI and Streamlit. The project aims to provide a reproducible and adaptable agent that people can customize to their needs, and it was built in three phases:
-
gradio
Build and share delightful machine learning apps, all in Python.
Project mention: How To Run OpenAI Agents SDK Locally With 100+ LLMs and Custom Tracing | dev.to | 2025-05-07
Streamlit and Gradio: Interact with OpenAI agents via an AI chat UI.
-
Ray
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
I'm guessing this comment is some kind of "if you know, you know." Likely starting from https://docs.ray.io/en/latest/cluster/vms/user-guides/launch... and then trawling through one of these I guess https://github.com/ray-project/ray/issues?q=is%3Aissue+prem+...
-
pytorch-lightning
Pretrain, finetune ANY AI model of ANY size on multiple GPUs, TPUs with zero code changes.
-
data-science-ipython-notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
-
d2l-en
Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.
-
ML-From-Scratch
Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.
Project mention: Learn Machine Learning with these GitHub repositories | news.ycombinator.com | 2025-01-15
3. [ML From Scratch](https://github.com/eriklindernoren/ML-From-Scratch) by Erik Linder-Noren
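ML-From-Scratch describes itself as bare-bones implementations that start from linear regression. As a rough illustration of what "from scratch" means in that sense, the sketch below fits a one-variable least-squares line using only the Python standard library. It is not taken from the repository: `fit_line` and `predict` are hypothetical names chosen for this example, and plain Python lists stand in for the NumPy arrays the project uses.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line.

    Hypothetical helper for illustration; not part of ML-From-Scratch.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_mean, y_mean = fmean(xs), fmean(ys)
    # Sum of squared deviations of x, and summed cross-deviations of x and y.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def predict(model, x):
    """Evaluate the fitted line at x."""
    slope, intercept = model
    return slope * x + intercept


if __name__ == "__main__":
    # Four points that lie on the line y = 2x + 1.
    model = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
    print(model, predict(model, 4))
```

The closed-form solution is used here because it needs no learning rate or iteration count; a gradient-descent version would be closer to how larger models are trained but adds tuning parameters that distract from the idea.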
-
-
Project mention: How to Get Started with Scikit-Learn: A Beginner-Friendly Guide to Machine Learning in Python | dev.to | 2025-04-24
Like most Python libraries, scikit-learn is open source and free to use, which makes it accessible to anyone willing to learn machine learning. It is built on other open-source Python libraries: SciPy for advanced scientific operations, NumPy for efficient numerical computation, Matplotlib for data visualization, and Cython for efficiency and speed closer to that of C/C++.
-
-
pandas-ai
Chat with your database or your datalake (SQL, CSV, parquet). PandasAI makes data analysis conversational using LLMs and RAG.
-
Project mention: A ranked list of machine learning Python libraries. Updated weekly | news.ycombinator.com | 2025-01-31
-
Project mention: Show HN: Flow - A Dynamic Task Engine for AI Agents Without DAG | news.ycombinator.com | 2024-12-02
- https://github.com/PrefectHQ/prefect
-
ipython
Official repository for IPython itself. Other repos in the IPython organization contain things like the website, documentation builds, etc.
Project mention: REPL for Dart: supporting 3rd party packages, hot reload, and full grammar | news.ycombinator.com | 2024-09-28
-
Project mention: Ask HN: What is the simplest data orchestration tool you've worked with? | news.ycombinator.com | 2025-03-21
-
marimo
A reactive notebook for Python: run reproducible experiments, query with SQL, execute as a script, deploy as an app, and version with git. All in a modern, AI-native editor.
there is https://marimo.io/ that does all this and more
Python Data Science related posts
-
Show HN: Juvio - UV Kernel for Jupyter
-
spaCy - NLP in Python
-
TabPFN: Foundation Model for Tabular Data
-
PandaAI: Talk to Your Data, Not to Your Code!
-
A Survey of AI Agent Protocols
-
Top Programming Languages for AI Development in 2025
-
How I Hacked Uber's Hidden API to Download 4379 Rides
-
A note from our sponsor - SaaSHub
www.saashub.com | 22 May 2025
Index
What are some of the best open-source Data Science projects in Python? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | Keras | 62,989 |
| 2 | scikit-learn | 62,056 |
| 3 | Pandas | 45,442 |
| 4 | Airflow | 40,060 |
| 5 | streamlit | 39,392 |
| 6 | gradio | 38,168 |
| 7 | Ray | 37,068 |
| 8 | spaCy | 31,576 |
| 9 | pytorch-lightning | 29,505 |
| 10 | data-science-ipython-notebooks | 27,993 |
| 11 | d2l-en | 25,832 |
| 12 | ML-From-Scratch | 24,416 |
| 13 | dash | 22,452 |
| 14 | matplotlib | 21,220 |
| 15 | recommenders | 20,214 |
| 16 | pandas-ai | 20,204 |
| 17 | best-of-ml-python | 20,054 |
| 18 | Prefect | 19,300 |
| 19 | ipython | 16,474 |
| 20 | gensim | 16,027 |
| 21 | dvc | 14,478 |
| 22 | marimo | 13,264 |
| 23 | dagster | 13,154 |