Python Data Science

Open-source Python projects categorized as Data Science

Top 23 Python Data Science Projects

Data Science
  1. Keras

    Deep Learning for humans

    Project mention: Top Programming Languages for AI Development in 2025 | dev.to | 2025-04-29

    The unchallenged leader in AI development is still Python. [...] and Keras, and robust community support.

  2. scikit-learn

    scikit-learn: machine learning in Python

    Project mention: 10 Useful Tools and Libraries for Python Developers | dev.to | 2025-03-29

    7. Scikit-learn - Machine Learning

  3. Pandas

    Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

    Project mention: Top Programming Languages for AI Development in 2025 | dev.to | 2025-04-29

    Libraries for data science and deep learning that are always changing

  4. Airflow

    Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

    Project mention: Top 10 Open-source AI/ML platform engineering tools | dev.to | 2025-05-19

    Apache Airflow

  5. streamlit

    Streamlit – A faster way to build and share data apps.

    Project mention: Build Code-RAGent, an agent for your codebase | dev.to | 2025-04-29

    The only thing left to do then was to build something that could showcase the power of code ingestion within a vector database, and it immediately clicked in my mind: "Why don't I ingest my entire codebase of solved Go exercises from Exercism?" That's how I created Code-RAGent, your friendly coding assistant based on your personal codebases and grounded in web search. It is built on top of GPT-4.1, powered by OpenAI, LinkUp, LlamaIndex, Qdrant, FastAPI and Streamlit. The building of this project was aimed at providing a reproducible and adaptable agent, that people can therefore customize based on their needs, and it was composed of three phases:

  6. gradio

    Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!

    Project mention: How To Run OpenAI Agents SDK Locally With 100+ LLMs and Custom Tracing | dev.to | 2025-05-07

    Streamlit and Gradio: Interact with OpenAI agents via an AI chat UI.

  7. Ray

    Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

    Project mention: Ask HN: What Open Source Projects Need Help? | news.ycombinator.com | 2024-11-16

    I'm guessing this comment is some kind of "if you know, you know." Likely starting from https://docs.ray.io/en/latest/cluster/vms/user-guides/launch... and then trawling through one of these I guess https://github.com/ray-project/ray/issues?q=is%3Aissue+prem+...

  8. spaCy

    💫 Industrial-strength Natural Language Processing (NLP) in Python

    Project mention: spaCy - NLP in Python | dev.to | 2025-05-20

  9. pytorch-lightning

    Pretrain, finetune ANY AI model of ANY size on multiple GPUs, TPUs with zero code changes.

  10. data-science-ipython-notebooks

    Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

  11. d2l-en

    Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.

  12. ML-From-Scratch

    Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.

    Project mention: Learn Machine Learning with these GitHub repositories | news.ycombinator.com | 2025-01-15

    3⃣ [ML From Scratch](https://github.com/eriklindernoren/ML-From-Scratch) by Erik Linder-Noren

  13. dash

    Data Apps & Dashboards for Python. No JavaScript Required.

  14. matplotlib

    matplotlib: plotting with Python

    Project mention: How to Get Started with Scikit-Learn: A Beginner-Friendly Guide to Machine Learning in Python | dev.to | 2025-04-24

    As is the case with most Python libraries, it is open-source and free-to-use, making it easily accessible by anyone willing to learn machine learning, and it is built upon other open-source libraries within Python, like SciPy for advanced scientific operations, NumPy for efficient numerical computations, Matplotlib for data visualization, and Cython for increased efficiency and speed, similar to that of C/C++.

  15. recommenders

    Best Practices on Recommendation Systems

    Project mention: Best Practices on Recommendation Systems | news.ycombinator.com | 2024-10-19

  16. pandas-ai

    Chat with your database or your datalake (SQL, CSV, parquet). PandasAI makes data analysis conversational using LLMs and RAG.

    Project mention: PandaAI: Talk to Your Data, Not to Your Code! | dev.to | 2025-05-06

  17. best-of-ml-python

    🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.

    Project mention: A ranked list of machine learning Python libraries. Updated weekly | news.ycombinator.com | 2025-01-31

  18. Prefect

    The easiest way to build, run, and monitor data pipelines at scale.

    Project mention: Show HN: Flow – A Dynamic Task Engine for AI Agents Without DAG | news.ycombinator.com | 2024-12-02

    - https://github.com/PrefectHQ/prefect

  19. ipython

    Official repository for IPython itself. Other repos in the IPython organization contain things like the website, documentation builds, etc.

    Project mention: REPL for Dart: supporting 3rd party packages, hot reload, and full grammar | news.ycombinator.com | 2024-09-28

  20. gensim

    Topic Modelling for Humans

  21. dvc

    🦉 Data Versioning and ML Experiments

    Project mention: Ask HN: What is the simplest data orchestration tool you've worked with? | news.ycombinator.com | 2025-03-21

  22. marimo

    A reactive notebook for Python – run reproducible experiments, query with SQL, execute as a script, deploy as an app, and version with git. All in a modern, AI-native editor.

    Project mention: Show HN: Juvio – UV Kernel for Jupyter | news.ycombinator.com | 2025-05-20

    there is https://marimo.io/ that does all this and more

  23. dagster

    An orchestration platform for the development, production, and observation of data assets.

    Project mention: Personal Picks: Data Product News (March 19, 2025) | dev.to | 2025-03-22

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

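ML-From-Scratch is described as bare-bones implementations of machine learning models, starting from linear regression. As a minimal sketch of what "from scratch" means for the simplest of those models, the code below fits a one-feature linear regression with the closed-form least-squares formulas. It assumes plain Python lists and the standard `statistics` module in place of the NumPy arrays the repository itself uses, and `fit_line` is a hypothetical name, not a function from that project.

```python
from statistics import fmean


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Fit y = slope * x + intercept by ordinary least squares (one feature).

    Hypothetical helper for illustration; ML-From-Scratch itself uses NumPy.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_mean = fmean(xs)
    y_mean = fmean(ys)
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    cov_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    var_x = sum((x - x_mean) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = cov_xy / var_x
    # The least-squares line always passes through the point of means.
    intercept = y_mean - slope * x_mean
    return slope, intercept


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # points lying exactly on y = 2x + 1
    slope, intercept = fit_line(xs, ys)
    print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

With the five sample points, the fit recovers the line y = 2x + 1 exactly. Implementations that handle many features usually switch to matrix operations or gradient descent; the one-feature closed form is used here only because it fits in a few lines.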
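Airflow, Prefect, and dagster are all listed as tools for authoring, running, and monitoring workflows or pipelines, and Airflow in particular represents a workflow as a DAG (directed acyclic graph) of tasks. The core ordering idea, starting a task only after everything upstream of it has finished, can be sketched with the standard library's `graphlib` module (Python 3.9+). This is a hypothetical illustration, not the API of any of these tools; `run_pipeline` and the task names are invented for the example.

```python
from graphlib import TopologicalSorter
from typing import Callable


def run_pipeline(
    tasks: dict[str, Callable[[], None]],
    upstream: dict[str, set[str]],
) -> list[str]:
    """Run every task after all of its upstream tasks; return the execution order.

    `upstream` maps a task name to the set of task names it depends on.
    Hypothetical helper: real orchestrators add scheduling, retries,
    parallel workers, and monitoring on top of this ordering step.
    """
    referenced = set(upstream).union(*upstream.values())
    unknown = referenced - set(tasks)
    if unknown:
        raise KeyError(f"dependencies on undefined tasks: {sorted(unknown)}")
    # Every task becomes a node, even if it has no upstream dependencies.
    sorter = TopologicalSorter({name: upstream.get(name, set()) for name in tasks})
    order: list[str] = []
    # static_order() raises graphlib.CycleError if the graph contains a cycle.
    for name in sorter.static_order():
        tasks[name]()
        order.append(name)
    return order


if __name__ == "__main__":
    def make_task(name: str) -> Callable[[], None]:
        # Each hypothetical task only reports that it ran.
        return lambda: print(f"running {name}")

    tasks = {n: make_task(n) for n in ["extract", "transform", "load", "report"]}
    upstream = {
        "transform": {"extract"},
        "load": {"transform"},
        "report": {"transform", "load"},
    }
    print(run_pipeline(tasks, upstream))
```

Because `report` depends on `load`, which depends on `transform`, which depends on `extract`, the only valid order in this example is extract, transform, load, report. A cyclic dependency is rejected instead of being run.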
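dvc is listed for data versioning. DVC records a hash of each tracked file (MD5 by default) in a small metadata file that is committed to Git, while the file contents live in a separate cache. The content-addressed idea behind that can be sketched with `hashlib` and `pathlib`; the two-character directory split and the `store`/`load` helpers are hypothetical simplifications, not DVC's actual cache layout or API.

```python
import hashlib
import tempfile
from pathlib import Path


def store(cache_dir: Path, data: bytes) -> str:
    """Save `data` under a path derived from its MD5 hash and return the hash.

    Hypothetical layout: <cache_dir>/<first 2 hex chars>/<remaining 30 chars>.
    Identical content maps to the same path, so it is stored only once.
    """
    digest = hashlib.md5(data).hexdigest()
    path = cache_dir / digest[:2] / digest[2:]
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
    return digest


def load(cache_dir: Path, digest: str) -> bytes:
    """Return the bytes previously stored under `digest`."""
    return (cache_dir / digest[:2] / digest[2:]).read_bytes()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp)
        v1 = store(cache, b"id,label\n1,cat\n")
        v2 = store(cache, b"id,label\n1,cat\n2,dog\n")
        store(cache, b"id,label\n1,cat\n")  # same bytes as v1: nothing new is written
        # A "version" is just a hash; reading v1 back restores the old bytes.
        print(v1 != v2, load(cache, v1))
        print(sum(1 for p in cache.rglob("*") if p.is_file()))
```

Storing the same bytes twice yields the same hash and no extra file, and any earlier version can be read back from its hash. That is what lets a short hash committed to Git stand in for a large data file.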
Python Data Science related posts

  • Show HN: Juvio – UV Kernel for Jupyter

    3 projects | news.ycombinator.com | 20 May 2025
  • spaCy - NLP in Python

    1 project | dev.to | 20 May 2025
  • TabPFN: Foundation Model for Tabular Data

    1 project | news.ycombinator.com | 16 May 2025
  • PandaAI: Talk to Your Data, Not to Your Code!

    1 project | dev.to | 6 May 2025
  • A Survey of AI Agent Protocols

    5 projects | news.ycombinator.com | 3 May 2025
  • Top Programming Languages for AI Development in 2025

    9 projects | dev.to | 29 Apr 2025
  • How I Hacked Uber’s Hidden API to Download 4379 Rides

    4 projects | dev.to | 9 Apr 2025

Index

What are some of the best open-source Data Science projects in Python? This list will help you:

#   Project                          Stars
1   Keras                            62,989
2   scikit-learn                     62,056
3   Pandas                           45,442
4   Airflow                          40,060
5   streamlit                        39,392
6   gradio                           38,168
7   Ray                              37,068
8   spaCy                            31,576
9   pytorch-lightning                29,505
10  data-science-ipython-notebooks   27,993
11  d2l-en                           25,832
12  ML-From-Scratch                  24,416
13  dash                             22,452
14  matplotlib                       21,220
15  recommenders                     20,214
16  pandas-ai                        20,204
17  best-of-ml-python                20,054
18  Prefect                          19,300
19  ipython                          16,474
20  gensim                           16,027
21  dvc                              14,478
22  marimo                           13,264
23  dagster                          13,154
