Data Science

Top 23 Data Science Open-Source Projects

  1. ML-For-Beginners

    12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all

    Project mention: Learn Machine Learning with these GitHub repositories | news.ycombinator.com | 2025-01-15

    *Learn Machine Learning with these amazing GitHub repositories!*

    1⃣ [ML for Beginners](https://github.com/microsoft/ML-For-Beginners) by Microsoft

  3. superset

    Apache Superset is a Data Visualization and Data Exploration Platform

    Project mention: Installing Apache Superset - Windows 11 | dev.to | 2025-02-01

    First, I cloned the Apache Superset repository.

  4. Keras

    Deep Learning for humans

    Project mention: Building a Sarcasm Detection System with LSTM and GloVe: A Complete Guide | dev.to | 2025-01-02

    Keras API reference

  5. scikit-learn

    scikit-learn: machine learning in Python

    Project mention: Must-Know 2025 Developer’s Roadmap and Key Programming Trends | dev.to | 2025-02-05

    Python’s Growth in Data Work and AI: Python continues to lead because of its easy-to-read style and the huge number of libraries available for tasks from data work to artificial intelligence. Tools like TensorFlow and PyTorch make it a must-have. Whether you’re experienced or just starting, Python’s clear style makes it a good choice for diving into machine learning. Actionable Tip: If you’re new to Python, try projects that combine data with everyday problems. For example, build a simple recommendation system using Pandas and scikit-learn.

  6. Pandas

    Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

    Project mention: Must-Know 2025 Developer’s Roadmap and Key Programming Trends | dev.to | 2025-02-05

    Python’s Growth in Data Work and AI: Python continues to lead because of its easy-to-read style and the huge number of libraries available for tasks from data work to artificial intelligence. Tools like TensorFlow and PyTorch make it a must-have. Whether you’re experienced or just starting, Python’s clear style makes it a good choice for diving into machine learning. Actionable Tip: If you’re new to Python, try projects that combine data with everyday problems. For example, build a simple recommendation system using Pandas and scikit-learn.

  7. Airflow

    Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

    Project mention: 10 Must-Know Open Source Platform Engineering Tools for AI/ML Workflows | dev.to | 2025-02-06

    Apache Airflow offers simplicity when it comes to scheduling, authoring, and monitoring ML workflows using Python. The tool's greatest advantage is its compatibility with any system or process you are running. This also eliminates manual intervention and increases team productivity, which aligns with the principles of Platform Engineering tools.

  8. Made-With-ML

    Learn how to design, develop, deploy and iterate on production-grade ML applications.

  10. streamlit

    Streamlit — A faster way to build and share data apps.

    Project mention: Building a Sarcasm Detection System with LSTM and GloVe: A Complete Guide | dev.to | 2025-01-02

    Streamlit

  11. gradio

    Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!

    Project mention: Show HN: I made a website to semantically search ArXiv papers | news.ycombinator.com | 2024-12-24
  12. Ray

    Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

    Project mention: Ask HN: What Open Source Projects Need Help? | news.ycombinator.com | 2024-11-16

    I'm guessing this comment is some kind of "if you know, you know." Likely starting from https://docs.ray.io/en/latest/cluster/vms/user-guides/launch... and then trawling through one of these I guess https://github.com/ray-project/ray/issues?q=is%3Aissue+prem+...

  13. spaCy

    💫 Industrial-strength Natural Language Processing (NLP) in Python

    Project mention: SpaCy – Industrial-Strength Natural Language Processing in Python | news.ycombinator.com | 2025-02-09
  14. AI-Expert-Roadmap

    Roadmap to becoming an Artificial Intelligence Expert in 2022

  15. pytorch-lightning

    Pretrain, finetune ANY AI model of ANY size on multiple GPUs, TPUs with zero code changes.

    Project mention: SB-1047 will stifle open-source AI and decrease safety | news.ycombinator.com | 2024-04-29

    It's very easy to get started, right in your Terminal, no fees! No credit card at all.

    And there are cloud providers like https://replicate.com/ and https://lightning.ai/ that will let you use your LLM via an API key just like you did with OpenAI if you need that.

    You don't need OpenAI - nobody does.

  16. Data-Science-For-Beginners

    10 Weeks, 20 Lessons, Data Science for All!

    Project mention: Welcome to 14 days of Data Science! | dev.to | 2024-03-07

    Get started with Data Science in the Data Science for Beginners curricula.

  17. data-science-ipython-notebooks

    Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

  18. applied-ml

    📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.

  19. Probabilistic-Programming-and-Bayesian-Methods-for-Hackers

    aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)

  20. awesome-datascience

    📝 An awesome Data Science repository to learn and apply to real-world problems.

    Project mention: Awesome List | dev.to | 2024-06-08

    Awesome Data Science - An awesome Data Science repository.

  21. d2l-en

    Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.

  22. ML-From-Scratch

    Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.

    Project mention: Learn Machine Learning with these GitHub repositories | news.ycombinator.com | 2025-01-15

    3⃣ [ML From Scratch](https://github.com/eriklindernoren/ML-From-Scratch) by Erik Linder-Noren

  23. fastbook

    The fastai book, published as Jupyter Notebooks

  24. 500-AI-Machine-learning-Deep-learning-Computer-vision-NLP-Projects-with-code

    500 AI Machine learning Deep learning Computer vision NLP Projects with code

    Project mention: Top Github repositories for 10+ programming languages | dev.to | 2024-07-16

    500 AI machine learning NLP programming projects

  25. dash

    Data Apps & Dashboards for Python. No JavaScript Required.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
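
The tip quoted under the scikit-learn and Pandas entries above suggests building a simple recommendation system with those two libraries. One possible starting point is sketched below. It is only an illustration, not the quoted author's method: the toy ratings, the column names, and the `recommend` helper are all hypothetical, and item-to-item cosine similarity is just one of several reasonable techniques.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy ratings (user, item, rating); not taken from the quoted post.
ratings = pd.DataFrame(
    {
        "user": ["ann", "ann", "ann", "bob", "bob", "cat", "cat", "cat"],
        "item": ["kettle", "teapot", "mug", "kettle", "teapot", "mug", "lamp", "desk"],
        "rating": [5, 4, 1, 4, 5, 2, 5, 4],
    }
)

# Pivot to a user x item matrix; items a user has not rated become 0.
matrix = ratings.pivot_table(
    index="user", columns="item", values="rating", fill_value=0
)

# Item-to-item cosine similarity: each item is a vector of user ratings.
similarity = pd.DataFrame(
    cosine_similarity(matrix.T.to_numpy()),
    index=matrix.columns,
    columns=matrix.columns,
)


def recommend(item: str, top_n: int = 2) -> list[str]:
    """Return the top_n items most similar to `item`, excluding itself."""
    scores = similarity[item].drop(item).sort_values(ascending=False)
    return scores.head(top_n).index.tolist()


print(recommend("kettle"))
```

With real data, the same pivot-then-similarity shape applies. Larger rating sets would likely call for a sparse matrix instead of a dense DataFrame.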

Data Science related posts

  • SpaCy – Industrial-Strength Natural Language Processing in Python

    1 project | news.ycombinator.com | 9 Feb 2025
  • Using VSCode to track and visualize AI experiments

    1 project | news.ycombinator.com | 9 Feb 2025
  • 35+ Newly Launched GitHub Projects Every Developer

    1 project | dev.to | 8 Feb 2025
  • Using VSCode to track and visualize AI experiments

    1 project | news.ycombinator.com | 8 Feb 2025
  • 10 Must-Know Open Source Platform Engineering Tools for AI/ML Workflows

    6 projects | dev.to | 6 Feb 2025
  • Must-Know 2025 Developer’s Roadmap and Key Programming Trends

    6 projects | dev.to | 5 Feb 2025
  • Colors with Rio's oklab color space

    2 projects | dev.to | 2 Feb 2025

Index

What are some of the best open-source Data Science projects? This list will help you:

| # | Project | Stars |
|---|---------|-------|
| 1 | ML-For-Beginners | 70,968 |
| 2 | superset | 64,308 |
| 3 | Keras | 62,474 |
| 4 | scikit-learn | 61,000 |
| 5 | Pandas | 44,513 |
| 6 | Airflow | 38,610 |
| 7 | Made-With-ML | 38,126 |
| 8 | streamlit | 37,234 |
| 9 | gradio | 35,758 |
| 10 | Ray | 35,238 |
| 11 | spaCy | 30,849 |
| 12 | AI-Expert-Roadmap | 29,365 |
| 13 | pytorch-lightning | 28,921 |
| 14 | Data-Science-For-Beginners | 28,753 |
| 15 | data-science-ipython-notebooks | 27,812 |
| 16 | applied-ml | 27,669 |
| 17 | Probabilistic-Programming-and-Bayesian-Methods-for-Hackers | 27,171 |
| 18 | awesome-datascience | 25,681 |
| 19 | d2l-en | 24,864 |
| 20 | ML-From-Scratch | 24,156 |
| 21 | fastbook | 22,464 |
| 22 | 500-AI-Machine-learning-Deep-learning-Computer-vision-NLP-Projects-with-code | 21,974 |
| 23 | dash | 21,908 |

