Data Science

Top 23 Data Science Open-Source Projects

  • ML-For-Beginners

    12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all

  • Project mention: Good coding groups for black women? | news.ycombinator.com | 2024-01-13

    - https://github.com/microsoft/ML-For-Beginners

    Also check out this list Pitt puts out every year:

  • Keras

    Deep Learning for humans

  • Project mention: My Favorite DevTools to Build AI/ML Applications! | dev.to | 2024-04-23

    As a beginner, I was looking for something simple and flexible for developing deep learning models and that is when I found Keras. Many AI/ML professionals appreciate Keras for its simplicity and efficiency in prototyping and developing deep learning models, making it a preferred choice, especially for beginners and for projects requiring rapid development.

  • superset

    Apache Superset is a Data Visualization and Data Exploration Platform

  • Project mention: Apache Superset | news.ycombinator.com | 2024-02-26

    Superset is absolutely phenomenal. I really hope Microsoft eventually releases the customizations it made to Superset internally to the open-source community.

    https://www.youtube.com/watch?v=RY0SSvSUkMA

    https://github.com/apache/superset/discussions/20094

  • scikit-learn

    scikit-learn: machine learning in Python

  • Project mention: AutoCodeRover resolves 22% of real-world GitHub issues in SWE-bench lite | news.ycombinator.com | 2024-04-09

    Thank you for your interest. There are some interesting examples in the SWE-bench-lite benchmark which are resolved by AutoCodeRover:

    - From sympy: https://github.com/sympy/sympy/issues/13643. AutoCodeRover's patch for it: https://github.com/nus-apr/auto-code-rover/blob/main/results...

    - Another one from scikit-learn: https://github.com/scikit-learn/scikit-learn/issues/13070. AutoCodeRover's patch (https://github.com/nus-apr/auto-code-rover/blob/main/results...) modified a few lines below (compared to the developer patch) and wrote a different comment.

    There are more examples in the results directory (https://github.com/nus-apr/auto-code-rover/tree/main/results).

  • Pandas

    Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

  • Project mention: Deploying a Serverless Dash App with AWS SAM and Lambda | dev.to | 2024-03-04

    Dash is a Python framework that enables you to build interactive frontend applications without writing a single line of JavaScript. Internally and in projects, we like to use it to build quick proofs of concept for data-driven applications because of its nice integration with Plotly and pandas. For this post, I'm going to assume that you're already familiar with Dash and won't explain that part in detail. Instead, we'll focus on what's necessary to make it run serverless.

  • Made-With-ML

    Learn how to design, develop, deploy and iterate on production-grade ML applications.

  • Project mention: [D] How do you keep up to date on Machine Learning? | /r/learnmachinelearning | 2023-08-13

    Made With ML

  • Airflow

    Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

  • Project mention: Building in Public: Leveraging Tublian's AI Copilot for My Open Source Contributions | dev.to | 2024-02-12

    Contributing to Apache Airflow's open-source project immersed me in collaborative coding. Experienced maintainers rigorously reviewed my contributions, providing constructive feedback. This ongoing dialogue refined the codebase and honed my understanding of best practices.

  • streamlit

    Streamlit — A faster way to build and share data apps.

  • Project mention: Creating a Sales Analysis Application with Streamlit: A Practical Approach to Business Intelligence | dev.to | 2024-04-19

    2. Go to https://streamlit.io, log in, and create a new app from your GitHub repository.

  • Ray

    Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

  • Project mention: Open Source Advent Fun Wraps Up! | dev.to | 2024-01-05

    22. Ray | Github | tutorial

  • gradio

    Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!

  • Project mention: Show HN: Dropbase – Build internal web apps with just Python | news.ycombinator.com | 2023-12-05

    There's also that library all the AI models started using that gives you a public URL to share. After researching it: https://www.gradio.app/ is the link.

    It's used specifically for making simple UIs for machine learning apps. But I guess technically you could use it for anything.

  • spaCy

    💫 Industrial-strength Natural Language Processing (NLP) in Python

  • Project mention: Step by step guide to create customized chatbot by using spaCy (Python NLP library) | dev.to | 2024-03-23

    Hi Community, in this article I will demonstrate the steps below to create your own chatbot using spaCy (an open-source software library for advanced natural language processing, written in Python and Cython):

  • AI-Expert-Roadmap

    Roadmap to becoming an Artificial Intelligence Expert in 2022

  • Project mention: Best AI ML DL DS Roadmap | /r/deeplearning | 2023-12-07

    I.am.ai AI Expert Roadmap (https://i.am.ai/roadmap): This roadmap focuses more on AI and includes various aspects of machine learning and deep learning. It's suitable for those who want to delve deeper into AI, particularly in cutting-edge research and applications.

  • pytorch-lightning

    Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.

  • Project mention: Lightning AI Studios – A persistent GPU cloud environment | news.ycombinator.com | 2023-12-14
  • data-science-ipython-notebooks

    Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

  • Probabilistic-Programming-and-Bayesian-Methods-for-Hackers

    aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)

  • Project mention: Probabilistic Programming and Bayesian Methods for Hackers (2013) | news.ycombinator.com | 2024-02-10
  • Data-Science-For-Beginners

    10 Weeks, 20 Lessons, Data Science for All!

  • Project mention: Welcome to 14 days of Data Science! | dev.to | 2024-03-07

    Get started with Data Science in the Data Science for Beginners curricula.

  • applied-ml

    📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.

  • ML-From-Scratch

    Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.

  • awesome-datascience

    📝 An awesome Data Science repository to learn and apply for real-world problems.

  • Project mention: About Data analyst, data scientist and data engineer, resources and experiences | dev.to | 2024-03-26

    Awesome Data Science by Academic

  • d2l-en

    Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.

  • fastbook

    The fastai book, published as Jupyter Notebooks

  • Project mention: The fastai book, published as Jupyter Notebooks | news.ycombinator.com | 2024-01-17
  • dash

    Data Apps & Dashboards for Python. No JavaScript Required.

  • Project mention: dash VS solara - a user suggested alternative | libhunt.com/r/dash | 2023-10-13
  • matplotlib

    matplotlib: plotting with Python

  • Project mention: How and where is matplotlib package making use of PySide? | /r/learnpython | 2023-12-07
NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source Data Science projects? This list will help you:

Rank Project Stars
1 ML-For-Beginners 66,806
2 Keras 60,902
3 superset 58,737
4 scikit-learn 58,046
5 Pandas 41,923
6 Made-With-ML 35,610
7 Airflow 34,397
8 streamlit 31,506
9 Ray 30,988
10 gradio 28,730
11 spaCy 28,704
12 AI-Expert-Roadmap 28,388
13 pytorch-lightning 26,797
14 data-science-ipython-notebooks 26,459
15 Probabilistic-Programming-and-Bayesian-Methods-for-Hackers 26,341
16 Data-Science-For-Beginners 26,290
17 applied-ml 25,875
18 ML-From-Scratch 23,164
19 awesome-datascience 23,101
20 d2l-en 21,628
21 fastbook 20,711
22 dash 20,472
23 matplotlib 19,223
