Data Science

Open-source projects categorized as Data Science

Top 23 Data Science Open-Source Projects

  • GitHub repo Keras

    Deep Learning for humans

    Project mention: 5 ways to keep your skills fresh after finishing a coding bootcamp | | 2021-11-28

    One way to improve your projects and coding skills is to try new models and libraries. For example, if you did classification with logistic regression, also try a random forest; if you used TensorFlow, now try Keras; if you scraped a website with BeautifulSoup, now do it with Scrapy. You get the point.
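The model swap described above can be sketched in a few lines with scikit-learn. This is a minimal illustration and not code from the quoted post: the synthetic dataset from `make_classification`, the split ratio, and the hyperparameters are all assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (an assumption; substitute your own dataset).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Both estimators share the fit/score interface, so swapping models
# only means changing which object is constructed.
models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # mean accuracy on held-out data
    print(f"{name}: test accuracy = {scores[name]:.3f}")
```

Because the two estimators expose the same interface, the rest of the project (data loading, splitting, evaluation) can stay unchanged while you compare them.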

  • GitHub repo scikit-learn

    scikit-learn: machine learning in Python

    Project mention: Data Science toolset summary from 2021 | | 2021-11-13

    Scikit-learn: one of the most widely used frameworks for Python-based data science tasks. It features various classification, regression, and clustering algorithms, including support vector machines, random forests, gradient boosting, k-means, and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.
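As a rough illustration of two of the clustering algorithms named above, the sketch below runs k-means and DBSCAN on the same toy data. The blob centers, `eps`, `min_samples`, and the other parameter values are assumptions picked for this example, not recommendations from the quoted post.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs

# Three well-separated 2-D blobs (toy data, assumed for illustration).
X, _ = make_blobs(
    n_samples=300,
    centers=[[-5, -5], [0, 0], [5, 5]],
    cluster_std=0.5,
    random_state=0,
)

# k-means needs the number of clusters up front.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# DBSCAN infers the number of clusters from density; label -1 marks noise points.
dbscan_labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(X)
n_dbscan_clusters = len(set(dbscan_labels) - {-1})

print("k-means clusters:", len(set(kmeans_labels)))
print("DBSCAN clusters (excluding noise):", n_dbscan_clusters)
```

Both estimators take a NumPy array as input and return a NumPy array of labels, which is the interoperability the description refers to.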

  • GitHub repo superset

    Apache Superset is a Data Visualization and Data Exploration Platform

    Project mention: Open source Business intelligence platform made with Python | | 2021-12-01

  • GitHub repo MadeWithML

    Learn how to responsibly deliver value with ML.

    Project mention: New to mlops, where do I need to start | | 2021-11-01

    Standing recommendation for beginners (we should eventually make a wiki) is

  • GitHub repo ML-For-Beginners

    12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all

    Project mention: Looking for buddy/mentor to study ML/datascience, with some focus on practical skills like cloud training and deployment | | 2021-11-29

    I’m looking for a buddy to study the materials from these two Microsoft courses (or at least similar topics): DS4Beginners and ML4Beginners.

  • GitHub repo Probabilistic-Programming-and-Bayesian-Methods-for-Hackers

    aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)

    Project mention: [Q] What masters degree should I apply for if I want to do more programming later on? | | 2021-11-23

    The key to all of this is finding some area of programming you like. For me it was simulation and Bayesian methods.

  • GitHub repo data-science-ipython-notebooks

    Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

    Project mention: Beginner in Python for Data Science | | 2020-12-27

    data science ipython notebooks

  • GitHub repo spaCy

    💫 Industrial-strength Natural Language Processing (NLP) in Python

    Project mention: Two Methods to Scan for PII in Data Warehouses | | 2021-11-29

    NLP libraries such as Stanford NER Detector and spaCy

  • GitHub repo Ray

    An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.

    Project mention: JORLDY: OpenSource Reinforcement Learning Framework | | 2021-11-08

    Distributed RL algorithms are provided using Ray

  • GitHub repo applied-ml

    📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.

    Project mention: Messed up my career by pivoting to DS. Wondering if it's too late to switch to MLE | | 2021-11-17

    Applied ML: A collection of papers, articles, and blogs on ML in production by different companies (Netflix, Uber, Facebook, LinkedIn, etc)

  • GitHub repo awesome-datascience

    📝 An awesome Data Science repository to learn and apply for real world problems.

    Project mention: ⚙️ Awesome Data Science: An #OpenSource #DataScience repository to learn and apply towards solving real world problems. h/t @Sauain | | 2021-10-16

  • GitHub repo streamlit

    Streamlit — The fastest way to build data apps in Python

    Project mention: Suggestions for GUI framework for an app to browse tables of data, with buttons and dropdown menus in cells? And some related PySimpleGui questions | | 2021-11-07

    I've never used it, but someone suggested it in another thread, and it looked interesting to me, so I have it bookmarked to try out:

  • GitHub repo pytorch-lightning

    The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate.

    Project mention: [D] Colab TPU low performance | | 2021-11-18

    I wanted to make a quick performance comparison between the GPU (Tesla K80) and TPU (v2-8) available in Google Colab with PyTorch. To do so quickly, I used an MNIST example from pytorch-lightning that trains a simple CNN.

  • GitHub repo dash

    Analytical Web Apps for Python, R, Julia, and Jupyter. No JavaScript Required.

    Project mention: How can i create a cloud interface using python? | | 2021-11-27

  • GitHub repo AI-Expert-Roadmap

    Roadmap to becoming an Artificial Intelligence Expert in 2021

    Project mention: What are some non-web dev coding jobs? | | 2021-09-10

    AI roadmap

  • GitHub repo ipython

    Official repository for IPython itself. Other repos in the IPython organization contain things like the website, documentation builds, etc.

    Project mention: Using IPython embed() to change the state of the program, inside functions | | 2021-11-03

    I found this old discussion while trying to sort it out, but it doesn't help your situation.

  • GitHub repo fastbook

    The fastai book, published as Jupyter Notebooks

    Project mention: What is the most useful free course you have taken? | | 2021-11-23

    fastai. It was a game changer for me. There are also a lot of additional blogs from students that flesh out the material even more, a pretty active community with forums and a Discord server, and a very useful book that is great as a reference (the book is available as Jupyter notebooks).

  • GitHub repo gensim

    Topic Modelling for Humans

    Project mention: Gensim – a Python library for topic modelling, document indexing | | 2021-11-25

  • GitHub repo stanford-cs-229-machine-learning

    VIP cheatsheets for Stanford's CS 229 Machine Learning

    Project mention: Stanford University Probabilities and Statistics refresher | | 2021-03-24

  • GitHub repo Awesome-pytorch-list

    A comprehensive list of PyTorch-related content on GitHub, such as different models, implementations, helper libraries, tutorials, etc.

    Project mention: Similar open source long library list to TF like Pytorch "ECOSYSTEM TOOLS" | | 2021-11-19

    I got the following as a recommendation from elsewhere, and there is one for pt as well. Thx for the help :D

  • GitHub repo recommenders

    Best Practices on Recommendation Systems

    Project mention: Opinion on choice of model - Recommender System | | 2021-04-10

    Then I tried to find some more advanced models and I found this really good list, and in there I found the Microsoft one. So that's where we are now, with a bunch of different models and not a lot of documentation/tutorials out there.

  • GitHub repo d2l-en

    Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 300 universities from 55 countries including Stanford, MIT, Harvard, and Cambridge.

    Project mention: I created a way to learn machine learning through Jupyter | | 2021-04-30

    There are actually some online books and courses built on Jupyter Notebook (the Dive into Deep Learning book, for example). However, yours is more detailed and could really help beginners.

  • GitHub repo dive-into-machine-learning

    Dive into Machine Learning with Python Jupyter notebook and scikit-learn! First posted in 2016, maintained as of 2021. Pull requests welcome.

    Project mention: dive-into-machine-learning: NEW Courses - star count:10764.0 | | 2021-11-08

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2021-12-01.

What are some of the best open-source Data Science projects? This list will help you:

Rank  Project                                                     Stars
1     Keras                                                       53,329
2     scikit-learn                                                48,081
3     superset                                                    42,077
4     MadeWithML                                                  29,164
5     ML-For-Beginners                                            27,641
6     Probabilistic-Programming-and-Bayesian-Methods-for-Hackers  23,835
7     data-science-ipython-notebooks                              21,875
8     spaCy                                                       21,827
9     Ray                                                         18,270
10    applied-ml                                                  17,725
11    awesome-datascience                                         17,459
12    streamlit                                                   16,661
13    pytorch-lightning                                           16,408
14    dash                                                        15,469
15    AI-Expert-Roadmap                                           15,320
16    ipython                                                     15,075
17    fastbook                                                    13,911
18    gensim                                                      12,694
19    stanford-cs-229-machine-learning                            12,643
20    Awesome-pytorch-list                                        12,426
21    recommenders                                                11,705
22    d2l-en                                                      11,569
23    dive-into-machine-learning                                  10,779
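
The ranking is a descending sort on star count. A minimal sketch of that ordering, using three rows copied from the table (the `projects` list and variable names are illustrative assumptions, not the site's actual code):

```python
# (name, stars) pairs taken from the table, deliberately out of order.
projects = [
    ("scikit-learn", 48081),
    ("superset", 42077),
    ("Keras", 53329),
]

# Sort by star count, highest first, then number the rows from 1.
ranked = sorted(projects, key=lambda project: project[1], reverse=True)

for rank, (name, stars) in enumerate(ranked, start=1):
    print(f"{rank} {name} {stars:,}")
```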