Python Data Science

Open-source Python projects categorized as Data Science

Top 23 Python Data Science Projects

  • GitHub repo Keras

    Deep Learning for humans

    Project mention: 5 ways to keep your skills fresh after finishing a coding bootcamp | 2021-11-28

    One way to improve your projects and coding skills is to try new models and libraries. For example, if you did classification with logistic regression, try it with a random forest as well; if you used TensorFlow, now try Keras; if you scraped a website with BeautifulSoup, now do it with Scrapy. You get the point.

  • GitHub repo scikit-learn

    scikit-learn: machine learning in Python

    Project mention: Data Science toolset summary from 2021 | 2021-11-13

    Scikit-learn is one of the most widely used frameworks for Python-based data science tasks. It features various classification, regression, and clustering algorithms, including support vector machines, random forests, gradient boosting, k-means, and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.
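
As a rough illustration of the description above, the sketch below fits one of the listed classifiers (a random forest) and one of the listed clustering algorithms (k-means) on the same synthetic data. It assumes scikit-learn is installed; the blob centers, sample size, and hyperparameters are assumptions chosen for the example, not values taken from the post.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): 300 points around 3 well-separated centers.
centers = [[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]]
X, y = make_blobs(n_samples=300, centers=centers, cluster_std=0.5, random_state=0)

# Supervised: train a random forest on a training split and score it on held-out data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

# Unsupervised: k-means sees only X and assigns each point to one of 3 clusters.
km = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = km.fit_predict(X)
n_clusters_found = len(set(cluster_labels))

print(f"random forest held-out accuracy: {accuracy:.2f}")
print(f"k-means clusters found: {n_clusters_found}")
```

Both estimators follow the same construct-then-fit pattern, which is what makes swapping one algorithm for another inexpensive.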

  • GitHub repo data-science-ipython-notebooks

    Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

    Project mention: Beginner in Python for Data Science | 2020-12-27

    data science ipython notebooks

  • GitHub repo spaCy

    💫 Industrial-strength Natural Language Processing (NLP) in Python

    Project mention: We created a step-by-step guide about how spaCy v3's configuration and project systems can help you enhance your Natural Language Processing workflows! | 2021-11-17

    spaCy on GitHub

  • GitHub repo Ray

    An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.

    Project mention: JORLDY: OpenSource Reinforcement Learning Framework | 2021-11-08

    Distributed RL algorithms are provided using Ray.

  • GitHub repo streamlit

    Streamlit — The fastest way to build data apps in Python

    Project mention: Suggestions for GUI framework for an app to browse tables of data, with buttons and dropdown menus in cells? And some related PySimpleGui questions | 2021-11-07

    I've never used it, but someone suggested it in another thread, and it looked interesting to me, so I have it bookmarked to try out.

  • GitHub repo pytorch-lightning

    The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate.

    Project mention: [D] Colab TPU low performance | 2021-11-18

    I wanted to make a quick performance comparison between the GPU (Tesla K80) and TPU (v2-8) available in Google Colab with PyTorch. To do so quickly, I used an MNIST example from pytorch-lightning that trains a simple CNN.

  • GitHub repo dash

    Analytical Web Apps for Python, R, Julia, and Jupyter. No JavaScript Required.

    Project mention: How can i create a cloud interface using python? | 2021-11-27

  • GitHub repo ipython

    Official repository for IPython itself. Other repos in the IPython organization contain things like the website, documentation builds, etc.

    Project mention: Using IPython embed() to change the state of the program, inside functions | 2021-11-03

    I found this old discussion while trying to sort it out, but it doesn't help your situation.

  • GitHub repo gensim

    Topic Modelling for Humans

    Project mention: Gensim – a Python library for topic modelling, document indexing | 2021-11-25

  • GitHub repo recommenders

    Best Practices on Recommendation Systems

    Project mention: Opinion on choice of model - Recommender System | 2021-04-10

    Then I tried to find some more advanced models and I found this really good list, and in there I found the Microsoft one. So that's where we are now, with a bunch of different models and not much documentation or many tutorials out there.

  • GitHub repo d2l-en

    Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 300 universities from 55 countries including Stanford, MIT, Harvard, and Cambridge.

    Project mention: I created a way to learn machine learning through Jupyter | 2021-04-30

    There are actually some online books and courses built on Jupyter Notebook (the Dive into Deep Learning book, for example). However, yours is more detailed and could really help beginners.

  • GitHub repo allennlp

    An open-source NLP research library, built on PyTorch.

    Project mention: Cedille, the largest French language model, open source with a freely accessible playground | 2021-11-12

  • GitHub repo nni

    An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression, and hyper-parameter tuning.

    Project mention: Automated Machine Learning (AutoML) - 9 Different Ways with Microsoft AI | 2021-10-04

    For a complete tutorial, navigate to this Jupyter Notebook:

  • GitHub repo TFLearn

    Deep learning library featuring a higher-level API for TensorFlow.

    Project mention: Base ball | 2021-03-20

    Both teams in a game are given individual ID values and are converted into vectors. Relevant data such as the home and away team, home runs, RBIs, and walks are all taken into account and passed through layers. There's no need to reinvent the wheel here; there's a multitude of libraries that enable a coder to implement machine learning theories efficiently. In this case we will be using a library called TFLearn, whose documentation is available online. The program will output the home and away teams as well as their respective score predictions.

  • GitHub repo seaborn

    Statistical data visualization in Python

    Project mention: Matplotlib for sabermetric analysis | 2021-11-27

    seaborn is the standard for general statistical plotting, again built on top of matplotlib. Note that its regression plots and similar functions don't let you recover the fitted parameters, so always use a library such as statsmodels for final work.

  • GitHub repo dvc

    🦉Data Version Control | Git for Data & Models | ML Experiments Management

    Project mention: [D] 5 considerations for Deploying Machine Learning Models in Production – what did I miss? | 2021-11-21

    Consideration #2: Consider using model life cycle development and management platforms like MLflow, DVC, Weights & Biases, or SageMaker Studio. Also consider Ray, Ray Tune, Ray Train (formerly Ray SGD), PyTorch, and TensorFlow for distributed, compute-intensive, and deep learning ML workloads.

  • GitHub repo Prefect

    The easiest way to automate your data

    Project mention: Prefect CLI Action | 2021-11-21

    GitHub Action for running Prefect commands using the Prefect CLI.

  • GitHub repo machine_learning_examples

    A collection of machine learning examples and tutorials.

    Project mention: How to save an attention model for deployment/exposing to an API? | 2021-08-17

    I've been following a course teaching how to make an attention model for neural machine translation. This is the file inside the repo. I know that I'll have to use certain functions to turn the textual input into encodings and tokens, but those functions use certain instances of the model, which I don't know if I should keep or not. If anyone can please take a look and help me out here, it'd be really appreciated.

  • GitHub repo best-of-ml-python

    🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.

    Project mention: Awesome list of ML | 2021-09-16

  • GitHub repo data-science-blogs

    A curated list of data science blogs

    Project mention: ⚙️ Data Science Blogs: A vast collection of #blogs about #DataScience. h/t @Sauain | 2021-09-13

  • GitHub repo great_expectations

    Always know what to expect from your data.

    Project mention: great_expectations VS redata - a user suggested alternative | 2021-09-24

  • GitHub repo boltons

    🔩 Like builtins, but boltons. 250+ constructs, recipes, and snippets which extend (and rely on nothing but) the Python standard library. Nothing like Michael Bolton.

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2021-11-28.

What are some of the best open-source Data Science projects in Python? This list will help you:

 #  Project                          Stars
 1  Keras                           53,272
 2  scikit-learn                    48,081
 3  data-science-ipython-notebooks  21,842
 4  spaCy                           21,827
 5  Ray                             18,270
 6  streamlit                       16,661
 7  pytorch-lightning               16,326
 8  dash                            15,469
 9  ipython                         15,075
10  gensim                          12,670
11  recommenders                    11,705
12  d2l-en                          11,569
13  allennlp                        10,639
14  nni                             10,603
15  TFLearn                          9,571
16  seaborn                          8,949
17  dvc                              8,897
18  Prefect                          7,821
19  machine_learning_examples        6,425
20  best-of-ml-python                5,921
21  data-science-blogs               5,726
22  great_expectations               5,702
23  boltons                          5,657