Python Data Science

Open-source Python projects categorized as Data Science

Top 23 Python Data Science Projects

  • GitHub repo Keras

    Deep Learning for humans

    Project mention: Data Science with JavaScript: What we've learned so far? | | 2021-09-09
  • GitHub repo scikit-learn

    scikit-learn: machine learning in Python

    Project mention: Scikit-Learn Version 1.0 | | 2021-09-14

    Just to clarify, scikit-learn 1.0 has not been released yet. The latest tag in the GitHub repo is 1.0.rc2.


  • GitHub repo superset

    Apache Superset is a Data Visualization and Data Exploration Platform

    Project mention: Does anyone have experience with live dashboards? | | 2021-10-15
  • GitHub repo data-science-ipython-notebooks

    Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

    Project mention: Beginner in Python for Data Science | | 2020-12-27

  • GitHub repo spaCy

    💫 Industrial-strength Natural Language Processing (NLP) in Python

    Project mention: I put together a tutorial and overview on how to use DeepSpeech to do Speech Recognition in Python | | 2021-10-14

    It definitely could - with the real-time speech recognition example shown in the tutorial. But you'd likely need some sort of NLU running after the transcription is performed - to basically parse what was spoken into a command that you can use to run some business logic. There are some good open source libs for this too like

  • GitHub repo Ray

    An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.

    Project mention: How to deploy a rllib-trained model? | | 2021-10-16
  • GitHub repo streamlit

    Streamlit — The fastest way to build data apps in Python

    Project mention: Sharing results with other people | | 2021-10-11

    This might be a good place to use something like streamlit. Turn your notebook into a script, have them upload the file (or have them enter a network path). Run your code over the file, and return the file. Streamlit can run on your local machine, or on some other local machine, or streamlit can host it.

  • GitHub repo pytorch-lightning

    The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate.

    Project mention: [P] An introduction to PyKale, a PyTorch library that provides a unified pipeline-based API for knowledge-aware multimodal learning and transfer learning on graphs, images, texts, and videos to accelerate interdisciplinary research. Welcome feedback/contribution! | | 2021-04-25

    If you want a good example for reference, take a look at PyTorch Lightning's README. It answers the 3 questions of "what is this", "why should I care", and "how do I use it" almost instantly.

  • GitHub repo dash

    Analytical Web Apps for Python, R, Julia, and Jupyter. No JavaScript Required.

    Project mention: Is it just a coincidence or do most programmers prefer Firefox over Chrome? | | 2021-10-14
  • GitHub repo ipython

    Official repository for IPython itself. Other repos in the IPython organization contain things like the website, documentation builds, etc.

    Project mention: A resource for looking 'under the hood' of multi-processing in python | | 2021-04-19

    I stand corrected... Apparently, it does matter, if you are on Windows...

  • GitHub repo gensim

    Topic Modelling for Humans

    Project mention: The unthinking application of this regex-efficiency check wasted our attention | | 2021-09-30
  • GitHub repo recommenders

    Best Practices on Recommendation Systems

    Project mention: Opinion on choice of model - Recommender System | | 2021-04-10

    Then I tried to find some more advanced models and I found this really good list, and in there I found the Microsoft one. So that's where we are now, with a bunch of different models and not much documentation or many tutorials out there.

  • GitHub repo d2l-en

    Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 200 universities.

    Project mention: I created a way to learn machine learning through Jupyter | | 2021-04-30

    There are actually some online books and courses built on Jupyter Notebook (the Dive into Deep Learning book, for example). However, yours is more detailed and could really help beginners.

  • GitHub repo allennlp

    An open-source NLP research library, built on PyTorch.

    Project mention: Any allennlp users in this sub? | | 2021-10-08

    looks active

  • GitHub repo nni

    An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.

    Project mention: Automated Machine Learning (AutoML) - 9 Different Ways with Microsoft AI | | 2021-10-04

    For a complete tutorial, navigate to this Jupyter Notebook:

  • GitHub repo TFLearn

    Deep learning library featuring a higher-level API for TensorFlow.

    Project mention: Base ball | | 2021-03-20

    Both teams in a game are given individual ID values and are made into vectors. Relevant data like the home and away team, home runs, RBIs, and walks are all taken into account and passed through layers. There's no need to reinvent the wheel here; there's a multitude of libraries that enable a coder to implement machine learning theories efficiently. In this case we will be using a library called TFLearn. The program will output the home and away teams as well as their respective score predictions.

  • GitHub repo seaborn

    Statistical data visualization in Python

    Project mention: Series Intro: Data Visualization With Svelte and D3 | | 2021-10-08

    Curran Kelleher's Data Visualization With React and D3 is probably the most comprehensive course on D3 on the Internet. For some time I have been curious about D3 and data visualization in general. While I have worked with Python libraries like Matplotlib, Seaborn and Plotly, and they are fantastic for most kinds of exploratory and scientific plots, they have limitations on the web, particularly in terms of interactivity. D3, for the uninitiated, is the library which powers the bulk of all the interactive and SVG-based plots on the web. It also acts as the underlying low-level library for a lot of higher-level libraries, including Vega.

  • GitHub repo dvc

    🦉Data Version Control | Git for Data & Models | ML Experiments Management

    Project mention: [D] How do you ensure reproducibility? | | 2021-09-24

    You'll want to add some reproducibility at the data layer, and several libraries exist, such as dvc.

  • GitHub repo Prefect

    The easiest way to automate your data

    Project mention: My first Hacktoberfest !!✨ | | 2021-10-15

  • GitHub repo machine_learning_examples

    A collection of machine learning examples and tutorials.

    Project mention: How to save an attention model for deployment/exposing to an API? | | 2021-08-17

    I've been following a course teaching how to make an attention model for neural machine translation. This is the file inside the repo. I know that I'll have to use certain functions to convert the textual input into encodings and tokens, but those functions use certain instances of the model, which I don't know if I should keep or not. If anyone can please take a look and help me out here, it'd be really appreciated.

  • GitHub repo best-of-ml-python

    🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.

    Project mention: Awesome list of ML | | 2021-09-16
  • GitHub repo data-science-blogs

    A curated list of data science blogs

    Project mention: ⚙️ Data Science Blogs: A vast collection of #blogs about #DataScience. h/t @Sauain | | 2021-09-13
  • GitHub repo boltons

    🔩 Like builtins, but boltons. 250+ constructs, recipes, and snippets which extend (and rely on nothing but) the Python standard library. Nothing like Michael Bolton.

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2021-10-16.


What are some of the best open-source Data Science projects in Python? This list will help you:

Rank Project Stars
1 Keras 52,841
2 scikit-learn 47,570
3 superset 40,869
4 data-science-ipython-notebooks 21,689
5 spaCy 21,511
6 Ray 17,740
7 streamlit 16,223
8 pytorch-lightning 15,654
9 dash 15,260
10 ipython 15,019
11 gensim 12,543
12 recommenders 11,459
13 d2l-en 11,202
14 allennlp 10,545
15 nni 10,368
16 TFLearn 9,570
17 seaborn 8,830
18 dvc 8,726
19 Prefect 7,544
20 machine_learning_examples 6,352
21 best-of-ml-python 5,815
22 data-science-blogs 5,714
23 boltons 5,610