Python Datascience

Open-source Python projects categorized as Datascience

Top 19 Python Datascience Projects

  • ludwig

    Low-code framework for building custom LLMs, neural networks, and other AI models

    Project mention: Python projects with best practices on Github? | /r/Python | 2023-02-14

    Two random examples I found from 30 seconds of googling: Here’s Netflix using it in their crisis management tool, and here’s Uber using it in their deep learning framework.

  • modin

    Modin: Scale your Pandas workflows by changing a single line of code

    Project mention: The Distributed Tensor Algebra Compiler (2022) | news.ycombinator.com | 2023-06-15

  • metaflow

    Build and manage real-life data science projects with ease!

    Project mention: In Need of Guidance: Implementing MLOps in a Complex Organization as a Junior Data Engineer | /r/mlops | 2023-06-12
  • Mimesis

    Mimesis is a powerful Python library that empowers developers to generate massive amounts of synthetic data efficiently.

    Project mention: Mimesis allows you to easily generate detailed dummy datasets. | /r/datascience | 2023-04-12

    Mimesis has well-structured and comprehensive documentation: https://mimesis.name

  • panel

    Panel: The powerful data exploration & web app framework for Python (by holoviz)

    Project mention: What python library you are using for interactive visualisation?(other than plotly) | /r/datascience | 2023-06-01

    https://panel.holoviz.org/ It's a web app framework for Python similar to what Dash does for plotly. It plays nicely with bokeh visuals and I think the front-end is built using bokeh css elements.

  • PyFunctional

    Python library for creating data pipelines with chain functional programming

    Project mention: Python: Uncovering the Overlooked Core Functionalities | news.ycombinator.com | 2023-07-24

    If you actually think this code is better there's a real library that does this: https://github.com/EntilZha/PyFunctional.

  • Fast-F1

    FastF1 is a python package for accessing and analyzing Formula 1 results, schedules, timing data and telemetry

    Project mention: Consume Live Timing/Telemetry From API During Race | /r/F1Technical | 2023-05-28

    F1 broadcasts its live timing via the SignalR protocol. The endpoint itself is unauthenticated. You can look at FastF1’s implementation of the SignalR client, and the endpoints it connects to, in the code documentation for the FastF1 SignalR client.

  • CleverCSV

    CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line application for working with CSV files.

    Project mention: Parquet: more than just "Turbo CSV" | /r/programming | 2023-04-03

    There’s things like this, but I consider the existence of messy, non standard CSV files (backed by a decade of experience dealing with the problem) a strong reason to not use the format ever.

  • streamlit-geospatial

    A multi-page streamlit app for geospatial

    Project mention: how i can create a timelapse of a specfic region | /r/remotesensing | 2023-07-05
  • DGFraud

    A Deep Graph-based Toolbox for Fraud Detection

  • socios-brasil

    Captures data on the partners (sócios) of Brazilian companies from the Receita Federal and exports it to a human-readable format

  • objectiv-analytics

    Open-source product analytics infrastructure for data teams that want full control. Built for high quality data collection and ready to use for advanced analytics & ML.

  • Mobile-Phone-Dataset-GSMArena

    Python script for creating a mobile phone dataset from the GSMArena website.

  • gretel-python-client

    The Gretel Python Client allows you to interact with the Gretel REST API.

  • viper

    Simple, expressive pipeline syntax to transform and manipulate data with ease (by aropele)

    Project mention: Simple, expressive pipeline syntax to transform and manipulate data with ease | news.ycombinator.com | 2023-01-24
  • TagMaps

    Spatio-Temporal Tag and Photo Location Clustering for generating Tag Maps

    Project mention: Stop lying to yourself – you “fix it later” | news.ycombinator.com | 2022-11-20

    I think this is perfectly natural - some ideas never manifest or are not convincing enough to publish, or sometimes you write code and it turns out not to be used - why produce production-ready code in this case? I have a Python package that I slowly developed over 5 years, step by step [1]. Every time I use it, I find many things that I could develop; some I do right then, others I leave for later. I also have a blog [2] - you can see three dates for each blog post:

    - the time I first started working on it

    - the first time I published it

    - the last time it was updated

    All of these dates are important. Think of doing things more like a process of chained events, not like a one-stop thing.

    [1]: https://github.com/Sieboldianus/TagMaps

    [2]: https://du.nkel.dev/

  • linkedin-connections-analyzer

    LinkedIn connections analyzer

  • scrape-google-play-store-app

    Single script to scrape Google Play Store App info without browser automation

    Project mention: Web Scraping All Google Play App Reviews in Python | dev.to | 2022-09-26

    GitHub Repository

  • Machine-Learning-Cyrillic-Classifier

    This is a web app where you can draw a letter of the Russian alphabet and the ML algorithm will predict the letter that you drew.
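
The CleverCSV entry above describes the package as a drop-in replacement for the built-in csv module with improved dialect detection. For reference, here is a minimal sketch of the baseline it is compared against: dialect detection with csv.Sniffer from the standard library. It does not use CleverCSV itself; the sample data and the helper name read_with_detected_dialect are illustrative assumptions.

```python
import csv
import io

# Hypothetical sample: semicolon-delimited text, a common non-standard CSV variant.
SAMPLE = "name;age;city\nAlice;30;Paris\nBob;25;Berlin\n"


def read_with_detected_dialect(text, candidates=";,\t|"):
    """Guess the delimiter with csv.Sniffer, then parse the text with it."""
    # Restricting the candidate delimiters keeps the guess from landing
    # on an ordinary letter that happens to repeat consistently.
    dialect = csv.Sniffer().sniff(text, delimiters=candidates)
    rows = list(csv.reader(io.StringIO(text), dialect))
    return dialect.delimiter, rows


delimiter, rows = read_with_detected_dialect(SAMPLE)
print(delimiter)
print(rows)
```

csv.Sniffer works from simple character-frequency heuristics and raises csv.Error when it cannot settle on a delimiter, which is the kind of messy input the CleverCSV description targets.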

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2023-07-24.

Index

What are some of the best open-source Datascience projects in Python? This list will help you:

Rank  Project                               Stars
1     ludwig                                9,745
2     modin                                 8,967
3     metaflow                              6,977
4     Mimesis                               4,045
5     panel                                 3,120
6     PyFunctional                          2,229
7     Fast-F1                               1,817
8     CleverCSV                             1,119
9     streamlit-geospatial                    699
10    DGFraud                                 591
11    socios-brasil                           542
12    objectiv-analytics                      461
13    Mobile-Phone-Dataset-GSMArena            52
14    gretel-python-client                     38
15    viper                                    14
16    TagMaps                                   6
17    linkedin-connections-analyzer             5
18    scrape-google-play-store-app              2
19    Machine-Learning-Cyrillic-Classifier      1
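
The note above says the list is ordered by GitHub stars. As a quick sanity check, the following sketch parses the star counts exactly as printed in the index (with thousands separators) and confirms they are in descending order. The helper names parse_stars and is_sorted_by_stars are hypothetical; the rows are copied from the index.

```python
# Rows copied from the index: (project, stars as printed).
INDEX = [
    ("ludwig", "9,745"),
    ("modin", "8,967"),
    ("metaflow", "6,977"),
    ("Mimesis", "4,045"),
    ("panel", "3,120"),
    ("PyFunctional", "2,229"),
    ("Fast-F1", "1,817"),
    ("CleverCSV", "1,119"),
    ("streamlit-geospatial", "699"),
    ("DGFraud", "591"),
    ("socios-brasil", "542"),
    ("objectiv-analytics", "461"),
    ("Mobile-Phone-Dataset-GSMArena", "52"),
    ("gretel-python-client", "38"),
    ("viper", "14"),
    ("TagMaps", "6"),
    ("linkedin-connections-analyzer", "5"),
    ("scrape-google-play-store-app", "2"),
    ("Machine-Learning-Cyrillic-Classifier", "1"),
]


def parse_stars(text):
    """Convert a printed count such as '9,745' to an integer."""
    return int(text.replace(",", ""))


def is_sorted_by_stars(rows):
    """Return True if star counts never increase from one row to the next."""
    counts = [parse_stars(stars) for _, stars in rows]
    return all(a >= b for a, b in zip(counts, counts[1:]))


print(is_sorted_by_stars(INDEX))
```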