Python Datascience

Open-source Python projects categorized as Datascience

Top 20 Python Datascience Projects

Datascience
  1. Taipy

    Turns Data and AI algorithms into production-ready web applications in no time.

    Project mention: Top 40 Open-source Developer Tools with the Most GitHub Stars | dev.to | 2025-04-20

    GitHub: https://github.com/Avaiga/taipy

  3. modin

    Modin: Scale your Pandas workflows by changing a single line of code

  4. metaflow

    Build, Manage and Deploy AI/ML Systems

    Project mention: Show HN: Flow – A Dynamic Task Engine for AI Agents Without DAG | news.ycombinator.com | 2024-12-02

    Interesting! I feel like this is a cross between https://github.com/dagworks-inc/burr (switch state for context) and https://github.com/Netflix/metaflow because the output of the "task" declares its next hop...

  5. openllmetry

    Open-source observability for your LLM application, based on OpenTelemetry

    Project mention: Top Open Source Tools for LLM Observability in 2025 | dev.to | 2025-05-01

    Traceloop is also an open-source project that provides end-to-end tracing for LLM applications. It uses OpenTelemetry standards to offer visibility into the request flow through code, especially in agent-based and multi-step workflows. Traceloop focuses exclusively on tracing and requires an existing OpenTelemetry setup to unlock its full potential.

  6. panel

    Panel: The powerful data exploration & web app framework for Python (by holoviz)

    Project mention: A simple way to explore data through a Tableau-like UI directly in your data app | news.ycombinator.com | 2024-12-30

    If you want to support the Panel project, the easiest way is to give it a star on GitHub: https://github.com/holoviz/panel. Much appreciated. Thanks.

  7. Mimesis

    Mimesis is a robust data generator for Python that can produce a wide range of fake data in multiple languages.

    Project mention: Mimesis: The Fake Data Generator That Will Blow Your Mind! | dev.to | 2025-05-08

  8. Fast-F1

    FastF1 is a Python package for accessing and analyzing Formula 1 results, schedules, timing data, and telemetry.

  10. PyFunctional

    Python library for creating data pipelines with chained functional programming

  11. CleverCSV

    CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line application for working with CSV files.

  12. streamlit-geospatial

    A multi-page Streamlit app for geospatial

  13. DGFraud

    A Deep Graph-based Toolbox for Fraud Detection

  14. socios-brasil

    Captures data on the partners of Brazilian companies from the Receita Federal and exports it to a human-readable format

  15. Mobile-Phone-Dataset-GSMArena

    Python script for creating a mobile phone dataset from the GSMArena website.

  16. gretel-python-client

    The Gretel Python Client allows you to interact with the Gretel REST API.

  17. PathDict

    Easily query and modify Python dicts!

  18. linkedin-connections-analyzer

    LinkedIn connections analyzer

  19. TagMaps

    Spatio-Temporal Tag and Photo Location Clustering for generating Tag Maps

  20. OLX-Analytics

    🔍 This project allows easy and efficient browsing of classifieds on the OLX portal. Users can register for a subscription and receive the latest information from a category of interest every 4 hours.

  21. Machine-Learning-Cyrillic-Classifier

    This is a web app where you can draw a letter of the Russian alphabet and the ML algorithm will predict which letter you drew.

  22. scrape-google-play-store-app

    Single script to scrape Google Play Store App info without browser automation

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python Datascience related posts

  • Start contributing to a Popular Open Source Project

    2 projects | dev.to | 28 Jan 2025
  • Build a Stock Dashboard in less than 40 lines of Python code!🤓

    1 project | dev.to | 5 Dec 2024
  • 🤓 Top 12 Open Source Repositories to Watch in 2025 to become the ultimate developer

    1 project | dev.to | 2 Dec 2024
  • 9 Open-Source Python Tools to Build Better Data Apps in 2025

    1 project | dev.to | 18 Nov 2024
  • Python Day 9: Building Interactive Web Apps without HTML/CSS and JavaScript

    1 project | dev.to | 26 Apr 2024
  • +10 Resources to Empower Women in Technology

    1 project | dev.to | 6 Mar 2024
  • Show HN: Building data and AI apps, an alternative to Streamlit

    1 project | news.ycombinator.com | 12 Feb 2024

Index

What are some of the best open-source Datascience projects in Python? This list will help you:

#   Project                               Stars
1   Taipy                                 18,114
2   modin                                 10,185
3   metaflow                              8,877
4   openllmetry                           5,943
5   panel                                 5,257
6   Mimesis                               4,584
7   Fast-F1                               3,135
8   PyFunctional                          2,451
9   CleverCSV                             1,296
10  streamlit-geospatial                  940
11  DGFraud                               723
12  socios-brasil                         584
13  Mobile-Phone-Dataset-GSMArena         59
14  gretel-python-client                  56
15  PathDict                              25
16  linkedin-connections-analyzer         12
17  TagMaps                               8
18  OLX-Analytics                         2
19  Machine-Learning-Cyrillic-Classifier  1
20  scrape-google-play-store-app          1
