Python Datascience

Open-source Python projects categorized as Datascience

Top 21 Python Datascience Projects

  • ludwig

    Low-code framework for building custom LLMs, neural networks, and other AI models

    Project mention: Show HN: Toolkit for LLM Fine-Tuning, Ablating and Testing | 2024-04-07

    This is a great project, little bit similar to, but it includes testing capabilities and ablation.

    Questions regarding the LLM testing aspect: How extensive is the test coverage for LLM use cases, and what is the current state of this project area? Do you offer any guarantees, or is it considered an open-ended problem?

    Would love to see more progress toward this area!

  • Taipy

    Turns Data and AI algorithms into production-ready web applications in no time.

    Project mention: Python Day 9: Building Interactive Web Apps without HTML/CSS and JavaScript | 2024-04-26

    Taipy is an open-source Python library that enables data scientists and developers to build robust end-to-end data pipelines.

  • modin

    Modin: Scale your Pandas workflows by changing a single line of code

  • metaflow

    :rocket: Build and manage real-life ML, AI, and data science projects with ease!

    Project mention: 10 Open Source Tools for Building MLOps Pipelines | 2024-06-06

    Metaflow is an open source Python library that allows engineers to build and manage ML projects. It focuses on rapid prototyping and reducing time from development to production. It makes the job of ML data scientists easier by taking care of the low-level infrastructure: data, compute, orchestration, and versioning.

  • panel

    Panel: The powerful data exploration & web app framework for Python (by holoviz)

    Project mention: This Week In Python | 2024-04-12

    panel – data exploration & web app framework for Python

  • Mimesis

    Mimesis is a robust data generator for Python that can produce a wide range of fake data in multiple languages.

  • PyFunctional

    Python library for creating data pipelines with chain functional programming

    Project mention: Python: Uncovering the Overlooked Core Functionalities | 2023-07-24

    If you actually think this code is better there's a real library that does this:

  • Fast-F1

    FastF1 is a Python package for accessing and analyzing Formula 1 results, schedules, timing data and telemetry

  • openllmetry

    Open-source observability for your LLM application, based on OpenTelemetry

    Project mention: Launch HN: Traceloop (YC W23) – Detecting LLM Hallucinations with OpenTelemetry | 2024-07-17

    2. Our soft-faithfulness metric was able to detect cases in summarization tasks where a model was completely making up stuff that never appeared in the original text.

    One of the challenges we faced was figuring out how to collect the data that we need from our customers' LLM apps. That’s where OpenTelemetry came in handy. We built OpenLLMetry and announced it here almost a year ago. It standardized the use of OpenTelemetry to observe LLM apps. We realized that the concepts of traces, spans, metrics, and logs that were standardized with OpenTelemetry can easily extend to gen AI. We partnered with 20+ observability platforms to make sure that OpenLLMetry becomes the standard for GenAI observability and that the data that we collect can be sent to other platforms as well.

    We plan to extend the metrics we provide to support agents that use tools, vision models, and other amazing developments in our fast-paced industry.

    We invite you to give Traceloop a spin and are eager for your feedback! How do you track and debug hallucinations? How much has that been an issue for you? What types of hallucinations have you encountered?

  • CleverCSV

    CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line application for working with CSV files.

  • streamlit-geospatial

    A multi-page streamlit app for geospatial

  • DGFraud

    A Deep Graph-based Toolbox for Fraud Detection

  • socios-brasil

    Captures data on the partners of Brazilian companies from the Receita Federal and exports it to a human-readable format

  • Mobile-Phone-Dataset-GSMArena

    Python script for building a mobile phone dataset from the GSMArena website.

  • gretel-python-client

    The Gretel Python Client allows you to interact with the Gretel REST API.

  • PathDict

    Easily query and modify Python dicts!

  • linkedin-connections-analyzer

    LinkedIn connections analyzer

  • TagMaps

    Spatio-Temporal Tag and Photo Location Clustering for generating Tag Maps

  • scrape-google-play-store-app

    Single script to scrape Google Play Store App info without browser automation

  • OLX-Analytics

    🔍 This project allows easy and efficient browsing of classifieds on the OLX portal. Users can register for a subscription and receive the latest information from a category that interests them every 4 hours.

    Project mention: [Python] Project ideas for every level of advancement | 2023-10-27

    Stack: Python, Flask, HTML, CSS, Bootstrap, Docker, SQLite, APScheduler

  • Machine-Learning-Cyrillic-Classifier

    This is a web app where you can draw a letter of the Russian alphabet and an ML algorithm predicts which letter you drew.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
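
For context on the CleverCSV entry above, which describes itself as a drop-in replacement for the built-in CSV module with improved dialect detection, here is a minimal sketch of the standard-library baseline it aims to improve on. It uses only Python's `csv` module, not CleverCSV's own API; the function name `read_with_detected_dialect`, the candidate delimiter set, and the sample data are hypothetical and chosen for illustration.

```python
import csv
import io


def read_with_detected_dialect(text, candidates=";,\t|"):
    """Guess the CSV dialect of `text` with the standard library, then parse it."""
    # csv.Sniffer inspects the sample and infers the delimiter and quoting rules.
    dialect = csv.Sniffer().sniff(text, delimiters=candidates)
    # Parse the same text using the detected dialect.
    rows = list(csv.reader(io.StringIO(text), dialect))
    return dialect.delimiter, rows


# Hypothetical sample: a semicolon-separated export.
sample = "name;city;age\nAna;Lisbon;31\nBo;Oslo;45\n"
delimiter, rows = read_with_detected_dialect(sample)
print(delimiter)
print(rows[1])
```

On clean input like this the built-in sniffer is usually enough; the messy files that CleverCSV targets are where a more robust detector may be worth trying.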

Python Datascience related posts

  • Python Day 9: Building Interactive Web Apps without HTML/CSS and JavaScript

    1 project | 26 Apr 2024
  • +10 Resources to Empower Women in Technology

    1 project | 6 Mar 2024
  • Show HN: Building data and AI apps, an alternative to Streamlit

    1 project | 12 Feb 2024
  • Our open-source project for building AI / Data full-stack apps got funded! 🎉 🎉

    1 project | 15 Jan 2024
  • Plotting 1,000,000 points on a webpage using only Python

    1 project | /r/bigdata | 11 Dec 2023
  • Python: Uncovering the Overlooked Core Functionalities

    3 projects | 24 Jul 2023
  • Consume Live Timing/Telemetry From API During Race

    1 project | /r/F1Technical | 28 May 2023


What are some of the best open-source Datascience projects in Python? This list will help you:

 #  Project                                Stars
 1  ludwig                                11,003
 2  Taipy                                  9,807
 3  modin                                  9,623
 4  metaflow                               7,834
 5  panel                                  4,463
 6  Mimesis                                4,349
 7  PyFunctional                           2,372
 8  Fast-F1                                2,314
 9  openllmetry                            1,552
10  CleverCSV                              1,235
11  streamlit-geospatial                     829
12  DGFraud                                  677
13  socios-brasil                            555
14  Mobile-Phone-Dataset-GSMArena             58
15  gretel-python-client                      50
16  PathDict                                  24
17  linkedin-connections-analyzer             12
18  TagMaps                                    6
19  scrape-google-play-store-app               2
20  OLX-Analytics                              1
21  Machine-Learning-Cyrillic-Classifier       1
