Python Data Analysis

Open-source Python projects categorized as Data Analysis

Top 23 Python Data Analysis Projects

  1. scikit-learn

    scikit-learn: machine learning in Python

    Project mention: 🚀 Launching a High-Performance DistilBERT-Based Sentiment Analysis Model for Steam Reviews 🎮🤖 | dev.to | 2024-12-16

    scikit-learn (optional): Useful for additional training or evaluation tasks.

  3. Pandas

    Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

    Project mention: Building a Sarcasm Detection System with LSTM and GloVe: A Complete Guide | dev.to | 2025-01-02


  4. streamlit

    Streamlit — A faster way to build and share data apps.

    Project mention: Building a Sarcasm Detection System with LSTM and GloVe: A Complete Guide | dev.to | 2025-01-02


  5. gradio

    Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!

    Project mention: Show HN: I made a website to semantically search ArXiv papers | news.ycombinator.com | 2024-12-24
  6. best-of-ml-python

    🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.

    Project mention: Top Github repositories for 10+ programming languages | dev.to | 2024-07-16


  7. airbyte

    The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

    Project mention: Can AI finally generate best practice code? I think so. | dev.to | 2024-12-19

    Let’s say I’m using Cursor to build a bunch of data apps and using Airbyte as the data movement platform and Streamlit for the frontend. I’m writing in Python and using the Airbyte API libraries. This is my basic ‘tech stack’.

  8. pandas-ai

    Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.

    Project mention: Using RAG to Build Your IDE Agents | dev.to | 2024-06-18

    In this blog, we will build a powerful IDE agent for PandasAI using Dash Agent. Then later on, we'll understand how using RAG can significantly improve LLM responses.

  9. pygwalker

    PyGWalker: Turn your pandas dataframe into an interactive UI for visual analysis

    Project mention: A simple way to explore data through a Tableau-like UI directly in your data app | news.ycombinator.com | 2024-12-30

    I believe this is just a wrapper around pygwalker, which is a nice project: https://github.com/Kanaries/pygwalker

    I really like the typescript graphic walker: https://github.com/Kanaries/graphic-walker

  10. ydata-profiling

    1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.

  11. statsmodels

    Statsmodels: statistical modeling and econometrics in Python

    Project mention: The Truth About Linear Regression | news.ycombinator.com | 2024-07-30

    statsmodels is the closest thing in python to R. statsmodels has mixed model support, but mgcv apparently requires more. It is well above my paygrade, but this seems relevant: https://github.com/statsmodels/statsmodels/issues/8029 (i.e. no out of the box support, you might be able to build an approximation on your own).

  12. akshare

    AKShare is an elegant and simple financial data interface library for Python, built for human beings! Open-source financial data interface library (by akfamily)

  13. mlcourse.ai

    Open Machine Learning Course

  14. marimo

    A reactive notebook for Python — run reproducible experiments, execute as a script, deploy as an app, and version with git.

    Project mention: Show HN: WASM-powered codespaces for Python notebooks on GitHub | news.ycombinator.com | 2025-01-14
  15. pyod

    A Python Library for Outlier and Anomaly Detection, Integrating Classical and Deep Learning Techniques

  16. imbalanced-learn

    A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning

  17. knowledge-repo

    A next-generation curated knowledge sharing platform for data scientists and other technical professions.

  18. TaskWeaver

    A code-first agent framework for seamlessly planning and executing data analytics tasks.

    Project mention: TaskWeaver: Code-first agent framework for seamlessly planning | news.ycombinator.com | 2024-03-10
  19. plotnine

    A Grammar of Graphics for Python

    Project mention: FLaNK AI Weekly 18 March 2024 | dev.to | 2024-03-18
  20. missingno

    Missing data visualization module for Python.

  21. AWS Data Wrangler

    pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

  22. running_page

    Make your own running home page

  23. python-mini-project

    🙌 Welcome open-source Python mini-project contributions!

  24. igel

    a delightful machine learning tool that allows you to train, test, and use models without writing code

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
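
The ordering rule in the note can be sketched in a few lines of Python. This is only an illustration, not the site's implementation: the `projects` sample and the `rank_by_stars` helper are hypothetical names, and the star counts are copied from the index on this page.

```python
# Hypothetical sample: (project, GitHub stars) pairs copied from this page's index,
# deliberately listed out of order.
projects = [
    ("statsmodels", 10_349),
    ("scikit-learn", 60_790),
    ("gradio", 35_248),
    ("Pandas", 44_267),
    ("streamlit", 36_771),
]


def rank_by_stars(items):
    """Return (rank, name, stars) tuples, highest star count first."""
    ordered = sorted(items, key=lambda pair: pair[1], reverse=True)
    return [
        (rank, name, stars)
        for rank, (name, stars) in enumerate(ordered, start=1)
    ]


ranking = rank_by_stars(projects)
for rank, name, stars in ranking:
    # Print rank, project name, and star count with a thousands separator.
    print(f"{rank:>2}  {name:<14} {stars:,}")
```

Sorting on the star count with `reverse=True` puts the most-starred project first, which matches the order of the first entries in the list above.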


Python Data Analysis related posts

  • Minimal Rio Intro

    2 projects | dev.to | 2 Jan 2025
  • Building a Sarcasm Detection System with LSTM and GloVe: A Complete Guide

    4 projects | dev.to | 2 Jan 2025
  • A simple way to explore data through a Tableau-like UI directly in your data app

    5 projects | news.ycombinator.com | 30 Dec 2024
  • Fixing timestamp overflow error in Python

    1 project | news.ycombinator.com | 30 Dec 2024
  • I built a data pipeline tool in Go

    3 projects | dev.to | 23 Dec 2024
  • Can AI finally generate best practice code? I think so.

    2 projects | dev.to | 19 Dec 2024
  • Show HN: I built an open-source data pipeline tool in Go

    6 projects | news.ycombinator.com | 17 Dec 2024

Index

What are some of the best open-source Data Analysis projects in Python? This list will help you:

#   Project              Stars
1   scikit-learn         60,790
2   Pandas               44,267
3   streamlit            36,771
4   gradio               35,248
5   best-of-ml-python    18,766
6   airbyte              16,935
7   pandas-ai            13,970
8   pygwalker            13,701
9   ydata-profiling      12,652
10  statsmodels          10,349
11  akshare              10,038
12  mlcourse.ai           9,862
13  marimo                9,504
14  pyod                  8,748
15  imbalanced-learn      6,892
16  knowledge-repo        5,499
17  TaskWeaver            5,457
18  plotnine              4,105
19  missingno             3,999
20  AWS Data Wrangler     3,965
21  running_page          3,721
22  python-mini-project   3,250
23  igel                  3,096


Did you know that Python is the 2nd most popular programming language based on number of references?