Python Data Analysis

Open-source Python projects categorized as Data Analysis

Top 23 Python Data Analysis Projects

Data Analysis
  1. scikit-learn

    scikit-learn: machine learning in Python

    Project mention: Detecting Ingress Tool Transfer (T1105) with Python | dev.to | 2026-05-31

    certutil.exe or notepad.exe opening an external connection lands in rare because, fleet-wide, those processes almost never egress. Tune the <= 3 threshold to your environment size. For a more principled version, score each (process, destination) pair by frequency and treat the long tail as the hunt queue, which is the same idea behind scikit-learn's rarity-based anomaly methods without the model overhead.

  2. SaaSHub

    SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives

  3. TrendRadar

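    ⭐AI-driven public opinion & trend monitor with multi-platform aggregation, RSS, and smart alerts.🎯 Say goodbye to information overload with an AI assistant for public-opinion monitoring and trending-topic filtering. Aggregates trending topics from multiple platforms plus RSS subscriptions, with precise keyword filtering. AI news filtering, AI translation, and AI analysis briefings are pushed straight to your phone; it can also plug into an MCP architecture to enable natural-language conversational analysis, sentiment insight, and trend prediction. Supports Docker, with data kept under your own control locally or in the cloud. Integrates smart push notifications through WeChat, Feishu, DingTalk, Telegram, email, ntfy, bark, Slack, and other channels.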

    Project mention: Daily AI & Automation Tech News - November 20, 2025 | dev.to | 2025-11-19

    Link: https://github.com/sansan0/TrendRadar

  4. Pandas

    Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

    Project mention: MLOps Lifecycle: Stages, Workflow, and Best Practices | dev.to | 2026-06-02

    Feature transformations should be deterministic: The same input should produce the same output when the same feature definition and configuration are applied. This is what allows training, backtesting, and live inference to remain aligned. Tools such as Pandas, Spark, or feature platforms such as Feast can be used to implement that logic.

  5. streamlit

    Streamlit — A faster way to build and share data apps.

    Project mention: 16 Python Libraries You Should Know | dev.to | 2026-05-21

    Streamlit

  6. gradio

    Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!

    Project mention: Add Authentication and SSO to Your Gradio App | dev.to | 2026-03-25

    Gradio is an open source Python package that allows you to create web-based interfaces for AI models, APIs, or any Python function. Its simplicity and flexibility make it a popular choice among developers who want to quickly prototype and deploy web-based interfaces without worrying about frontend development.

  7. BettaFish

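    Weiyu (微舆): a multi-agent public-opinion analysis assistant that anyone can use. It breaks through information cocoons, restores the full picture of public opinion, predicts where it is heading, and supports decision-making. Built from scratch, with no dependency on any framework.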

    Project mention: BettaFish – Public Opinion Sentiment Analysis Model | news.ycombinator.com | 2025-11-03
  8. best-of-ml-python

    🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.

  9. pandas-ai

    Chat with your database or your datalake (SQL, CSV, parquet). PandasAI makes data analysis conversational using LLMs and RAG.

    Project mention: 📰 All Data and AI Weekly #231-02March2026 | dev.to | 2026-03-02

    Pandas-AI: Talk to your dataframes in natural language.

  10. airbyte

    Open-source data movement for ELT pipelines and AI agents — from APIs, databases & files to warehouses, lakes, and AI applications. Both self-hosted and Cloud.

    Project mention: Show HN: Airbyte Agents – context for agents across multiple data sources | news.ycombinator.com | 2026-05-05
  11. marimo

    A reactive notebook for Python — run reproducible experiments, query with SQL, execute as a script, deploy as an app, and version with git. Stored as pure Python. All in a modern, AI-native editor.

    Project mention: Pluto.jl 1.0 release – reactive notebook for Julia | news.ycombinator.com | 2026-06-03

    Pluto is great. I use it all the time. If you like the reactivity/reproducibility but are wedded to Python, you might want to check out Marimo, which is also great. [https://marimo.io/]

    It too puts the output of a cell above the code so if you're unable to adapt to things that are different it's also probably not for you.

    FWIW, Observable's Notebooks (Javascript) work the same way: output above the code that produces it. [https://observablehq.com/]

    I too did not like having the output above the code but got over it pretty quickly. For plots, it's arguably better: usually, I want to see the plot before I see the 15 line invocation of some plot command. The thing that bugs me the most about Pluto now is that it really wants you to only have a single evaluating statement per cell. You have to wrap stuff in "block......end" if you want to e.g. define more than one variable in a cell.

  12. akshare

    AKShare is an elegant and simple financial data interface library for Python, built for human beings! An open-source financial data interface library (by akfamily)

  13. pygwalker

    PyGWalker: Turn your dataframe into an interactive UI for visual analysis

  14. fg-data-profiling

    1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.

  15. statsmodels

    Statsmodels: statistical modeling and econometrics in Python

  16. mlcourse.ai

    Open Machine Learning Course

  17. pyod

    A Python library for anomaly detection across tabular, time series, graph, text, and image data. 60+ detectors, benchmark-backed ADEngine orchestration, and an agentic workflow for AI agents.

  18. imbalanced-learn

    A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning

  19. knowledge-repo

    A next-generation curated knowledge sharing platform for data scientists and other technical professions.

  20. plotnine

    A Grammar of Graphics for Python

    Project mention: Plotnine – A Grammar of Graphics for Python | news.ycombinator.com | 2025-11-26
  21. running_page

    Make your own running home page

  22. missingno

    Missing data visualization module for Python.

  23. python-mini-project

    🙌 Welcome open-source Python mini-project contributions!

  24. AWS Data Wrangler

    pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
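The scikit-learn mention at the top of this list describes scoring each (process, destination) pair by frequency and treating the long tail as the hunt queue. A minimal sketch of that idea in plain Python follows. The event field names (`process`, `destination`), the sample addresses, and the `rare_pairs` helper are hypothetical; only the `<= 3` threshold comes from the quoted article, and it should be tuned to your environment size.

```python
from collections import Counter


def rare_pairs(events, threshold=3):
    """Return (pair, count) tuples seen at most `threshold` times, rarest first."""
    # Count how often each (process, destination) pair appears fleet-wide.
    counts = Counter((e["process"], e["destination"]) for e in events)
    # Keep only the long tail: pairs at or below the rarity threshold.
    rare = [(pair, n) for pair, n in counts.items() if n <= threshold]
    # Sort by count, then by pair, so the ordering is stable across runs.
    return sorted(rare, key=lambda item: (item[1], item[0]))


# Hypothetical sample data: one common pair and two rare ones.
events = (
    [{"process": "chrome.exe", "destination": "203.0.113.10"}] * 50
    + [{"process": "certutil.exe", "destination": "198.51.100.7"}]
    + [{"process": "notepad.exe", "destination": "198.51.100.9"}] * 2
)

hunt_queue = rare_pairs(events)
for (process, destination), count in hunt_queue:
    print(f"{count:>3}  {process} -> {destination}")
```

This is a counting heuristic, not a trained model; it only ranks pairs by how rarely they occur in the data you feed it.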

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
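The Pandas mention in the list above argues that feature transformations should be deterministic: the same input, feature definition, and configuration should always produce the same output. One way to illustrate this, sketched here with the standard library only rather than Pandas or Feast, is to pass every source of variation in as explicit configuration instead of reading it from the clock or other hidden state. The `FeatureConfig` and `build_features` names and the row fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FeatureConfig:
    """Everything the transformation depends on besides the input row."""
    as_of: datetime    # reference time is passed in, never read from the clock
    amount_cap: float  # fixed clipping value, versioned with the config


def build_features(row: dict, config: FeatureConfig) -> dict:
    """Pure function: the output depends only on `row` and `config`."""
    signup = datetime.fromisoformat(row["signup_date"])
    return {
        "account_age_days": (config.as_of - signup).days,
        "amount_capped": min(float(row["amount"]), config.amount_cap),
    }


# Hypothetical input row and configuration.
config = FeatureConfig(as_of=datetime(2026, 1, 1), amount_cap=1000.0)
row = {"signup_date": "2025-12-02", "amount": "1250.50"}

features = build_features(row, config)
print(features)
```

Because `build_features` reads no global state, training, backtesting, and live inference can call it with the same config and expect identical results.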

Python Data Analysis related posts

  • Pluto.jl 1.0 release – reactive notebook for Julia

    1 project | news.ycombinator.com | 3 Jun 2026
  • The First LLM Agent Cyberattack: How an AI Hacker Exfiltrated a Database in Under an Hour

    1 project | dev.to | 1 Jun 2026
  • Can you design a Python project like material flow, architect only?

    1 project | news.ycombinator.com | 29 May 2026
  • What Training Exists for Security Professionals Learning AI and Data Science?

    5 projects | dev.to | 23 May 2026
  • Marimo: A Reactive Python Notebook

    1 project | news.ycombinator.com | 23 May 2026
  • 16 Python Libraries You Should Know

    6 projects | dev.to | 21 May 2026
  • Best AI Cybersecurity Training for Security Teams: How to Pick

    5 projects | dev.to | 18 May 2026

Index

What are some of the best open-source Data Analysis projects in Python? This list will help you:

 #  Project              Stars
 1  scikit-learn         66,237
 2  TrendRadar           58,992
 3  Pandas               48,900
 4  streamlit            44,819
 5  gradio               42,815
 6  BettaFish            41,211
 7  best-of-ml-python    23,620
 8  pandas-ai            23,569
 9  airbyte              21,385
10  marimo               21,294
11  akshare              20,091
12  pygwalker            15,826
13  fg-data-profiling    13,582
14  statsmodels          11,444
15  mlcourse.ai          10,608
16  pyod                  9,868
17  imbalanced-learn      7,106
18  knowledge-repo        5,533
19  plotnine              4,582
20  running_page          4,477
21  missingno             4,206
22  python-mini-project   4,203
23  AWS Data Wrangler     4,109
