Top 23 Python Data Analysis Projects
-
certutil.exe or notepad.exe opening an external connection lands in rare because, fleet-wide, those processes almost never egress. Tune the <= 3 threshold to your environment size. For a more principled version, score each (process, destination) pair by frequency and treat the long tail as the hunt queue, which is the same idea behind scikit-learn's rarity-based anomaly methods without the model overhead.
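The frequency-scoring idea above can be sketched with pandas. This is a minimal illustration, not the original author's code: the `events` table, its column names, the sample rows, and the `RARE_MAX` constant are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical connection log; column names and rows are illustrative assumptions.
events = pd.DataFrame(
    {
        "process": ["chrome.exe"] * 6
        + ["svchost.exe"] * 4
        + ["certutil.exe", "notepad.exe"],
        "destination": ["93.184.216.34"] * 6
        + ["10.0.0.5"] * 4
        + ["203.0.113.7", "198.51.100.9"],
    }
)

RARE_MAX = 3  # mirrors the "<= 3" threshold in the text; tune to environment size

# Count how often each (process, destination) pair occurs fleet-wide.
pair_counts = (
    events.groupby(["process", "destination"])
    .size()
    .rename("count")
    .reset_index()
)

# The long tail becomes the hunt queue: pairs seen at most RARE_MAX times, rarest first.
hunt_queue = (
    pair_counts[pair_counts["count"] <= RARE_MAX]
    .sort_values(["count", "process"])
    .reset_index(drop=True)
)

print(hunt_queue)
```

With this sample data, only the `certutil.exe` and `notepad.exe` pairs fall under the threshold, while the frequent browser and service traffic drops out of the queue.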
-
TrendRadar
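AI-driven public opinion & trend monitor with multi-platform aggregation, RSS, and smart alerts. Say goodbye to information overload: your AI assistant for public opinion monitoring and trending-topic filtering. Aggregates trending topics from multiple platforms plus RSS subscriptions, with precise keyword filtering. AI-filtered news, AI translation, and AI-generated analysis briefings are pushed straight to your phone. It can also plug into the MCP architecture to power AI natural-language conversational analysis, sentiment insight, trend prediction, and more. Supports Docker, with data kept under your own control locally or in the cloud. Integrates smart push notifications through WeChat, Feishu, DingTalk, Telegram, email, ntfy, bark, Slack, and other channels.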
Link: https://github.com/sansan0/TrendRadar
-
Pandas
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
Feature transformations should be deterministic: the same input should produce the same output when the same feature definition and configuration are applied. This is what keeps training, backtesting, and live inference aligned. Tools such as Pandas or Spark, or feature platforms such as Feast, can be used to implement that logic.
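As a rough illustration of that determinism property in Pandas, the sketch below defines a feature as a pure function of its input and configuration. The function name `rolling_mean_feature`, the `timestamp` and `price` columns, and the window size are hypothetical choices for the example, not part of any particular feature platform.

```python
import pandas as pd


def rolling_mean_feature(df: pd.DataFrame, column: str, window: int) -> pd.Series:
    """Hypothetical feature definition: a pure function of its inputs.

    It reads no clock, uses no randomness, and touches no global state,
    and it sorts by timestamp so row order in the input does not matter.
    """
    ordered = df.sort_values("timestamp", kind="stable")
    feature = ordered[column].rolling(window, min_periods=1).mean()
    return feature.rename(f"{column}_mean_{window}")


# Illustrative data; values and column names are assumptions.
raw = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(["2024-01-03", "2024-01-01", "2024-01-02"]),
        "price": [30.0, 10.0, 20.0],
    }
)

# "Training" sees the rows in one order, "inference" in a shuffled order.
training = rolling_mean_feature(raw, "price", window=2)
inference = rolling_mean_feature(
    raw.sample(frac=1.0, random_state=0), "price", window=2
)

# Same definition + same configuration + same data => identical feature values.
print(training.equals(inference))
```

Sorting inside the function is one design choice for removing a hidden dependency on input order. A feature that depends on the current time or an unseeded random draw would fail this kind of check.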
-
Streamlit
-
Gradio is an open source Python package that allows you to create web-based interfaces for AI models, APIs, or any Python function. Its simplicity and flexibility make it a popular choice among developers who want to quickly prototype and deploy web-based interfaces without worrying about frontend development.
-
Project mention: BettaFish – Public Opinion Sentiment Analysis Model | news.ycombinator.com | 2025-11-03
-
pandas-ai
Chat with your database or your datalake (SQL, CSV, parquet). PandasAI makes data analysis conversational using LLMs and RAG.
Pandas-AI: Talk to your dataframes in natural language.
-
airbyte
Open-source data movement for ELT pipelines and AI agents — from APIs, databases & files to warehouses, lakes, and AI applications. Both self-hosted and Cloud.
Project mention: Show HN: Airbyte Agents – context for agents across multiple data sources | news.ycombinator.com | 2026-05-05
-
marimo
A reactive notebook for Python — run reproducible experiments, query with SQL, execute as a script, deploy as an app, and version with git. Stored as pure Python. All in a modern, AI-native editor.
Project mention: Pluto.jl 1.0 release – reactive notebook for Julia | news.ycombinator.com | 2026-06-03
Pluto is great. I use it all the time. If you like the reactivity/reproducibility but are wedded to Python, you might want to check out Marimo, which is also great. [https://marimo.io/]
It too puts the output of a cell above the code so if you're unable to adapt to things that are different it's also probably not for you.
FWIW, Observable's Notebooks (Javascript) work the same way: output above the code that produces it. [https://observablehq.com/]
I too did not like having the output above the code but got over it pretty quickly. For plots, it's arguably better: usually, I want to see the plot before I see the 15-line invocation of some plot command. The thing that bugs me the most about Pluto now is that it really wants you to only have a single evaluating statement per cell. You have to wrap stuff in "begin ... end" if you want to e.g. define more than one variable in a cell.
-
fg-data-profiling
1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
-
pyod
A Python library for anomaly detection across tabular, time series, graph, text, and image data. 60+ detectors, benchmark-backed ADEngine orchestration, and an agentic workflow for AI agents.
-
knowledge-repo
A next-generation curated knowledge sharing platform for data scientists and other technical professions.
-
AWS Data Wrangler
pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Python Data Analysis related posts
-
Pluto.jl 1.0 release – reactive notebook for Julia
-
The First LLM Agent Cyberattack: How an AI Hacker Exfiltrated a Database in Under an Hour
-
Can you design a Python project like material flow, architect only?
-
What Training Exists for Security Professionals Learning AI and Data Science?
-
Marimo: A Reactive Python Notebook
-
16 Python Libraries You Should Know
-
Best AI Cybersecurity Training for Security Teams: How to Pick
-
A note from our sponsor - SaaSHub
www.saashub.com | 8 Jun 2026
Index
What are some of the best open-source Data Analysis projects in Python? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | scikit-learn | 66,237 |
| 2 | TrendRadar | 58,992 |
| 3 | Pandas | 48,900 |
| 4 | streamlit | 44,819 |
| 5 | gradio | 42,815 |
| 6 | BettaFish | 41,211 |
| 7 | best-of-ml-python | 23,620 |
| 8 | pandas-ai | 23,569 |
| 9 | airbyte | 21,385 |
| 10 | marimo | 21,294 |
| 11 | akshare | 20,091 |
| 12 | pygwalker | 15,826 |
| 13 | fg-data-profiling | 13,582 |
| 14 | statsmodels | 11,444 |
| 15 | mlcourse.ai | 10,608 |
| 16 | pyod | 9,868 |
| 17 | imbalanced-learn | 7,106 |
| 18 | knowledge-repo | 5,533 |
| 19 | plotnine | 4,582 |
| 20 | running_page | 4,477 |
| 21 | missingno | 4,206 |
| 22 | python-mini-project | 4,203 |
| 23 | AWS Data Wrangler | 4,109 |