Top 23 Python Data Analysis Projects
-
Project mention: 🚀 Launching a High-Performance DistilBERT-Based Sentiment Analysis Model for Steam Reviews 🎮🤖 | dev.to | 2024-12-16
scikit-learn (optional): Useful for additional training or evaluation tasks.
-
Pandas
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
Project mention: Building a Sarcasm Detection System with LSTM and GloVe: A Complete Guide | dev.to | 2025-01-02
Pandas
-
Project mention: Building a Sarcasm Detection System with LSTM and GloVe: A Complete Guide | dev.to | 2025-01-02
Streamlit
-
Project mention: Show HN: I made a website to semantically search ArXiv papers | news.ycombinator.com | 2024-12-24
-
Best of ml python
-
airbyte
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Let’s say I’m using Cursor to build a bunch of data apps and using Airbyte as the data movement platform and Streamlit for the frontend. I’m writing in Python and using the Airbyte API libraries. This is my basic ‘tech stack’.
-
pandas-ai
Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.
In this blog, we will build a powerful IDE agent for PandasAI using Dash Agent. Then later on, we'll understand how using RAG can significantly improve LLM responses.
-
Project mention: A simple way to explore data through a Tableau-like UI directly in your data app | news.ycombinator.com | 2024-12-30
I believe this is just a wrapper around pygwalker, which is a nice project: https://github.com/Kanaries/pygwalker
I really like the TypeScript Graphic Walker: https://github.com/Kanaries/graphic-walker
-
ydata-profiling
1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
-
statsmodels is the closest thing in Python to R. statsmodels has mixed model support, but mgcv apparently requires more. It is well above my paygrade, but this seems relevant: https://github.com/statsmodels/statsmodels/issues/8029 (i.e. there is no out-of-the-box support, though you might be able to build an approximation on your own).
-
marimo
A reactive notebook for Python — run reproducible experiments, execute as a script, deploy as an app, and version with git.
Project mention: Show HN: WASM-powered codespaces for Python notebooks on GitHub | news.ycombinator.com | 2025-01-14
-
pyod
A Python Library for Outlier and Anomaly Detection, Integrating Classical and Deep Learning Techniques
-
knowledge-repo
A next-generation curated knowledge sharing platform for data scientists and other technical professions.
-
Project mention: TaskWeaver: Code-first agent framework for seamlessly planning | news.ycombinator.com | 2024-03-10
-
AWS Data Wrangler
pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
-
igel
A delightful machine learning tool that allows you to train, test, and use models without writing code.
Python Data Analysis related posts
-
Minimal Rio Intro
-
Building a Sarcasm Detection System with LSTM and GloVe: A Complete Guide
-
A simple way to explore data through a Tableau-like UI directly in your data app
-
Fixing timestamp overflow error in Python
-
I built a data pipeline tool in Go
-
Can AI finally generate best practice code? I think so.
-
Show HN: I built an open-source data pipeline tool in Go
Index
What are some of the best open-source Data Analysis projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | scikit-learn | 60,790 |
2 | Pandas | 44,267 |
3 | streamlit | 36,771 |
4 | gradio | 35,248 |
5 | best-of-ml-python | 18,766 |
6 | airbyte | 16,935 |
7 | pandas-ai | 13,970 |
8 | pygwalker | 13,701 |
9 | ydata-profiling | 12,652 |
10 | statsmodels | 10,349 |
11 | akshare | 10,038 |
12 | mlcourse.ai | 9,862 |
13 | marimo | 9,504 |
14 | pyod | 8,748 |
15 | imbalanced-learn | 6,892 |
16 | knowledge-repo | 5,499 |
17 | TaskWeaver | 5,457 |
18 | plotnine | 4,105 |
19 | missingno | 3,999 |
20 | AWS Data Wrangler | 3,965 |
21 | running_page | 3,721 |
22 | python-mini-project | 3,250 |
23 | igel | 3,096 |