Top 23 Python Data Projects
-
llama_index
Resource: LlamaIndex Documentation
-
pandas-ai
Chat with your database or your datalake (SQL, CSV, parquet). PandasAI makes data analysis conversational using LLMs and RAG.
-
Prefect
Project mention: Show HN: Flow – A Dynamic Task Engine for AI Agents Without DAG | news.ycombinator.com | 2024-12-02
https://github.com/PrefectHQ/prefect
-
airbyte
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
-
Mage
🧙 The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai
Here, we use the free Mage Ai orchestration tool.
-
knowledge-repo
A next-generation curated knowledge sharing platform for data scientists and other technical professions.
-
superduper
Project mention: Build fully portable AI applications on top of Snowflake with SuperDuperDB | dev.to | 2024-06-26
Customize how AI and databases work together. Scale your AI projects to handle more data and users. Move AI projects between different environments easily. Extend the system with new AI features and database functionality. Check it out: Blog: https://blog.superduperdb.com/version-02 Github: https://github.com/SuperDuperDB/superduperdb (leave us a star ⭐️🥳)
-
CKAN
CKAN is an open-source DMS (data management system) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data. It powers catalog.data.gov, open.canada.ca/data, data.humdata.org among many other sites.
-
Mimesis
Mimesis is a robust data generator for Python that can produce a wide range of fake data in multiple languages.
-
cognita
RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production by TrueFoundry
Project mention: Lists of open-source frameworks for building RAG applications | dev.to | 2025-01-02
Ideal For: Enterprises seeking a robust framework for large-scale AI applications.
-
preswald
Preswald is a WASM packager for Python-based interactive data apps. It bundles complete data workflows, particularly visualizations, into single files that run entirely in the browser using Pyodide, DuckDB, Pandas, Plotly, Matplotlib, and similar libraries. Build dashboards, reports, and notebooks that run offline, load fast, and share like a document.
Project mention: Revolutionizing Data Apps: Build Interactive Dashboards with Just Python! | dev.to | 2025-03-19
-
PyPika
PyPika is a Python SQL query builder that exposes the full richness of the SQL language using a syntax that reflects the resulting query. PyPika excels at all sorts of SQL queries but is especially useful for data analysis.
Project mention: FastAPI, Pydantic, Psycopg3: the holy trinity for Python web APIs | dev.to | 2024-10-24
PyPika: I don't know much about this one.
-
Colour
Project mention: How much oranger do red orange bags make oranges look? | news.ycombinator.com | 2025-04-14
There are also color science packages like this one that let you do conversions to various spaces - https://www.colour-science.org/
-
mara-pipelines
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
-
meltano
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
Python Data related posts
-
Mimesis: The Fake Data Generator That Will Blow Your Mind!
-
Personal Picks: Data Product News (April 16, 2025)
-
airbyte VS cocoindex - a user suggested alternative
2 projects | 1 Apr 2025
-
Automate structured data extraction from PDF / Word by OpenAI and CocoIndex
-
Revolutionizing Data Apps: Build Interactive Dashboards with Just Python!
-
Glom – Restructuring data, the Python way
-
Quick tip: Replace MongoDB® Atlas with SingleStore Kai in LlamaIndex
-
A note from our sponsor - InfluxDB
www.influxdata.com | 17 May 2025
Index
What are some of the best open-source Data projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | llama_index | 41,591 |
2 | pandas-ai | 20,042 |
3 | Prefect | 19,241 |
4 | airbyte | 18,103 |
5 | akshare | 11,722 |
6 | chinese-xinhua | 11,112 |
7 | Mage | 8,312 |
8 | knowledge-repo | 5,518 |
9 | superduper | 5,058 |
10 | CKAN | 4,703 |
11 | Mimesis | 4,566 |
12 | datasets | 4,409 |
13 | cognita | 4,053 |
14 | dlt | 3,615 |
15 | TextRecognitionDataGenerator | 3,469 |
16 | preswald | 3,648 |
17 | pandas-datareader | 3,031 |
18 | PyPika | 2,673 |
19 | PyFunctional | 2,447 |
20 | Colour | 2,269 |
21 | sketch | 2,257 |
22 | mara-pipelines | 2,082 |
23 | meltano | 2,054 |