Top 10 Python data-pipeline Projects
- Mage
  🧙 The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai
- ragflow
  RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
- meltano
  Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
- dbt-data-reliability
  dbt package that is part of Elementary, the dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
- SmartPipeline
  A framework for rapid development of robust data pipelines following a simple design pattern
Project mention: Building in Public: Leveraging Tublian's AI Copilot for My Open Source Contributions | dev.to | 2024-02-12
Contributing to Apache Airflow's open-source project immersed me in collaborative coding. Experienced maintainers rigorously reviewed my contributions, providing constructive feedback. This ongoing dialogue refined the codebase and honed my understanding of best practices.
Project mention: RAGFlow is an open-source RAG engine based on deep document understanding | news.ycombinator.com | 2024-04-01
Just link them to https://github.com/infiniflow/ragflow/blob/main/rag/llm/chat... :)
Project mention: meltano VS cloudquery - a user suggested alternative | libhunt.com/r/meltano | 2023-06-02
Here's the project: https://github.com/vmware/versatile-data-kit
Project mention: Recap: A python library for describing database tables and serialization formats with minimal type coercion. | /r/dataengineering | 2023-07-12
The Github Repo: https://github.com/recap-build/recap
Project mention: Show HN: SmartPipeline, robust and light data pipelines in Python | news.ycombinator.com | 2023-05-03
Python data-pipelines related posts
- Experience with Dagster.io?
- Dagster tutorials
- The Dagster Master Plan
- The Why and How of Dagster User Code Deployment Automation
- Mage Battlegrounds: Craft insights from real-time customer behavior analysis
- Looking for an open-source project
- Best Orchestration Tool to run dbt projects?
Index
What are some of the best open-source data-pipeline projects in Python? This list will help you:
Rank | Project | Stars |
---|---|---|
1 | Airflow | 34,397 |
2 | dagster | 10,173 |
3 | Mage | 7,001 |
4 | ragflow | 5,516 |
5 | meltano | 1,587 |
6 | versatile-data-kit | 410 |
7 | dbt-data-reliability | 338 |
8 | recap | 306 |
9 | patterns-devkit | 106 |
10 | SmartPipeline | 22 |