Python data-pipelines

Open-source Python projects categorized as data-pipelines

Top 10 Python data-pipeline Projects

  • Airflow

    Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

    Project mention: How I've implemented the Medallion architecture using Apache Spark and Apache Hdoop | dev.to | 2024-06-17

    Instead of the custom orchestrator I used, a proper orchestration tool should replace it like Apache Airflow, Dagster, ..., etc.

  • dagster

    An orchestration platform for the development, production, and observation of data assets.

    Project mention: How I've implemented the Medallion architecture using Apache Spark and Apache Hdoop | dev.to | 2024-06-17

    Instead of the custom orchestrator I used, a proper orchestration tool should replace it like Apache Airflow, Dagster, ..., etc.

  • ragflow

    RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.

    Project mention: Agentic RAG: Definition and Low-Code Implementation | news.ycombinator.com | 2024-06-19

    From 0.8, RAGFlow (https://github.com/infiniflow/ragflow) will provide no-code workflow orchestration. This article describes what kind of graph orchestration engine is needed, and how it can be used to implement Agentic RAG.

  • Mage

    🧙 The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai

    Project mention: FLaNK AI-April 22, 2024 | dev.to | 2024-04-22
  • meltano

    Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.

  • versatile-data-kit

    One framework to develop, deploy and operate data workflows with Python and SQL.

  • dbt-data-reliability

    dbt package that is part of Elementary, the dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.

  • recap

    Work with your web service, database, and streaming schemas in a single format.

    Project mention: Recap: A python library for describing database tables and serialization formats with minimal type coercion. | /r/dataengineering | 2023-07-12

    The GitHub repo: https://github.com/recap-build/recap

  • patterns-devkit

    Data pipelines from re-usable components

  • SmartPipeline

    A framework for rapid development of robust data pipelines following a simple design pattern

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python data-pipelines related posts

  • AI Strategy Guide: How to Scale AI Across Your Business

    4 projects | dev.to | 11 May 2024
  • Experience with Dagster.io?

    1 project | news.ycombinator.com | 25 Jul 2023
  • Dagster tutorials

    1 project | /r/dataengineering | 26 Jun 2023
  • The Dagster Master Plan

    2 projects | /r/dataengineering | 16 Jun 2023
  • The Why and How of Dagster User Code Deployment Automation

    1 project | dev.to | 1 May 2023
  • Mage Battlegrounds: Craft insights from real-time customer behavior analysis

    2 projects | dev.to | 10 Apr 2023
  • Looking for an open-source project

    2 projects | /r/dataengineering | 13 Feb 2023

Index

What are some of the best open-source data-pipeline projects in Python? This list will help you:

#   Project               Stars
1   Airflow               35,109
2   dagster               10,613
3   ragflow                9,740
4   Mage                   7,321
5   meltano                1,652
6   versatile-data-kit       415
7   dbt-data-reliability     355
8   recap                    310
9   patterns-devkit          106
10  SmartPipeline             23
