Python data-pipeline

Open-source Python projects categorized as data-pipeline

Top 15 Python data-pipeline Projects

  • airbyte

    The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

    Project mention: From ETL and ELT to Reverse ETL | dev.to | 2024-10-15

    With the transition from ETL to ELT, data warehouses have ascended to the role of data custodians, centralizing customer data collected from fragmented systems. This pivotal shift has been enabled by a suite of powerful tools: Fivetran and Airbyte streamline the extraction and loading, DBT handles the transformation, and robust warehousing solutions like Snowflake and Redshift store the data. While traditionally these technologies catered to analytical and business intelligence applications (think Looker and Superset), there's an increasing recognition of their potential for more dynamic operational analytics, delivering real-time data for actionable insights.

  • ingestr

    ingestr is a CLI tool to copy data between any databases with a single command seamlessly.

    Project mention: FLaNK 04 March 2024 | dev.to | 2024-03-04

  • doit

    CLI task management & automation tool

    Project mention: How do you deal with CI, project config, etc. falling out of sync across repos? | /r/ExperiencedDevs | 2023-12-06

    I like mage for Go and doit for Python.

  • DataEngineeringProject

    Example end to end data engineering project.

  • covalent

    Pythonic tool for orchestrating machine-learning/high performance/quantum-computing workflows in heterogeneous compute environments. (by AgnostiqHQ)

    Project mention: Remote execution of code | /r/Python | 2023-12-05

    Pretty interesting request. If SSH is not used, I would try something like dask, which uses TCP to connect and execute, assuming your workers are on another machine. I also think something like covalent can be used to add your own custom plugin to their ecosystem and connect however you want. We have a very custom private plugin written on top of covalent's that uses a custom RPC-based protocol to connect our central on-prem GPU machines to our local laptops, mostly for high performance as well as some security mandated by where the GPU machines are. Once done it is pretty much something like

  • piperider

    Code review for data in dbt

  • tributary

    Streaming reactive and dataflow graphs in Python

  • VQASynth

    Compose multimodal datasets 🎹

    Project mention: Show HN: VQASynth – pipelines to synthesize VQA datasets | news.ycombinator.com | 2024-02-23

  • datajob

    Build and deploy a serverless data pipeline on AWS with no effort.

  • patterns-devkit

    Data pipelines from re-usable components

  • airflow-testing-ci-workflow

    (project & tutorial) dag pipeline tests + ci/cd setup

  • alto

    Alto is a versatile data integration tool that allows you to easily run Singer plugins, build and cache PEX files encapsulating those plugins, and create a data reservoir whereby you can extract once and replay to as many destinations as you want. (by z3z1ma)

  • datatap-python

    Focus on Algorithm Design, Not on Data Wrangling

  • pyDag

    Scheduling Big Data Workloads and Data Pipelines in the Cloud with pyDag

  • data-engineer-challenge

    Challenge Data Engineer

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source data-pipeline projects in Python? This list will help you:

#   Project                      Stars
1   airbyte                      16,331
2   ingestr                       2,576
3   doit                          1,876
4   DataEngineeringProject        1,094
5   covalent                        784
6   piperider                       482
7   tributary                       441
8   VQASynth                        217
9   datajob                         110
10  patterns-devkit                 106
11  airflow-testing-ci-workflow      85
12  alto                             55
13  datatap-python                   34
14  pyDag                            25
15  data-engineer-challenge          25

Did you know that Python is the most popular programming language based on number of mentions?