Python data-integration

Open-source Python projects categorized as data-integration

Top 12 Python data-integration Projects

  • Airflow

    Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

    Project mention: Data Engineering with DLT and REST | dev.to | 2024-11-28

    This article demonstrates how to work with near real-time and historical data using the dlt package. Whether you need to scale data access across the enterprise or provide historical data for post-event analysis, you can use the same framework to provide customer data. In a future article, I'll demonstrate how to use dlt with a workflow orchestrator such as Apache Airflow or Dagster.

  • Taipy

    Turns Data and AI algorithms into production-ready web applications in no time.

    Project mention: Build a Stock Dashboard in less than 40 lines of Python code!🤓 | dev.to | 2024-12-05

  • airbyte

    The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

    Project mention: From ETL and ELT to Reverse ETL | dev.to | 2024-10-15

    With the transition from ETL to ELT, data warehouses have ascended to the role of data custodians, centralizing customer data collected from fragmented systems. This pivotal shift has been enabled by a suite of powerful tools: Fivetran and Airbyte streamline the extraction and loading, dbt handles the transformation, and robust warehousing solutions like Snowflake and Redshift store the data. While these technologies traditionally catered to analytical and business intelligence applications (think Looker and Superset), there's an increasing recognition of their potential for more dynamic operational analytics, delivering real-time data for actionable insights.

  • dagster

    An orchestration platform for the development, production, and observation of data assets.

    Project mention: Data Engineering with DLT and REST | dev.to | 2024-11-28

    This article demonstrates how to work with near real-time and historical data using the dlt package. Whether you need to scale data access across the enterprise or provide historical data for post-event analysis, you can use the same framework to provide customer data. In a future article, I'll demonstrate how to use dlt with a workflow orchestrator such as Apache Airflow or Dagster.

  • Mage

    🧙 The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai

    Project mention: 25 Open Source AI Tools to Cut Your Development Time in Half | dev.to | 2024-07-11

    Mage AI is a data transformation and integration framework that allows data scientists and ML engineers to build and automate data pipelines without extensive coding. Data scientists can easily connect to their data sources, ingest data, and build production-ready data pipelines within Mage notebooks.

  • ingestr

    ingestr is a CLI tool that seamlessly copies data between databases with a single command.

    Project mention: FLaNK 04 March 2024 | dev.to | 2024-03-04

  • mara-pipelines

    A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow

  • recap

    Work with your web service, database, and streaming schemas in a single format.

  • prism

    Prism is the easiest way to develop, orchestrate, and execute data pipelines in Python. (by runprism)

  • nfcompose

    Build REST APIs/Integrations in minutes instead of hours - NF Compose is a (data) integration platform that allows developers to define REST APIs in seconds instead of hours. Generated REST APIs are backed by postgres and support automatic consumer webhook notifications on data changes out of the box.

    Project mention: Implementing system-versioned tables in Postgres | news.ycombinator.com | 2024-02-07

    I have implemented this for our tool NF Compose that allows us to build REST APIs without writing a single line of code [0]. I didn't go the route of triggers because we generate database tables automatically and we used to have a crazy versioning scheme that was inspired by data vault and anchor modelling where we stored every change on every attribute as a new record.

    Sounded cool, but in practice it was really slow. The techniques that are usually employed by Data Vault to fix this issue seemed too complex. Over time we moved to an implementation that handles the historization dynamically at runtime by generating SQL queries ourselves [1]. On a side note: generating SQL in Python sounds dangerous, but we spent a lot of time making it secure. We even have a linter that checks that everything is escaped properly whenever we are in dev mode [2].

    [0] https://github.com/neuroforgede/nfcompose/

  • UniFuncNet

    A multi-reference network annotation tool to support omics analysis

  • JDR

    Job Dependency Runner

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python data-integration related posts

  • Data Engineering with DLT and REST

    2 projects | dev.to | 28 Nov 2024
  • Ingestr: CLI tool to copy data between any databases with a single command

    1 project | news.ycombinator.com | 27 Feb 2024
  • Show HN: Retake – Open-Source Hybrid Search for Postgres

    2 projects | news.ycombinator.com | 10 Aug 2023
  • We created an open-source semantic search Python package on top of Postgres

    1 project | /r/Python | 31 Jul 2023
  • Mage Battlegrounds: Craft insights from real-time customer behavior analysis

    2 projects | dev.to | 10 Apr 2023
  • JDR Tool Introduction (Job Dependency Runner)

    1 project | /r/madeinpython | 19 Mar 2023
  • Looking for an open-source project

    2 projects | /r/dataengineering | 13 Feb 2023

Index

What are some of the best open-source data-integration projects in Python? This list will help you:

#   Project         Stars
1   Airflow         37,485
2   Taipy           16,991
3   airbyte         16,331
4   dagster         11,961
5   Mage             8,003
6   ingestr          2,576
7   mara-pipelines   2,081
8   recap              334
9   prism               82
10  nfcompose           35
11  UniFuncNet          12
12  JDR                  3
