Python data-integration

Open-source Python projects categorized as data-integration

Top 11 Python data-integration Projects

  • Airflow

    Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

  • Project mention: Building in Public: Leveraging Tublian's AI Copilot for My Open Source Contributions | dev.to | 2024-02-12

    Contributing to Apache Airflow's open-source project immersed me in collaborative coding. Experienced maintainers rigorously reviewed my contributions, providing constructive feedback. This ongoing dialogue refined the codebase and honed my understanding of best practices.

  • airbyte

    The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

  • Project mention: Launch HN: Bracket (YC W22) – Two-Way Sync Between Salesforce and Postgres | news.ycombinator.com | 2023-12-12

    I'll also give a shout-out to Airbyte (https://airbyte.com/), with which I've had some limited success integrating Salesforce with a local database. The particular pull for Airbyte is that we can self-host the open source version, rather than pay Fivetran a significant sum to do this for us.

    It's an immature tool, so I don't yet know that I can claim we've spent _less_ than Fivetran on the additional engineering and ops time, but it feels like it has potential to do so once stabilized.

  • dagster

    An orchestration platform for the development, production, and observation of data assets.

  • Project mention: Experience with Dagster.io? | news.ycombinator.com | 2023-07-25

  • Mage

    🧙 The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai

  • Project mention: FLaNK AI-April 22, 2024 | dev.to | 2024-04-22

  • ingestr

    ingestr is a CLI tool to copy data between any databases with a single command seamlessly.

  • Project mention: FLaNK 04 March 2024 | dev.to | 2024-03-04

  • mara-pipelines

    A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow

  • recap

    Work with your web service, database, and streaming schemas in a single format.

  • Project mention: Recap: A python library for describing database tables and serialization formats with minimal type coercion. | /r/dataengineering | 2023-07-12

    The Github Repo: https://github.com/recap-build/recap

  • prism

    Prism is the easiest way to develop, orchestrate, and execute data pipelines in Python. (by runprism)

  • Project mention: Prism: the easiest way to create robust data workflows. Accessible via CLI | /r/coolgithubprojects | 2023-09-21

  • nfcompose

    Build REST APIs/Integrations in minutes instead of hours - NF Compose is a (data) integration platform that allows developers to define REST APIs in seconds instead of hours. Generated REST APIs are backed by postgres and support automatic consumer webhook notifications on data changes out of the box.

  • Project mention: Implementing system-versioned tables in Postgres | news.ycombinator.com | 2024-02-07

    I have implemented this for our tool NF Compose that allows us to build REST APIs without writing a single line of code [0]. I didn't go the route of triggers because we generate database tables automatically and we used to have a crazy versioning scheme that was inspired by data vault and anchor modelling where we stored every change on every attribute as a new record.

    It sounded cool, but in practice it was really slow. The techniques that Data Vault usually employs to fix this issue seemed too complex. Over time we moved to an implementation that handles the historization dynamically at runtime by generating SQL queries ourselves [1]. On a side note: generating SQL in Python sounds dangerous, but we spent a lot of time on making it secure. We even have a linter that checks that everything is escaped properly whenever we are in dev mode [2].

    [0] https://github.com/neuroforgede/nfcompose/
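
The idea described in the mention above (keeping every change as a new record and generating the historization query at runtime) can be sketched roughly as follows. This is a minimal illustration using Python's built-in sqlite3 module, not NF Compose's actual implementation: the table layout and the helper names `quote_ident`, `record_change`, `as_of_query`, and `state_as_of` are all hypothetical. Identifiers are quoted and values are bound as parameters, which is one common way to keep generated SQL safe.

```python
import sqlite3


def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def record_change(conn: sqlite3.Connection, table: str,
                  entity_id: str, payload: str) -> int:
    """Append a new version row (never update in place); return its version."""
    t = quote_ident(table)
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {t} ("
        "version INTEGER PRIMARY KEY AUTOINCREMENT, "
        "entity_id TEXT NOT NULL, "
        "payload TEXT NOT NULL)"
    )
    cur = conn.execute(
        f"INSERT INTO {t} (entity_id, payload) VALUES (?, ?)",
        (entity_id, payload),
    )
    return cur.lastrowid


def as_of_query(table: str) -> str:
    """Generate SQL selecting each entity's latest row at or before a version."""
    t = quote_ident(table)
    return (
        f"SELECT h.entity_id, h.payload FROM {t} AS h "
        f"WHERE h.version = (SELECT MAX(version) FROM {t} "
        "WHERE entity_id = h.entity_id AND version <= ?) "
        "ORDER BY h.entity_id"
    )


def state_as_of(conn: sqlite3.Connection, table: str, version: int) -> dict:
    """Run the generated query; the version is bound as a parameter."""
    return dict(conn.execute(as_of_query(table), (version,)).fetchall())


conn = sqlite3.connect(":memory:")
v1 = record_change(conn, "customer_history", "c1", "alice")
v2 = record_change(conn, "customer_history", "c2", "bob")
v3 = record_change(conn, "customer_history", "c1", "alice-renamed")
before_rename = state_as_of(conn, "customer_history", v2)
after_rename = state_as_of(conn, "customer_history", v3)
```

Here `before_rename` reflects the data as it stood at version `v2`, while `after_rename` includes the later change to `c1`. A production system would also need to cover deletes, schema changes, and indexing on `(entity_id, version)`.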

  • UniFuncNet

    A multi-reference network annotation tool to support omics analysis

  • JDR

    Job Dependency Runner

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source data-integration projects in Python? This list will help you:

 #  Project         Stars
 1  Airflow         34,485
 2  airbyte         13,923
 3  dagster         10,173
 4  Mage             7,001
 5  ingestr          2,308
 6  mara-pipelines   2,054
 7  recap              306
 8  prism               79
 9  nfcompose           32
10  UniFuncNet          10
11  JDR                  3
