SaaSHub helps you find the best software and product alternatives Learn more →
Top 11 Python data-integration Projects
-
airbyte
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
-
WorkOS
The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.
-
Mage
🧙 The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai
-
mara-pipelines
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
-
prism
Prism is the easiest way to develop, orchestrate, and execute data pipelines in Python. (by runprism)
-
nfcompose
Build REST APIs/Integrations in minutes instead of hours - NF Compose is a (data) integration platform that allows developers to define REST APIs in seconds instead of hours. Generated REST APIs are backed by postgres and support automatic consumer webhook notifications on data changes out of the box.
Project mention: Building in Public: Leveraging Tublian's AI Copilot for My Open Source Contributions | dev.to | 2024-02-12Contributing to Apache Airflow's open-source project immersed me in collaborative coding. Experienced maintainers rigorously reviewed my contributions, providing constructive feedback. This ongoing dialogue refined the codebase and honed my understanding of best practices.
Project mention: Launch HN: Bracket (YC W22) – Two-Way Sync Between Salesforce and Postgres | news.ycombinator.com | 2023-12-12I'l also give a shout-out to Airbyte (https://airbyte.com/), with which I've had some limited success with integrating Salesforce to a local database. The particular pull for Airbyte is that we can self-host the open source version, rather than pay Fivetran a significant sum to do this for us.
It's an immature tool, so I don't yet know that I can claim we've spent _less_ than Fivetran on the additional engineering and ops time, but it feels like it has potential to do so once stabilized.
Project mention: Recap: A python library for describing database tables and serialization formats with minimal type coercion. | /r/dataengineering | 2023-07-12The Github Repo: https://github.com/recap-build/recap
Project mention: Prism: the easiest way to create robust data workflows. Accessible via CLI | /r/coolgithubprojects | 2023-09-21
Project mention: Implementing system-versioned tables in Postgres | news.ycombinator.com | 2024-02-07I have implemented this for our tool NF Compose that allows us to build REST APIs without writing a single line of code [0]. I didn't go the route of triggers because we generate database tables automatically and we used to have a crazy versioning scheme that was inspired by data vault and anchor modelling where we stored every change on every attribute as a new record.
Sounded cool, but in practice it was really slow. The techniques that are usually employed by Data Vault to fix this issue seemed too complex. Over time we moved to an implementation that handles the historization dynamically at runtime by generating SQL queries ourselves [1]. On a sidenote: Generating SQL in python sounds dangerous, but we spent a lot of time on making it secure. We even have a linter that checks that everything is escaped properly whenever we are in dev mode [2]
[0] https://github.com/neuroforgede/nfcompose/
Python data-integration related posts
- Ingestr: CLI tool to copy data between any databases with a single command
- Show HN: Retake – Open-Source Hybrid Search for Postgres
- We created an open-source semantic search Python package on top of Postgres
- Mage Battlegrounds: Craft insights from real-time customer behavior analysis
- JDR Tool Introduction (Job Dependency Runner)
- Looking for an open-source project
- Daskqueue: Dask-based distributed task queue
-
A note from our sponsor - SaaSHub
www.saashub.com | 26 Apr 2024
Index
What are some of the best open-source data-integration projects in Python? This list will help you:
Project | Stars | |
---|---|---|
1 | Airflow | 34,485 |
2 | airbyte | 13,923 |
3 | dagster | 10,173 |
4 | Mage | 7,001 |
5 | ingestr | 2,308 |
6 | mara-pipelines | 2,054 |
7 | recap | 306 |
8 | prism | 79 |
9 | nfcompose | 32 |
10 | UniFuncNet | 10 |
11 | JDR | 3 |
Sponsored