Top 12 Python data-integration Projects
-
This article demonstrates how to work with near real-time and historical data using the dlt package. Whether you need to scale data access across the enterprise or provide historical data for post-event analysis, you can use the same framework to provide customer data. In a future article, I'll demonstrate how to use dlt with a workflow orchestrator such as Apache Airflow or Dagster.
-
Project mention: Build a Stock Dashboard in less than 40 lines of Python code!🤓 | dev.to | 2024-12-05
Star ⭐ Taipy repo
-
airbyte
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
With the transition from ETL to ELT, data warehouses have become the custodians of customer data, centralizing records collected from fragmented systems. This shift has been enabled by a suite of tools: Fivetran and Airbyte handle extraction and loading, dbt handles transformation, and warehouses such as Snowflake and Redshift store the data. These technologies have traditionally served analytical and business intelligence applications (think Looker and Superset), but there is growing recognition of their potential for operational analytics that delivers real-time data for actionable insights.
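The ELT ordering described above (load raw data first, transform inside the warehouse afterwards) can be sketched in a few lines. This is a minimal illustration only: it assumes an in-memory SQLite database as a stand-in for a warehouse, and the table names, sample rows, and function names are hypothetical rather than taken from Airbyte, Fivetran, or dbt.

```python
import sqlite3

# Hypothetical raw records standing in for rows extracted from a source system.
# Amounts arrive as strings, as they often do from APIs or CSV exports.
raw_orders = [
    ("o-1", "alice", "19.99"),
    ("o-2", "bob", "5.00"),
    ("o-3", "alice", "30.01"),
]


def load_raw(conn, rows):
    """Extract + Load: copy source rows into the warehouse without transforming them."""
    conn.execute(
        "CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)


def transform(conn):
    """Transform: derive a typed, aggregated table using SQL inside the warehouse."""
    conn.execute(
        """
        CREATE TABLE customer_totals AS
        SELECT customer,
               ROUND(SUM(CAST(amount AS REAL)), 2) AS total_spent
        FROM raw_orders
        GROUP BY customer
        """
    )


def run_elt(rows):
    """Run load then transform, and return the transformed rows sorted by customer."""
    conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for the warehouse
    try:
        load_raw(conn, rows)
        transform(conn)
        return conn.execute(
            "SELECT customer, total_spent FROM customer_totals ORDER BY customer"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for customer, total in run_elt(raw_orders):
        print(customer, total)
```

The raw table is kept as loaded, so the transformation can be rewritten and rerun later without extracting from the source again, which is the main practical difference from transforming before the load.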
-
Mage
🧙 The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai
Mage AI is a data transformation and integration framework that lets data scientists and ML engineers build and automate data pipelines without extensive coding. Data scientists can connect to their data sources, ingest data, and build production-ready data pipelines within Mage notebooks.
-
mara-pipelines
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
-
prism
Prism is the easiest way to develop, orchestrate, and execute data pipelines in Python. (by runprism)
-
nfcompose
Build REST APIs/Integrations in minutes instead of hours - NF Compose is a (data) integration platform that allows developers to define REST APIs in seconds instead of hours. Generated REST APIs are backed by postgres and support automatic consumer webhook notifications on data changes out of the box.
Project mention: Implementing system-versioned tables in Postgres | news.ycombinator.com | 2024-02-07
I have implemented this for our tool NF Compose, which allows us to build REST APIs without writing a single line of code [0]. I didn't go the route of triggers because we generate database tables automatically, and we used to have a crazy versioning scheme inspired by Data Vault and anchor modelling in which we stored every change to every attribute as a new record.
Sounded cool, but in practice it was really slow. The techniques that Data Vault usually employs to fix this issue seemed too complex. Over time we moved to an implementation that handles the historization dynamically at runtime by generating SQL queries ourselves [1]. On a side note: generating SQL in Python sounds dangerous, but we spent a lot of time on making it secure. We even have a linter that checks that everything is escaped properly whenever we are in dev mode [2].
[0] https://github.com/neuroforgede/nfcompose/
-
Python data-integration discussion
Python data-integration related posts
-
Data Engineering with DLT and REST
-
Ingestr: CLI tool to copy data between any databases with a single command
-
Show HN: Retake – Open-Source Hybrid Search for Postgres
-
We created an open-source semantic search Python package on top of Postgres
-
Mage Battlegrounds: Craft insights from real-time customer behavior analysis
-
JDR Tool Introduction (Job Dependency Runner)
-
Looking for an open-source project