cool
mara-pipelines
Metric | cool | mara-pipelines
---|---|---
Mentions | 1 | 3
Stars | 116 | 2,056
Growth | - | 0.3%
Activity | 1.1 | 6.0
Last commit | about 1 year ago | 5 months ago
Language | Python | Python
License | Apache License 2.0 | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
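The page does not give the exact formulas behind Growth and Activity, so the following is only a minimal sketch of how such numbers could be derived. The exponential decay, the 90-day half-life, the function names, and the sample figures are all assumptions made for illustration, not the site's actual method.

```python
from datetime import date


def month_over_month_growth(stars_last_month: int, stars_now: int) -> float:
    """Percentage change in stars relative to last month's count."""
    if stars_last_month <= 0:
        raise ValueError("stars_last_month must be positive")
    return (stars_now - stars_last_month) / stars_last_month * 100.0


def weighted_commit_score(commit_dates, today, half_life_days=90.0):
    """Sum of per-commit weights; a commit's weight halves every
    `half_life_days` (an assumed decay), so recent commits count more."""
    return sum(0.5 ** ((today - d).days / half_life_days) for d in commit_dates)


def activity_from_rank(score, all_scores):
    """Map a raw score to a 0-10 scale using the share of tracked
    projects with a strictly lower score (hypothetical ranking rule)."""
    below = sum(1 for s in all_scores if s < score)
    return 10.0 * below / len(all_scores)


if __name__ == "__main__":
    today = date(2024, 1, 1)
    commits = [today, date(2023, 10, 3)]  # one commit today, one 90 days earlier
    print(month_over_month_growth(1000, 1003))
    print(weighted_commit_score(commits, today))
    print(activity_from_rank(90, list(range(100))))
```

Under this ranking rule, a project that outscores 90 of 100 tracked projects gets an activity of 9.0, which matches the "top 10%" reading given above.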
mara-pipelines
- How to keep track of the different Transformations done in an ETL pipeline?
  The closest I've found is Mara, but it's not what I'm after.
- Using PostgreSQL as a Data Warehouse
  The tooling behind the approach has been built as a set of Python packages named Mara. It is available on GitHub:
  https://github.com/mara/mara-pipelines
  Additional packages can be found at the Mara org:
  https://github.com/mara
- Build your own “data lake” for reporting purposes
  MinIO and NiFi each require machines of their own. You are better off with pure Python, and if one wants something lightweight and visually pleasing, Mara [0] or Dagster with Dagit [1] will do the job.
  [0] https://github.com/mara/mara-pipelines
  [1] https://docs.dagster.io/tutorial/execute
What are some alternatives?
Prefect - The easiest way to build, run, and monitor data pipelines at scale.
abcd-hcp-pipeline - BIDS application for processing functional MRI data, robust to scanner, acquisition and age variability.
PyFunctional - Python library for creating data pipelines with chain functional programming
kuwala - Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring the state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition, we provide third-party data for data science models and products, with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) high-resolution demographics data, b) points of interest from OpenStreetMap, c) Google Popular Times
papermill - 📚 Parameterize, execute, and analyze notebooks
pybaseball - Pull current and historical baseball statistics using Python (Statcast, Baseball Reference, FanGraphs)
dbt-core - dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
etl-markup-toolkit - ETL Markup Toolkit is a Spark-native tool for expressing ETL transformations as configuration
dremio-oss - Dremio - the missing link in modern data
airbyte - The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
citus - Distributed PostgreSQL as an extension
sgr - sgr (command line client for Splitgraph) and the splitgraph Python library