mara-pipelines vs dbt
Metric | mara-pipelines | dbt
---|---|---
Mentions | 3 | 1
Stars | 2,054 | 3,802
Growth | 0.4% | -
Activity | 6.0 | 10.0
Last commit | 5 months ago | over 2 years ago
Language | Python | Python
License | MIT License | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
mara-pipelines
-
How to keep track of the different Transformations done in an ETL pipeline?
The closest I've found is Mara but not what I'm after.
-
Using PostgreSQL as a Data Warehouse
The tooling behind the approach has been built as a set of Python packages named Mara. It is available on GitHub:
https://github.com/mara/mara-pipelines
And additional packages can be found at the Mara org:
https://github.com/mara
-
Build your own “data lake” for reporting purposes
MinIO and NiFi each require machines of their own. You're better off with pure Python, and if one wants something lightweight and visually pleasing, Mara [0] or Dagster with Dagit [1] will do the job.
[0] https://github.com/mara/mara-pipelines
[1] https://docs.dagster.io/tutorial/execute
dbt
-
Open Source Analytics Stack: Bringing Control, Flexibility, and Data-Privacy to Your Analytics
Due to the rise of cloud-based data warehouses, businesses can load all their raw data directly into the warehouse without prior transformations. This process is known as ELT (Extract, Load, Transform), and it gives data and analytics teams the freedom to develop ad-hoc transformations based on their particular needs. ELT became popular as the cloud's processing power and scale became better suited to transforming data. dbt is a popular open-source tool recommended for ELT that lets businesses transform data in their warehouses more effectively. It's a great pairing with RudderStack's Cloud Extract ETL tool.
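To make the ELT order of operations concrete, here is a minimal sketch in Python. It is not how dbt or RudderStack work internally: the in-memory SQLite database stands in for a cloud warehouse, and the table names `raw_orders` and `customer_revenue`, the function `run_elt`, and the sample rows are all hypothetical. The point it illustrates is only that the raw data is loaded untouched and the cleaning and aggregation happen afterwards, as SQL inside the database.

```python
import sqlite3


def run_elt(raw_rows):
    """Load raw rows as-is, then transform them with SQL inside the database."""
    # Assumption: an in-memory SQLite database stands in for a cloud warehouse.
    conn = sqlite3.connect(":memory:")

    # Extract + Load: land the raw data without cleaning it first.
    conn.execute("CREATE TABLE raw_orders (customer TEXT, amount_cents INTEGER)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?)", raw_rows)

    # Transform: derive a cleaned, aggregated table from the raw one,
    # using the database's own SQL engine (hypothetical table names).
    conn.execute(
        """
        CREATE TABLE customer_revenue AS
        SELECT lower(trim(customer)) AS customer,
               sum(amount_cents) / 100.0 AS revenue
        FROM raw_orders
        GROUP BY lower(trim(customer))
        """
    )

    result = conn.execute(
        "SELECT customer, revenue FROM customer_revenue ORDER BY customer"
    ).fetchall()
    conn.close()
    return result


# Hypothetical sample data with inconsistent customer names.
raw = [("Alice ", 1250), ("alice", 750), ("Bob", 300)]
print(run_elt(raw))
```

Because the raw table is kept, a different transformation can be written later against the same loaded data without re-extracting it, which is the flexibility the ELT description above refers to.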
What are some alternatives?
abcd-hcp-pipeline - BIDS application for processing functional MRI data, robust to scanner, acquisition and age variability.
Apache Kafka - Mirror of Apache Kafka
kuwala - Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring the state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition, we provide third-party data for data science models and products, with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) High-resolution demographics data b) Points of Interest from OpenStreetMap c) Google Popular Times
airbyte - The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
pybaseball - Pull current and historical baseball statistics using Python (Statcast, Baseball Reference, FanGraphs)
superset - Apache Superset is a Data Visualization and Data Exploration Platform
dbt-core - dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
Snowplow - The enterprise-grade behavioral data engine (web, mobile, server-side, webhooks), running cloud-natively on AWS and GCP
etl-markup-toolkit - ETL Markup Toolkit is a spark-native tool for expressing ETL transformations as configuration
nbdev - Create delightful software with Jupyter Notebooks
dremio-oss - Dremio - the missing link in modern data
rudderstack-docs - Documentation repository for RudderStack - the Customer Data Platform for Developers.