cuetils
daggy
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
cuetils
- Cue: A new language for data validation
The link is broken.
This one?
https://github.com/hofstadter-io/cuetils
Do you also make the cuetorials? They were a great help to us a few months ago. Thank you for that.
- ETL Pipelines with Airflow: The Good, the Bad and the Ugly
I got inspired and started this over the weekend to demonstrate what is possible.
https://github.com/hofstadter-io/cuetils
daggy
- ETL Pipelines with Airflow: The Good, the Bad and the Ugly
Thanks for the feedback. I'll take a look at how Luigi models task state. Right now each TaskExecutor type is responsible for running and reporting on tasks (e.g. the Slurm executor submits jobs and monitors them for completion). I was considering adding a companion "verify" stage for every vertex: a command that runs after the task and verifies its output. It might be a way to do what I think you're describing above without having to build a variety of expected outputs into the daggy core. I'll check what Luigi is doing, though.
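To make the "verify" idea concrete, here is a minimal sketch in Python. It is not daggy code and does not reflect daggy's API: the function name `run_vertex`, its parameters, and the string states it returns are all hypothetical. It only illustrates the shape of the proposal, where a vertex counts as complete only if both its command and its optional verify command exit with status 0.

```python
import subprocess
import sys


def run_vertex(command, verify=None):
    """Run a task command, then an optional verify command (hypothetical helper).

    Returns "completed" only if the task succeeds and, when a verify
    command is given, the verify command also exits with status 0.
    """
    result = subprocess.run(command, capture_output=True)
    if result.returncode != 0:
        return "error"
    if verify is not None:
        # The verify step is just another command; the core never needs
        # to know what kind of output the task was expected to produce.
        check = subprocess.run(verify, capture_output=True)
        if check.returncode != 0:
            return "error"
    return "completed"


# Illustrative commands only: a task that succeeds, and a verify step that fails.
task = [sys.executable, "-c", "pass"]
failing_verify = [sys.executable, "-c", "import sys; sys.exit(1)"]

state_without_verify = run_vertex(task)
state_with_failing_verify = run_vertex(task, verify=failing_verify)
```

Keeping verification as a plain command means the check for "did this produce the right output" lives in the DAG definition rather than in the scheduler.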
> resuming a partially failed build
Daggy does this! Right now it keeps running the DAG until either every path has completed or every vertex still in a processing state (queued, running, retry, error) has reached the error state, at which point the DAG itself goes to an error state.
It's possible to explicitly set task/vertex states (e.g. mark it complete if the step was manually completed), then change the DAG state to QUEUED, at which point the DAG will resume execution from where it left off. [1] is a unit test that walks through that functionality.
[1] https://gitlab.com/iroddis/daggy/-/blob/master/tests/unit_se...
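The resume flow described above can be sketched roughly as follows. This is a toy model in Python, not daggy's implementation or API: the `Dag` class, the `State` enum, and the vertex names are hypothetical, tasks are plain callables, and retries are left out. As a simplification, vertices downstream of a failed vertex simply stay queued and the DAG is marked as errored once no further progress is possible.

```python
from enum import Enum


class State(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    ERROR = "error"
    COMPLETED = "completed"


class Dag:
    """Toy DAG runner (hypothetical): deps maps vertex -> set of parents."""

    def __init__(self, deps, actions):
        self.deps = deps
        self.actions = actions  # vertex -> callable returning True on success
        self.vertex_state = {v: State.QUEUED for v in deps}
        self.state = State.QUEUED

    def run(self):
        self.state = State.RUNNING
        progressed = True
        while progressed:
            progressed = False
            for vertex, parents in self.deps.items():
                if self.vertex_state[vertex] is not State.QUEUED:
                    continue  # already completed or errored
                if all(self.vertex_state[p] is State.COMPLETED for p in parents):
                    self.vertex_state[vertex] = State.RUNNING
                    ok = self.actions[vertex]()
                    self.vertex_state[vertex] = State.COMPLETED if ok else State.ERROR
                    progressed = True
        done = all(s is State.COMPLETED for s in self.vertex_state.values())
        self.state = State.COMPLETED if done else State.ERROR
        return self.state


calls = []  # records which tasks actually executed


def task(name, ok=True):
    def action():
        calls.append(name)
        return ok
    return action


deps = {"extract": set(), "transform": {"extract"},
        "load": {"transform"}, "report": {"extract"}}
actions = {"extract": task("extract"), "transform": task("transform", ok=False),
           "load": task("load"), "report": task("report")}

dag = Dag(deps, actions)
first_result = dag.run()  # "transform" fails, so "load" never runs

# Resume: mark the failed step as manually completed, requeue the DAG, run again.
dag.vertex_state["transform"] = State.COMPLETED
dag.state = State.QUEUED
second_result = dag.run()  # only "load" executes this time
```

In this sketch the independent `report` branch still finishes during the first run, and the second run picks up at `load` without re-running anything that already completed.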
What are some alternatives?
dbt-expectations - Port(ish) of Great Expectations to dbt test macros
NVTabular - NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale datasets used to train deep learning based recommender systems.
Scio - A Scala API for Apache Beam and Google Cloud Dataflow.
materialize - The data warehouse for operational workloads.
cue - CUE has moved to https://github.com/cue-lang/cue
jsonnet-libs - Grafana Labs' Jsonnet libraries
dhall-lang - Maintainable configuration files
cue - The home of the CUE language! Validate and define text-based and dynamic configuration
jk - Configuration as Code with ECMAScript