-
hamilton
Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. It runs and scales everywhere Python does.
Yeah! So we actually have an integration with polars. See https://github.com/DAGWorks-Inc/hamilton/blob/5c8e564d19ff23....
To be clear, the specific paradigm we're referring to is this way of writing transforms as functions where the parameter name is the upstream dependency -- not the notion of delayed execution.
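For anyone skimming, here's a minimal sketch of that style, loosely paraphrasing the hello-world from the docs (the column values are made up, and the exact driver API may differ by version): function names become node names, and parameter names point at the upstream nodes.

    # my_functions.py -- each function is a node; its parameter names are its upstream dependencies
    import pandas as pd

    def avg_3wk_spend(spend: pd.Series) -> pd.Series:
        """Rolling 3-week average of spend."""
        return spend.rolling(3).mean()

    def spend_per_signup(spend: pd.Series, signups: pd.Series) -> pd.Series:
        """Marketing spend per signup."""
        return spend / signups

    # run.py -- the driver wires the functions into a DAG and computes only what's requested
    import pandas as pd
    from hamilton import driver
    import my_functions

    dr = driver.Driver({}, my_functions)  # config dict first, then the module(s) holding the functions
    df = dr.execute(
        ["spend_per_signup", "avg_3wk_spend"],
        inputs={
            "spend": pd.Series([10.0, 10.0, 20.0, 40.0]),
            "signups": pd.Series([1, 10, 50, 100]),
        },
    )
    print(df)

Nothing here is lazy in the dask sense -- the declaration is what's different, not the execution model.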
I think there are two different concepts here though:
1. How the transforms are executed
2. How the transforms and their dependencies are declared
-
Allowing users to write imperative code (e.g. using loops) that dynamically generates DAGs is never a good idea. I say this as someone who personally used to pester framework PMs for this exact feature. While things like task groups (formerly subDAGs) [2] initially appear to be the right answer, I always ended up regretting them. They're a scheduling/orchestration solution to a data transformation problem.
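(For concreteness, a rough sketch of the contrast I have in mind; the node/column names are made up, and I'm using Hamilton's @parameterize from hamilton.function_modifiers only as one example of the declarative alternative:)

    # Imperative style (pseudocode): a loop mutates the DAG at build time,
    # so what you get depends on control flow at parse/schedule time.
    # for n in SHIFT_WINDOWS:
    #     dag.add_task(make_shift_task(n))

    # Declarative style: the family of nodes is declared up front,
    # so the graph stays static, inspectable, and testable.
    import pandas as pd
    from hamilton.function_modifiers import parameterize, source, value

    @parameterize(
        spend_shift_1=dict(raw_spend=source("spend"), shift_by=value(1)),
        spend_shift_3=dict(raw_spend=source("spend"), shift_by=value(3)),
    )
    def spend_shifter(raw_spend: pd.Series, shift_by: int) -> pd.Series:
        """Spend shifted by a fixed number of periods."""
        return raw_spend.shift(shift_by)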
Can y'all speak to how Hamilton views the data and control plane, and how its design philosophy encourages users to use the right tool for the job?
p.s. thanks for humoring my pedantry and merging this! [3]
[1]: https://github.com/dbt-labs/docs.getdbt.com/pull/2390
-
Would love to collaborate on an integration with pyquokka (https://github.com/marsupialtail/quokka) once I put out a stable release at the end of this month :-)
-