Launch HN: DAGWorks – ML platform for data science teams

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  1. hamilton

    Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows, that encode lineage/tracing and metadata. Runs and scales everywhere python does.

    Yeah! So we actually have an integration with polars. See https://github.com/DAGWorks-Inc/hamilton/blob/5c8e564d19ff23....

    To be clear, the specific paradigm we're referring to is this way of writing transforms as functions where the parameter name is the upstream dependency -- not the notion of delayed execution.
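A minimal sketch of that paradigm in plain Python may help. It does not use Hamilton's own API: the `resolve` helper, the transform functions, and the input values are all hypothetical, and the toy resolver only illustrates how a parameter name can be read as a reference to an upstream node.

```python
import inspect

# Hypothetical transforms: each function name is an output node, and each
# parameter name refers to an upstream node (another function or a raw input).
def spend_per_signup(spend: float, signups: int) -> float:
    return spend / signups

def spend_in_thousands(spend: float) -> float:
    return spend / 1000

def cost_report(spend_per_signup: float, spend_in_thousands: float) -> str:
    return f"{spend_in_thousands:.1f}k total, {spend_per_signup:.2f} per signup"

def resolve(name, functions, inputs, cache=None):
    """Compute node `name` by recursively resolving its parameter names."""
    cache = {} if cache is None else cache
    if name in inputs:
        # Raw inputs are leaves of the graph.
        return inputs[name]
    if name not in cache:
        fn = functions[name]
        # Every parameter name is looked up as an upstream node.
        kwargs = {
            param: resolve(param, functions, inputs, cache)
            for param in inspect.signature(fn).parameters
        }
        cache[name] = fn(**kwargs)
    return cache[name]

# Toy registry (an assumption for this sketch): map function names to functions.
FUNCTIONS = {
    f.__name__: f for f in (spend_per_signup, spend_in_thousands, cost_report)
}
result = resolve("cost_report", FUNCTIONS, {"spend": 2500.0, "signups": 100})
print(result)  # 2.5k total, 25.00 per signup
```

Because the dependency graph is implied by the function signatures, each transform can be unit-tested on its own by calling it with plain arguments, independently of how or when the graph is executed.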

    I think there are two different concepts here though:

    1. How the transforms are executed

  3. docs.getdbt.com

    The code behind docs.getdbt.com

    Allowing users to write imperative code (e.g. using loops) that dynamically generates DAGs is never a good idea. I say this as someone who personally used to pester framework PMs for this exact feature. While things like task groups (formerly subDAGs) [2] initially appear to be the right answer, I always ended up regretting them. They're a scheduling/orchestration solution to a data transformation problem.

    Can y'all speak to how Hamilton views the data and control plane, and how its design philosophy encourages users to use the right tool for the job?

    p.s. thanks for humoring my pedantry and merging this! [3]

    [1]: https://github.com/dbt-labs/docs.getdbt.com/pull/2390

  4. quokka

    Making data lake work for time series (by marsupialtail)

    would love to collaborate on an integration with pyquokka (https://github.com/marsupialtail/quokka) once I put out a stable release end of this month :-)

  5. awesome-pipeline

    A curated list of awesome pipeline toolkits inspired by Awesome Sysadmin

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • Data Orchestration Tool Analysis: Airflow, Dagster, Flyte

    3 projects | dev.to | 23 Jan 2025
  • Ask HN: What's the right tool for this job?

    4 projects | news.ycombinator.com | 20 Jul 2024
  • AI Strategy Guide: How to Scale AI Across Your Business

    4 projects | dev.to | 11 May 2024
  • Show HN: Hamilton's UI – observability, lineage, and catalog for data pipelines

    1 project | news.ycombinator.com | 2 May 2024
  • Prefect: A workflow orchestration tool for data pipelines

    1 project | news.ycombinator.com | 13 Mar 2024
