Ask HN: How do your ML teams version datasets and models?

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

Our great sponsors
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • WorkOS - The modern identity platform for B2B SaaS
  • SaaSHub - Software Alternatives and Reviews
  • dvc

    🦉 ML Experiments and Data Management with Git

  • dud

    A lightweight CLI tool for versioning data alongside source code and building data pipelines.

  • I've used DVC in the past and generally liked its approach. That said, I wholeheartedly agree that it's clunky. It does a lot of things implicitly, which can make it hard to reason about. It was also extremely slow for medium-sized dataset (low 10s of GBs).

    In response, I created a command-line tool that addresses these issues[0]. To reduce the comparison to an analogy: Dud : DVC :: Flask : Django.

    [0]: https://github.com/kevin-hanselman/dud

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

    InfluxDB logo
NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts