🦉 Data Version Control | Git for Data & Models | ML Experiments Management (by iterative)

DVC Alternatives

Similar projects and alternatives to dvc

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a better dvc alternative or higher similarity.

dvc reviews and mentions

Posts with mentions or reviews of dvc. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-09-28.
  • Ask HN: How do your ML teams version datasets and models?
    3 projects | news.ycombinator.com | 28 Sep 2023
  • Exploring MLOps Tools and Frameworks: Enhancing Machine Learning Operations
    3 projects | dev.to | 6 Jun 2023
    DVC (Data Version Control):
  • Evaluate and Track Your LLM Experiments: Introducing TruLens for LLMs
    2 projects | news.ycombinator.com | 24 May 2023
  • [D] Is there a tool to keep track of my ML experiments?
    2 projects | /r/MachineLearning | 13 May 2023
    I have been using DVC and MLflow since the days when DVC only had data tracking and MLflow only had model tracking. I can say both are awesome now, and the only factor I would mention is that, IMO, MLflow is a bit harder to learn, while DVC is practically just Git.
  • Ask HN: Data Management for AI Training
    3 projects | news.ycombinator.com | 30 Apr 2023
    * User interface for less tech-savvy people (e.g. just a git-like command line is fine for engineers but not for field personnel who are not in IT)

    I know of tools like https://dvc.org/ but they a) are just layers on top of git, b) break apart on huge datasets without a folder hierarchy (git tree objects just don't work for linear lists of items), c) are only usable by IT personnel, and d) require checking out at least a part of the dataset.

    Our datasets would be 100,000,000 x 100 MB = 10 PB of raw data. Training data should be delivered to training nodes via the network, etc. We just can't have a full checkout of that data...

  • Do you wonder why MLOps is not at the same level as DevOps?
    2 projects | /r/MLQuestions | 31 Mar 2023
    Hey, great find! However, it only explains concepts but not how to actually use any tool. I personally use DVC, but it's more focused on the model development/engineering phase. The different phases of ML are also done independently, which makes it even more difficult for an individual to have exposure to all the different areas. Moreover, the lack of standard tools and best practices makes it difficult, and the fact that every ML problem is different.
  • Oxen.ai: Fast Unstructured Data Version Control
    6 projects | news.ycombinator.com | 16 Feb 2023
    How does this compare with other systems, like DVC (https://dvc.org/) for example?
  • Career advice for getting into NLP from a Computer Science background?
    2 projects | /r/LanguageTechnology | 10 Feb 2023
    For the data cleaning and training parts, you might have projects where you've used Kaggle datasets to train models and you've done appropriate feature engineering and data exploration to help you understand whether data might need to be under- or over-sampled or cleaned in some other way. I'd give bonus points to someone who has thoughts about how training pipelines might be semi- or fully automated in a production environment (e.g. use of scripts and tools like dvc to make things easy to reproduce). I'd want to see evidence of appropriate metrics (e.g. I know it's 99% accurate and that might be great, but if it's a 10-way classification on a very unbalanced dataset, what can you tell me about performance on the smallest class?).
  • ML experiment tracking with DagsHub, MLFlow, and DVC
    4 projects | dev.to | 12 Jan 2023
    Here, we’ll implement the experimentation workflow using DagsHub, Google Colab, MLflow, and data version control (DVC). We’ll focus on how to do this without diving deep into the technicalities of building or designing a workbench from scratch. Going that route might increase the complexity involved, especially if you are in the early stages of understanding ML workflows, just working on a small project, or trying to implement a proof of concept.
  • Show HN: We scaled Git to support 1 TB repos
    9 projects | news.ycombinator.com | 13 Dec 2022
    There are a couple of other contenders in this space. DVC (https://dvc.org/) seems most similar.

    If you're interested in something you can self-host... I work on Pachyderm (https://github.com/pachyderm/pachyderm), which doesn't have a Git-like interface, but also implements data versioning. Our approach de-duplicates between files (even very small files), and our storage algorithm doesn't create objects proportional to O(n) directory nesting depth as Xet appears to. (Xet is very much like Git in that respect.)

    The data versioning system enables us to run pipelines based on changes to your data; the pipelines declare what files they read, and that allows us to schedule processing jobs that only reprocess new or changed data, while still giving you a full view of what "would" have happened if all the data had been reprocessed. This, to me, is the key advantage of data versioning; you can save hundreds of thousands of dollars on compute. Being able to undo an oopsie is just icing on the cake.

    Xet's system for mounting a remote repo as a filesystem is a good idea. We do that too :)
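
    The idea in the comment above (reprocess only new or changed data) can be sketched with content hashes. This is a minimal illustration under stated assumptions, not Pachyderm's or DVC's actual API or storage algorithm; the names `fingerprint`, `changed_inputs`, and `run_incremental` are hypothetical, and the "repo" is just an in-memory dict of file names to bytes.

```python
import hashlib
from typing import Callable


def fingerprint(data: bytes) -> str:
    """Content hash used to detect whether a file changed between runs."""
    return hashlib.sha256(data).hexdigest()


def changed_inputs(files: dict[str, bytes], seen: dict[str, str]) -> list[str]:
    """Return the names of files that are new or whose content hash differs."""
    return sorted(
        name for name, data in files.items() if seen.get(name) != fingerprint(data)
    )


def run_incremental(
    files: dict[str, bytes],
    seen: dict[str, str],
    process: Callable[[str, bytes], None],
) -> list[str]:
    """Process only new or changed files and record their hashes in `seen`."""
    todo = changed_inputs(files, seen)
    for name in todo:
        process(name, files[name])
        seen[name] = fingerprint(files[name])
    return todo


if __name__ == "__main__":
    seen: dict[str, str] = {}
    files = {"a.csv": b"1", "b.csv": b"2"}
    print(run_incremental(files, seen, lambda n, d: None))  # first run: everything
    files["b.csv"] = b"3"  # simulate an edit to one input
    print(run_incremental(files, seen, lambda n, d: None))  # only the edited file
```

    A real system would persist the hash table alongside each commit and track which pipeline reads which files; the sketch only shows why versioned inputs make it cheap to skip unchanged data.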



Basic dvc repo stats
8 days ago

iterative/dvc is an open source project licensed under Apache License 2.0 which is an OSI approved license.

The primary programming language of dvc is Python.
