DVC Alternatives
Similar projects and alternatives to dvc
-
ploomber
The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️
-
Activeloop Hub
Dataset format for AI. Build, manage, query & visualize datasets for deep learning. Stream data real-time to PyTorch/TensorFlow & version-control it. https://activeloop.ai (by activeloopai)
-
git-submodules
Git Submodule alternative with equivalent features, but easier to use and maintain.
-
EdenSCM
EdenSCM is a cross-platform, highly scalable source control management system.
-
spock
spock is a framework that helps manage complex parameter configurations during research and development of Python applications (by fidelity)
-
commitizen
Create committing rules for projects :rocket: auto bump versions :arrow_up: and auto changelog generation :open_file_folder:
-
pre-commit
A framework for managing and maintaining multi-language pre-commit hooks.
-
lowdefy
An open-source, self-hosted, low-code framework to build internal tools, web apps, admin panels, BI dashboards, workflows, and CRUD apps with YAML or JSON.
-
labml
🔎 Monitor deep learning model training and hardware usage from your mobile phone 📱
-
dvc reviews and mentions
- How do you track your experiments?
-
Eden
Data can and should be versioned, but not by just `git add BLOAT`. Take a look at https://dvc.org/: blobs are uploaded to an S3-compatible blob storage, their metadata is recorded in a small config file, and that file is what gets versioned in git.
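The idea in this comment (big blobs live in external storage, only a small metadata file goes into git) can be sketched roughly as follows. This is not DVC's implementation: `track`, `file_md5`, the `.pointer.json` suffix, and the local `blob_store` directory standing in for an S3-compatible bucket are all hypothetical names chosen for illustration.

```python
import hashlib
import json
import shutil
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large blob never sits fully in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def track(data_file: Path, blob_store: Path) -> Path:
    """Copy data_file into blob_store under its hash and write a pointer file.

    blob_store is a local directory standing in for a remote bucket
    (an assumption of this sketch). The returned pointer file is the only
    thing that would be committed to git.
    """
    checksum = file_md5(data_file)
    blob_path = blob_store / checksum[:2] / checksum[2:]
    blob_path.parent.mkdir(parents=True, exist_ok=True)
    if not blob_path.exists():  # identical content is stored only once
        shutil.copy2(data_file, blob_path)

    pointer = data_file.with_name(data_file.name + ".pointer.json")
    metadata = {
        "path": data_file.name,
        "md5": checksum,
        "size": data_file.stat().st_size,
    }
    pointer.write_text(json.dumps(metadata, indent=2))
    return pointer
```

Because the blob is addressed by its content hash, checking out an older commit of the pointer file is enough to know exactly which blob to fetch.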
-
GitHub for code but where/how do you organize your datafiles?
In the field of machine learning more and more people are using dvc: think of it as a "git" for data with native integration to git itself.
-
How can I have an organized workflow in R?
If you have datasets that are constantly updated, you might find DVC (https://github.com/iterative/dvc) helpful.
-
DevOps Fundamentals for Deep Learning Engineers
MLOps is a HUGE area to explore, and not surprisingly, there are many startups showing up in this space. If you want to get in on the latest trends, then I would look at workflow orchestration frameworks such as Metaflow (started off at Netflix, is now spinning off into its own enterprise business, https://metaflow.org/), Kubeflow (used at Google, https://www.kubeflow.org/), Airflow (used at Airbnb, https://airflow.apache.org/), and Luigi (used at Spotify, https://github.com/spotify/luigi). Then you have the model serving itself, so there is Seldon (https://www.seldon.io/), Torchserve (https://pytorch.org/serve/), and TensorFlow Serving (https://www.tensorflow.org/tfx/guide/serving). You also have the actual export and transfer of DL models, and ONNX is the most popular here (https://onnx.ai/). Spark (https://spark.apache.org/) still holds up nicely after all these years, especially if you are doing batch predictions on massive amounts of data. There is also the GitFlow way of doing things, and Data Version Control (DVC, https://dvc.org/) has taken pole position there.
-
Do you guys actually know how to use git?
ML teams should also review DVC (see https://dvc.org/). It would be useful for code, datasets, and ML models, and it becomes a useful tool for ML experiment tracking too.
- DoltLab v0.2.0
-
[N] Experiment tracking with DvC and Guild AI
I'm the author of Guild AI (open source experiment tracking). For some time now Guild users have asked for DvC support. This is now available as a pre-release.
-
Data Science Workflows — Notebook to Production
At DagsHub, we’re integrated with DVC, which I love using. First and foremost, it’s open-source. It provides pipeline capabilities and supports many cloud providers for remote storage. Also, DVC acts as an extension to Git, which allows you to keep using the standard Git flow in your work. If you don’t want to use both tools, I recommend using FDS, an open-source tool that makes version control for machine learning fast & easy. It combines Git and DVC under one roof and takes care of code, data, and model versioning. (Bias alert: DagsHub developed FDS)
Git was designed for managing software development projects and for versioning text/code files, so it doesn't handle large files well. Git LFS (Large File Storage) was released to overcome the large-file versioning problem; it is better than plain Git, but fails when scaling. Also, neither Git nor Git LFS is optimized for data science workflows. To overcome this challenge, many powerful tools have emerged in recent years, such as DVC, Delta Lake, LakeFS, and more.
-
How do I create a single workspace when I work on multiple devices
In addition to what others have said about using GitHub, you should consider using a data versioning tool like DVC. It's very flexible and allows for syncing with all the main cloud storage providers, as well as direct syncing between devices using SSH. Plus you get all the benefits of having your data versions linked with your code versions in git.
-
[D] Why doesn’t your team use an experiment tracking tool?
I've been integrating DVC into our pipeline, from data processing to tracking experiment metrics and hyperparameters. The data versioning works quite well for us. We use a high performance networked volume for storing the cache (AWS FSx for Lustre) and use AWS S3 for perpetual storage of data, models, and other dependencies & outputs. The workspace is hard-linked to the cache.
Unfortunately, there are some issues with `dvc exp`, the set of experiment tracking subcommands. In particular, I rely heavily on git submodules to partition the code that instantiates a model from the code that runs an experiment, but `dvc exp` doesn't work with submodules at the moment. (Bug filed here.) This is unfortunate because, if `dvc exp` worked, it would make experiment tracking a little more convenient for us. It's not a deal breaker though. I use git branches to organize individual experiments and tags to organize stages of the same experiment. I use a shared dvc cache so that I can run multiple experiments at a time without using up too much workspace storage.
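Why a hard-linked workspace and a shared cache save storage can be seen in a small sketch. It only illustrates the filesystem mechanism and is not DVC code; `link_into_workspace`, `demo`, and the directory names are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def link_into_workspace(cached_file: Path, workspace_file: Path) -> None:
    """Expose a cached file in a workspace without copying its bytes."""
    workspace_file.parent.mkdir(parents=True, exist_ok=True)
    if workspace_file.exists():
        workspace_file.unlink()
    # A hard link is a second directory entry for the same data on disk;
    # both paths must live on the same filesystem.
    os.link(cached_file, workspace_file)


def demo() -> bool:
    """Link one cached file into two experiment workspaces and verify sharing."""
    with tempfile.TemporaryDirectory() as root:
        cached = Path(root) / "cache" / "model.bin"
        cached.parent.mkdir(parents=True)
        cached.write_bytes(b"weights")

        ws_a = Path(root) / "exp_a" / "model.bin"
        ws_b = Path(root) / "exp_b" / "model.bin"
        link_into_workspace(cached, ws_a)
        link_into_workspace(cached, ws_b)

        # Three names, one copy of the data.
        return os.path.samefile(cached, ws_a) and cached.stat().st_nlink == 3
```

The trade-off is that editing a hard-linked file in place would also change the cached copy, so a setup like this generally treats linked files as read-only.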
-
Autodocumenting Makefiles
For data science specifically, I would strongly suggest looking into DVC: https://dvc.org/.
You can easily write DVC stage files by hand as a straightforward Makefile replacement, and integrate other features into your workflow as needed/desired.
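To make the Makefile comparison concrete, here is a minimal sketch of the underlying idea: a stage declares a command, its dependencies, and its outputs, and is rerun only when a dependency's content changes. This is not DVC's file format or API; `Stage`, `run_if_stale`, and the JSON lock file are hypothetical stand-ins.

```python
import hashlib
import json
import subprocess
import sys
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Stage:
    name: str
    cmd: list[str]    # command to execute, as an argv list
    deps: list[Path]  # input files the command reads
    outs: list[Path]  # output files the command writes


def _fingerprint(stage: Stage) -> dict:
    """Map each dependency path to the MD5 of its current content."""
    return {str(p): hashlib.md5(p.read_bytes()).hexdigest() for p in stage.deps}


def run_if_stale(stage: Stage, lock_file: Path) -> bool:
    """Run stage.cmd unless deps are unchanged and all outs exist.

    Returns True if the command ran, False if the stage was up to date.
    """
    lock = json.loads(lock_file.read_text()) if lock_file.exists() else {}
    current = _fingerprint(stage)
    if lock.get(stage.name) == current and all(o.exists() for o in stage.outs):
        return False
    subprocess.run(stage.cmd, check=True)
    lock[stage.name] = current
    lock_file.write_text(json.dumps(lock, indent=2))
    return True
```

Unlike Make, which compares file timestamps, this sketch compares content hashes, so touching a file without changing it does not trigger a rerun.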
Stats
iterative/dvc is an open source project licensed under the Apache License 2.0, which is an OSI-approved license.