dvc
guildai
Metric | dvc | guildai
---|---|---
Mentions | 108 | 16
Stars | 13,032 | 855
Growth | 1.6% | 1.1%
Activity | 9.7 | 8.8
Last commit | about 13 hours ago | 8 months ago
Language | Python | Python
License | Apache License 2.0 | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
dvc
-
Why bad scientific code beats code following "best practices"
What you’re describing sounds like DVC (at a higher-ish—80%-solution level).
See pachyderm too.
-
First 15 Open Source Advent projects
10. DVC by Iterative | Github | tutorial
-
Exploring Open-Source Alternatives to Landing AI for Robust MLOps
Platforms such as MLflow monitor the development stages of machine learning models. In parallel, Data Version Control (DVC) brings version control system-like functions to the realm of data sets and models.
- ML Experiments Management with Git
- Ask HN: How do your ML teams version datasets and models?
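The "version control for data" idea described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not DVC's actual implementation: the function name `track_file`, the cache layout, and the `.ptr` pointer suffix are all hypothetical. The sketch copies a large file into a content-addressed cache and writes a small pointer file, which is the part a tool like Git could version instead of the data itself.

```python
import hashlib
import json
import shutil
from pathlib import Path


def track_file(data_path: Path, cache_dir: Path) -> Path:
    """Store data_path in a content-addressed cache and write a pointer file.

    Hypothetical helper for illustration only; names and layout are assumptions.
    """
    # Hash the file in 1 MiB chunks so large files do not need to fit in memory.
    digest = hashlib.sha256()
    with data_path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    file_hash = digest.hexdigest()

    # The cache location is derived from the content hash, so identical
    # content is stored only once.
    cached = cache_dir / file_hash[:2] / file_hash[2:]
    cached.parent.mkdir(parents=True, exist_ok=True)
    if not cached.exists():
        shutil.copy2(data_path, cached)

    # The small pointer file is what would be committed to version control.
    pointer = data_path.with_name(data_path.name + ".ptr")
    pointer.write_text(
        json.dumps(
            {
                "path": data_path.name,
                "sha256": file_hash,
                "size": data_path.stat().st_size,
            },
            indent=2,
        )
    )
    return pointer
```

Checking out an older revision of the pointer file would then identify which cached blob to restore, which is the general shape of the "Git for data" workflow the quoted post refers to.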
-
Exploring MLOps Tools and Frameworks: Enhancing Machine Learning Operations
DVC (Data Version Control):
- Evaluate and Track Your LLM Experiments: Introducing TruLens for LLMs
-
[D] Is there a tool to keep track of my ML experiments?
I have been using DVC and MLflow since back when DVC had only data tracking and MLflow only model tracking. I can say both are awesome now, and maybe the only factor I would like to mention is that, IMO, MLflow is a bit harder to learn, while DVC is practically just Git.
-
Ask HN: Data Management for AI Training
* User interface for less tech-savvy people (e.g. a Git-like command line is fine for engineers, but not for field personnel who are not in IT)
I know of tools like https://dvc.org/ but they a) are just layers on top of Git, b) break apart on huge datasets without a folder hierarchy (Git tree objects just don't work for linear lists of items), c) are only usable by IT personnel, and d) require checking out at least a part of the dataset.
Our datasets would be 100,000,000 x 100 MB = 10 PB of raw data. Training data should be delivered to training nodes via the network, etc. We just can't have a full checkout of that data.
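The size estimate in the post above can be checked with a few lines of arithmetic. This sketch assumes decimal units (1 PB = 10^9 MB), which is an assumption on our part since the post does not say which convention it uses.

```python
# Figures taken from the quoted post.
n_items = 100_000_000   # number of items in the dataset
item_size_mb = 100      # size of each item in MB

# Assumption: decimal units, so 1 PB = 1000 TB = 1000**2 GB = 1000**3 MB.
MB_PER_PB = 1000 ** 3

total_mb = n_items * item_size_mb
total_pb = total_mb / MB_PER_PB

print(f"{total_pb:g} PB")  # prints "10 PB"
```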
-
Do you wonder why MLOps is not at the same level as DevOps?
Hey, great find! However, it only explains concepts but not how to actually use any tool. I personally use DVC, but it's more focused on the model development/engineering phase. The different phases of ML are also done independently, which makes it even more difficult for an individual to have exposure to all the different areas. The lack of standard tools and best practices makes it harder still, as does the fact that every ML problem is different.
guildai
-
guildai VS cascade - a user suggested alternative
2 projects | 5 Dec 2023
-
[D] Who here are convinced that they have a really good setup that keeps track of their ML experiments?
Experiment tracking in DvC is implemented using git to store snapshots of a project and related artifacts. You might take a look at Guild AI's support for DvC, which is tightly integrated with DvC stages. You can run any of the stages defined for a project and you get a properly isolated run (each run is a project copy to ensure that you're not corrupting the run if you modify files while it's running - as well as properly supporting concurrent runs). Once you have runs in Guild, you can use any number of tools to study, compare, export, etc.
-
[D] Deploying SOTA models into my own projects
I built an experiment tracking tool (Guild AI) that focuses on code/model reuse and so this question is dear to my heart :) Best of luck!
-
[P] I reviewed 50+ open-source MLOps tools. Here’s the result
I'm not aware of experiment tracking in Jupyter notebooks themselves. Guild AI is able to run notebooks as experiments, however.
-
[D] What MLOps platform do you use, and how helpful are they?
Disclosure - I'm the author of Guild AI so take this for the biased opinion that it is.
-
[N] Experiment tracking with DvC and Guild AI
I'm the author of Guild AI (open source experiment tracking). For some time now Guild users have asked for DvC support. This is now available as a pre-release.
-
[D] Why doesn’t your team use an experiment tracking tool?
Guild AI now has support for running DvC stages as experiments. DvC uses git under the covers to manage project state for each experiment, along with the experiment results. Guild doesn't touch your git repo and instead copies your project source to a new run directory. This ensures that you have a correct record of your experiment without churning your project state.
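The run-isolation idea described above (copying the project source into a fresh run directory rather than touching the Git repo) can be illustrated with a minimal sketch. This is not Guild AI's implementation: the function name `start_isolated_run`, the run ID format, and the ignore patterns are hypothetical choices made for the example.

```python
import shutil
import time
import uuid
from pathlib import Path


def start_isolated_run(project_dir: Path, runs_root: Path) -> Path:
    """Copy the project source into a new, uniquely named run directory.

    Hypothetical helper for illustration. Assumes runs_root lives outside
    project_dir so that earlier runs are not copied into new ones.
    """
    # A timestamp plus a random suffix keeps concurrent runs from colliding.
    run_id = f"{time.strftime('%Y%m%d-%H%M%S')}-{uuid.uuid4().hex[:8]}"
    run_dir = runs_root / run_id

    # Snapshot the source tree, skipping version-control metadata and caches.
    shutil.copytree(
        project_dir,
        run_dir,
        ignore=shutil.ignore_patterns(".git", "__pycache__"),
    )
    return run_dir
```

Because each run executes against its own snapshot, editing files in the project while a run is in progress does not change what that run sees, and several runs can proceed side by side.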
-
Data Science toolset summary from 2021
Guild.ai - https://guild.ai/
- [D] How do you ensure reproducibility?
-
[D] I'm new and scrappy. What tips do you have for better logging and documentation when training or hyperparameter training?
Use guild and pytorch-lightning. Make it easy for new contributors to get your data by using dvc as a data access tool.
What are some alternatives?
MLflow - Open source platform for the machine learning lifecycle
lakeFS - Data version control for your data lake | Git for data
Activeloop Hub - Data Lake for Deep Learning. Build, manage, query, version, & visualize datasets. Stream data real-time to PyTorch/TensorFlow. https://activeloop.ai [Moved to: https://github.com/activeloopai/deeplake]
delta - An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
ploomber - The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️
aim - Aim 💫 — An easy-to-use & supercharged open-source experiment tracker.
git-submodules - Git Submodule alternative with equivalent features, but easier to use and maintain.
palm-dbt - dbt plugin for Palm CLI
git-lfs - Git extension for versioning large files
pytorch-lightning - Build high-performance AI models with PyTorch Lightning (organized PyTorch). Deploy models with Lightning Apps (organized Python to build end-to-end ML systems). [Moved to: https://github.com/Lightning-AI/lightning]