dvc
pre-commit
dvc | pre-commit | |
---|---|---|
109 | 192 | |
13,139 | 12,087 | |
0.8% | 1.7% | |
9.6 | 8.0 | |
7 days ago | 7 days ago | |
Python | Python | |
Apache License 2.0 | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
dvc
-
My Favorite DevTools to Build AI/ML Applications!
Collaboration and version control are crucial in AI/ML development projects due to the iterative nature of model development and the need for reproducibility. GitHub is the leading platform for source code management, allowing teams to collaborate on code, track issues, and manage project milestones. DVC (Data Version Control) complements Git by handling large data files, data sets, and machine learning models that Git can't manage effectively, enabling version control for the data and model files used in AI projects.
-
Why bad scientific code beats code following "best practices"
What you’re describing sounds like DVC (at a higher-ish—80%-solution level).
https://dvc.org/
See pachyderm too.
-
First 15 Open Source Advent projects
10. DVC by Iterative | Github | tutorial
-
Exploring Open-Source Alternatives to Landing AI for Robust MLOps
Platforms such as MLflow monitor the development stages of machine learning models. In parallel, Data Version Control (DVC) brings version control system-like functions to the realm of data sets and models.
- ML Experiments Management with Git
-
Git Version Controlled Datasets in S3
I was using DVC (https://dvc.org/) for some time to help solve this but it was getting hard to manage the storage connections and I would run into cache issues a lot, but this solves it using git-lfs itself.
- Ask HN: How do your ML teams version datasets and models?
-
Exploring MLOps Tools and Frameworks: Enhancing Machine Learning Operations
DVC (Data Version Control):
- Evaluate and Track Your LLM Experiments: Introducing TruLens for LLMs
-
[D] Is there a tool to keep track of my ML experiments?
I have been using DVC and MLflow since then DVC had only data tracking and MLflow only model tracking. I can say both are awesome now and maybe the only factor I would like to mention is that IMO, MLflow is a bit harder to learn while DVC is just a git practically.
pre-commit
-
How to setup Black and pre-commit in python for auto text-formatting on commit
Today we are going to look at how to setup Black (a python code formatter) and pre-commit (a package for handling git hooks in python) to automatically format you code on commit.
-
Implementing Quality Checks In Your Git Workflow With Hooks and pre-commit
# See https://pre-commit.com for more information # See https://pre-commit.com/hooks.html for more hooks repos: - repo: https://github.com/pre-commit/pre-commit-hooks rev: v3.2.0 hooks: - id: trailing-whitespace - id: end-of-file-fixer - id: check-yaml - id: check-toml - id: check-added-large-files - repo: local hooks: - id: tox lint name: tox-validation entry: pdm run tox -e test,lint language: system files: ^src\/.+py$|pyproject.toml|^tests\/.+py$ types_or: [python, toml] pass_filenames: false - id: tox docs name: tox-docs language: system entry: pdm run tox -e docs types_or: [python, rst, toml] files: ^src\/.+py$|pyproject.toml|^docs\/ pass_filenames: false - repo: https://github.com/pdm-project/pdm rev: 2.10.4 # a PDM release exposing the hook hooks: - id: pdm-lock-check - repo: https://github.com/jumanjihouse/pre-commit-hooks rev: 3.0.0 hooks: - id: markdownlint
-
Embracing Modern Python for Web Development
Pre-commit hooks act as the first line of defense in maintaining code quality, seamlessly integrating with linters and code formatters. They automatically execute these tools each time a developer tries to commit code to the repository, ensuring the code adheres to the project's standards. If the hooks detect issues, the commit is paused until the issues are resolved, guaranteeing that only code meeting quality standards makes it into the repository.
- EmacsConf Live Now
-
A Tale of Two Kitchens - Hypermodernizing Your Python Code Base
Pre-commit Hooks: Pre-commit is a tool that can be set up to enforce coding rules and standards before you commit your changes to your code repository. This ensures that you can't even check in (commit) code that doesn't meet your standards. This allows a code reviewer to focus on the architecture of a change while not wasting time with trivial style nitpicks.
-
Things I just don't like about Git
Ah, fair enough!
On my team we use pre-commit[0] a lot. I guess I would define the history to be something like "has this commit ever been run through our pre-commit hooks?". If you rewrite history, you'll (usually) produce commits that have not been through pre-commit (and they've therefore dodged a lot of static checks that might catch code that wasn't working, at that point in time). That gives some manner of objectivity to the "history", although it does depend on each user having their pre-commit hooks activated in their local workspace.
[0]: https://pre-commit.com/
-
Django Code Formatting and Linting Made Easy: A Step-by-Step Pre-commit Hook Tutorial
Pre-commit is a framework for managing and maintaining multi-language pre-commit hooks. It supports hooks for various programming languages. Using this framework, you only have to specify a list of hooks you want to run before every commit, and pre-commit handles the installation and execution of those hooks despite your project’s primary language.
-
Git: fu** the history!
You can learn more here: pre-commit.com
-
[Tool Anouncement] github-distributed-owners - A tool for managing GitHub CODEOWNERS using OWNERS files distributed throughout your code base. Especially helpful for monorepos / multi-team repos
Note this includes support for pre-commit.
-
Packaging Python projects in 2023 from scratch
As a nice next step, you could also add mypy to check your type hints are consistent, and automate running all this via pre-commit hooks set up with… pre-commit.
What are some alternatives?
MLflow - Open source platform for the machine learning lifecycle
husky - Git hooks made easy 🐶 woof!
lakeFS - lakeFS - Data version control for your data lake | Git for data
gitleaks - Protect and discover secrets using Gitleaks 🔑
Activeloop Hub - Data Lake for Deep Learning. Build, manage, query, version, & visualize datasets. Stream data real-time to PyTorch/TensorFlow. https://activeloop.ai [Moved to: https://github.com/activeloopai/deeplake]
ruff - An extremely fast Python linter and code formatter, written in Rust.
delta - An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
semgrep - Lightweight static analysis for many languages. Find bug variants with patterns that look like source code.
ploomber - The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️
Poetry - Python packaging and dependency management made easy
aim - Aim 💫 — An easy-to-use & supercharged open-source experiment tracker.
pre-commit-golang - Pre-commit hooks for Golang with support for monorepos, the ability to pass arguments and environment variables to all hooks, and the ability to invoke custom go tools.