dvc
Flyway
 | dvc | Flyway
---|---|---
Mentions | 108 | 79
Stars | 13,032 | 7,728
Growth | 1.6% | 1.1%
Activity | 9.7 | 7.2
Last commit | 1 day ago | 15 days ago
Language | Python | Java
License | Apache License 2.0 | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
dvc
-
Why bad scientific code beats code following "best practices"
What you’re describing sounds like DVC (at a higher-ish—80%-solution level).
See pachyderm too.
-
First 15 Open Source Advent projects
10. DVC by Iterative | Github | tutorial
-
Exploring Open-Source Alternatives to Landing AI for Robust MLOps
Platforms such as MLflow monitor the development stages of machine learning models. In parallel, Data Version Control (DVC) brings version control system-like functions to the realm of data sets and models.
- ML Experiments Management with Git
- Ask HN: How do your ML teams version datasets and models?
-
Exploring MLOps Tools and Frameworks: Enhancing Machine Learning Operations
DVC (Data Version Control):
- Evaluate and Track Your LLM Experiments: Introducing TruLens for LLMs
-
[D] Is there a tool to keep track of my ML experiments?
I have been using DVC and MLflow since the days when DVC had only data tracking and MLflow only model tracking. I can say both are awesome now, and the only factor I would mention is that, IMO, MLflow is a bit harder to learn, while DVC is practically just Git.
-
Ask HN: Data Management for AI Training
* User interface for less tech-savvy people (e.g. a git-like command line is fine for engineers but not for field personnel who are not in IT)
I know of tools like https://dvc.org/ but a) they are just layers on top of git, b) they break apart on huge datasets without a folder hierarchy (git tree objects just don't work for linear lists of items), c) they are only usable by IT personnel, and d) they require checking out at least a part of the dataset.
Our datasets would be 100,000,000 x 100 MB = 10 PB of raw data. Training data should be delivered to training nodes via network, etc. We just can't have a full checkout of that data...
-
Do you wonder why MLOps is not at the same level as DevOps?
Hey, great find! However, it only explains concepts but not how to actually use any tool. I personally use DVC, but it's more focused on the model development/engineering phase. The different phases of ML are also done independently, which makes it even more difficult for an individual to have exposure to all the different areas. Moreover, the lack of standard tools and best practices makes it difficult, and the fact that every ML problem is different.
Flyway
-
PostgreSQL Is Enough
There is a bit of tooling needed but is already around. For Java for example I had very good experience with a combination of flyway [1] for migrations, testcontainers [2] for making integration tests as easy as unit tests and querydsl [3] for a query and mapping layer.
[1] https://github.com/flyway/flyway
[2] https://java.testcontainers.org/modules/databases/postgres/
-
CI/CD for Databricks
If you're looking for tools, like https://www.liquibase.com/ or https://flywaydb.org/, which are database-state-based schema migration toolkits - it might be relatively straightforward to build similar ones using Databricks SQL drivers.
-
Working with jOOQ and Flyway using Testcontainers
Honestly I kind of wish there was a Lukas Eder database migration library. Call it whatever jooq-migration. At least I would have more insight of what is going on (<-- seriously look at the commit history).
-
Strategy to run database scripts on Kubernetes
This is a 4th option, which should play nice with ArgoCD. The following example runs flyway as a k8s job. The desired migration changes are recorded as files within the chart. This helm chart can be integrated with your application (Using hooks to determine when the migration job is run) or run manually.
-
How do your teams run DB migrations?
By using an opinionated framework within the app/service (like Flyway, Migrate, Diesel, etc). Schema migrations happen on app/service start-up.
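A minimal sketch of the "migrate on start-up" pattern described above, using only Python's standard `sqlite3` module. This is not how Flyway itself is implemented; the `schema_version` table, the `MIGRATIONS` list, and the `migrate`/`start_service` functions are hypothetical names chosen for illustration.

```python
import sqlite3

# Hypothetical, ordered migrations as (version, SQL) pairs.
# Real tools typically load these from versioned files.
MIGRATIONS = [
    (1, "CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)"),
    (2, "ALTER TABLE items ADD COLUMN price REAL"),
]


def migrate(conn):
    """Apply pending migrations in version order; return the versions applied."""
    # Bookkeeping table (assumed name) recording which versions already ran.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER PRIMARY KEY)"
    )
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_version")}
    newly_applied = []
    for version, sql in sorted(MIGRATIONS):
        if version in applied:
            continue
        conn.execute(sql)
        # Record the version only after the migration statement succeeded.
        conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
        conn.commit()
        newly_applied.append(version)
    return newly_applied


def start_service(db_path=":memory:"):
    """Open the database and bring the schema up to date before serving."""
    conn = sqlite3.connect(db_path)
    migrate(conn)
    return conn
```

Calling `migrate` a second time on the same connection applies nothing, which is what makes it safe to run on every start-up.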
-
Version control for database used by C# app
Flyway
-
Using Flyway for Database Setup
The grown-up way of creating a database schema is migrations, and no-one ever got fired for choosing Flyway (https://flywaydb.org/), so that's what we'll investigate today. By the end we are able to create the same schema as Exposed was creating, and then, as a second migration, add some constraints to the items table to reflect the reality of our data. And the transition from Exposed to jOOQ is complete!
-
How to run DB migrations in CICD Pipeline
We use https://flywaydb.org/. You can do the migration before or during service start-up. We do it during.
-
🏅 Http4k: Top 5 Server-Side Frameworks for Kotlin in 2022
We just create the greetings table if it does not exist (instead of any database migration library like flyway)
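The "create the table if it does not exist" approach mentioned above can be sketched as follows. The original post is about Kotlin and Http4k; this illustration uses Python's standard `sqlite3` module instead, and the column names and the `ensure_greetings_table` function are assumptions, since the post does not show the schema.

```python
import sqlite3


def ensure_greetings_table(conn):
    """Create the greetings table on first use; do nothing if it already exists."""
    # Column layout is hypothetical; the source only names the table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS greetings ("
        "id INTEGER PRIMARY KEY, message TEXT NOT NULL)"
    )
    conn.commit()
```

This is idempotent, but unlike a migration library it keeps no history, so later schema changes have to be handled by other means.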
-
How to people organize their Repos?
Also, from the "DevOps" point of view, this totally depends on what you want to achieve. If it is a project that has changes to the database (new views, new tables...) on a regular basis, I would consider using https://flywaydb.org/ in the pipeline.
What are some alternatives?
alembic - A database migrations tool for SQLAlchemy.
MLflow - Open source platform for the machine learning lifecycle
HikariCP - 光 HikariCP・A solid, high-performance, JDBC connection pool at last.
lakeFS - lakeFS - Data version control for your data lake | Git for data
roundhouse - RoundhousE is a Database Migration Utility for .NET using sql files and versioning based on source control
Activeloop Hub - Data Lake for Deep Learning. Build, manage, query, version, & visualize datasets. Stream data real-time to PyTorch/TensorFlow. https://activeloop.ai [Moved to: https://github.com/activeloopai/deeplake]
H2 - H2 is an embeddable RDBMS written in Java.
dbmate - A lightweight, framework-agnostic database migration tool.
Hibernate - Hibernate's core Object/Relational Mapping functionality
delta - An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Apache Hive - Apache Hive
JDBI - The Jdbi library provides convenient, idiomatic access to relational databases in Java and other JVM technologies such as Kotlin, Clojure or Scala.