mandala vs dvc

mandala	Project	dvc
8	Mentions	109
228	Stars	13,139
-	Growth	0.8%
6.3	Activity	9.6
about 2 months ago	Latest Commit	6 days ago
Python	Language	Python
Apache License 2.0	License	Apache License 2.0

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

mandala

Posts with mentions or reviews of mandala. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-03-07.

Mandala: A little playground for testing pixel logic patterns
2 projects | news.ycombinator.com | 7 Mar 2024

I was so confused, expecting this to be some trickery related to the computational-graph-memoization-and-exploration tool "mandala" https://github.com/amakelov/mandala
Mandala: Notebook memoization on steroids, used by Anthropic
1 project | news.ycombinator.com | 21 Dec 2023
Improve Jupyter Notebook Reruns by Caching Cells
5 projects | news.ycombinator.com | 19 Dec 2023

This is neat and self-contained! But as someone running experiments with a high degree of interactivity, I often have an orthogonal requirement: add more computations to the same cell without recomputing previous computations done in the cell (or in other cells).
For a concrete example, often in an ML project you want to study how several quantities vary across several parameters. A straightforward workflow for this is: write some nested loops, collect results in python dictionaries, finally put everything together in a dataframe and compare (by plotting or otherwise).
However, after looking at the results, maybe you spot some trend and wonder if it will continue if you tweak one of the parameters by using a new value for it; of course, you also want to look at the previous values and bring everything together in the same plot(s). You now have a problem: either re-run the cell (thus losing previous work, which is annoying even if you have to wait 1 minute - you know it's a wasted minute!), or write the new computation in a new cell, possibly with a lot of redundancy (which over time makes the notebook hard to navigate and keep consistent).
So, this and other considerations eventually convinced me that the function is more natural than the cell as an interface/boundary at which caching should be implemented, at least for my use cases (coming from ML research). I wrote a framework based on this idea, with lots of other features (some quite experimental/unusual) to turn this into a feasible experiment management tool - check it out at https://github.com/amakelov/mandala
P.S.: I notice you use `pickle` for the hashing - `joblib.dump` is faster with objects containing numpy arrays, which covers a lot of useful ML things
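The function-as-caching-boundary idea from the comment above can be sketched in a few lines of Python. This is a minimal illustration only, not mandala's actual API: the `memoize` decorator, the `train_and_score` function, and the `sweep` helper are all hypothetical names, the cache lives in memory rather than in persistent storage, and the key is built with `pickle` and `hashlib` from the standard library for simplicity.

```python
import functools
import hashlib
import pickle


def memoize(func):
    """Cache results keyed by a hash of the function name and its arguments.

    Hypothetical helper for illustration; not part of mandala or any library.
    """
    cache = {}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Serialize the call signature and hash it to get a stable cache key.
        key_bytes = pickle.dumps((func.__qualname__, args, sorted(kwargs.items())))
        key = hashlib.sha256(key_bytes).hexdigest()
        if key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]

    return wrapper


calls = []  # records which parameter combinations were actually computed


@memoize
def train_and_score(learning_rate, depth):
    # Stand-in for an expensive experiment; the formula is arbitrary.
    calls.append((learning_rate, depth))
    return round(learning_rate * depth, 6)


def sweep(learning_rates, depths):
    # Nested loops over parameters, collecting results in a dictionary.
    return {(lr, d): train_and_score(lr, d) for lr in learning_rates for d in depths}


first = sweep([0.1, 0.2], [2, 4])
# Adding a new learning rate only computes the two new combinations.
second = sweep([0.1, 0.2, 0.4], [2, 4])
```

Because caching happens per function call rather than per cell, widening the sweep reuses the four earlier results and runs only the new ones. A real tool would also need to persist the cache across sessions and decide what to do when the function's code changes, which this sketch does not attempt.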
ML Experiments Management with Git
4 projects | news.ycombinator.com | 2 Nov 2023

Another option that manages versioning of your computational graph and its results, and provides extremely elegant queryable memoization, is Mandala https://github.com/amakelov/mandala
It is a much simpler and much more magical piece of software that truly expanded how I think about writing, exploring, and experimenting with code. Even if you never use it, you probably would really enjoy reading the blog posts the author wrote about the design of the tool https://amakelov.github.io/blog/pl/
Snakemake – A framework for reproducible data analysis
6 projects | news.ycombinator.com | 15 Jul 2023

You might like mandala (https://github.com/amakelov/mandala) - it is not a build recipe tool; rather, it is a tool that tracks the history of how your builds / computational graph have changed, and ties it to what the data looked like at each such step.
Piper: A proposal for a graphy pipe-based build system
3 projects | /r/ProgrammingLanguages | 23 Apr 2023

u/rust4yy: I've been building mandala, a Python framework for (among other things) incremental computing. One way to think of it is "a build system for Python objects", except the units of computation are Python functions.

dvc

Posts with mentions or reviews of dvc. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-04-23.

My Favorite DevTools to Build AI/ML Applications!
9 projects | dev.to | 23 Apr 2024

Collaboration and version control are crucial in AI/ML development projects due to the iterative nature of model development and the need for reproducibility. GitHub is the leading platform for source code management, allowing teams to collaborate on code, track issues, and manage project milestones. DVC (Data Version Control) complements Git by handling large data files, data sets, and machine learning models that Git can't manage effectively, enabling version control for the data and model files used in AI projects.
Why bad scientific code beats code following "best practices"
3 projects | news.ycombinator.com | 6 Jan 2024

What you’re describing sounds like DVC (at a higher-ish—80%-solution level).
https://dvc.org/
See pachyderm too.
First 15 Open Source Advent projects
16 projects | dev.to | 15 Dec 2023

10. DVC by Iterative | Github | tutorial
Exploring Open-Source Alternatives to Landing AI for Robust MLOps
18 projects | dev.to | 13 Dec 2023

Platforms such as MLflow monitor the development stages of machine learning models. In parallel, Data Version Control (DVC) brings version control system-like functions to the realm of data sets and models.
ML Experiments Management with Git
4 projects | news.ycombinator.com | 2 Nov 2023
Git Version Controlled Datasets in S3
1 project | news.ycombinator.com | 25 Oct 2023

I was using DVC (https://dvc.org/) for some time to help solve this but it was getting hard to manage the storage connections and I would run into cache issues a lot, but this solves it using git-lfs itself.
Ask HN: How do your ML teams version datasets and models?
3 projects | news.ycombinator.com | 28 Sep 2023
Exploring MLOps Tools and Frameworks: Enhancing Machine Learning Operations
3 projects | dev.to | 6 Jun 2023

DVC (Data Version Control):
Evaluate and Track Your LLM Experiments: Introducing TruLens for LLMs
2 projects | news.ycombinator.com | 24 May 2023
[D] Is there a tool to keep track of my ML experiments?
2 projects | /r/MachineLearning | 13 May 2023

I have been using DVC and MLflow since the days when DVC had only data tracking and MLflow only model tracking. I can say both are awesome now, and maybe the only factor I would like to mention is that, IMO, MLflow is a bit harder to learn, while DVC is practically just Git.

What are some alternatives?

When comparing mandala and dvc you can also consider the following projects:

oxen-release - Lightning fast data version control system for structured and unstructured machine learning datasets. We aim to make versioning datasets as easy as versioning code.

MLflow - Open source platform for the machine learning lifecycle

snakemake-wrappers - This is the development home of the Snakemake wrapper repository, see

lakeFS - lakeFS - Data version control for your data lake | Git for data

beaver - Simple, but capable build system and command runner for any project

Activeloop Hub - Data Lake for Deep Learning. Build, manage, query, version, & visualize datasets. Stream data real-time to PyTorch/TensorFlow. https://activeloop.ai [Moved to: https://github.com/activeloopai/deeplake]

aim - Aim 💫 — An easy-to-use & supercharged open-source experiment tracker.

delta - An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

sdk - Metadata store for Production ML

ploomber - The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️

make-booster - Utility routines to simplify using GNU make and Python
