common
mandala
Metric | common | mandala
---|---|---
Mentions | 4 | 8
Stars | 163 | 228
Growth | 2.5% | -
Activity | 9.7 | 6.3
Last commit | 5 days ago | about 2 months ago
Language | Go | Python
License | Apache License 2.0 | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
common
-
Improve Jupyter Notebook Reruns by Caching Cells
Dockerfile and Containerfile also cache outputs as layers.
`podman build --layers` is the default: https://docs.podman.io/en/latest/markdown/podman-build.1.htm...
containers/common/docs/Containerfile.5.md: https://github.com/containers/common/blob/main/docs/Containe...
-
Is it possible to generate the default `containers.conf`?
https://github.com/containers/common/blob/main/pkg/config/containers.conf ? But it's empty save for the comments, since it's supposed to be customized by your distribution. Still, it's a starting point.
-
buildah/podman overlay on build context
Argh, the fix is rw on the mount line, not ro=false. This is missing from the documentation (https://github.com/containers/common/blob/main/docs/Containerfile.5.md).
-
File contents intermediate layers from building
A good resource to start is the Containerfile man page.
mandala
-
Mandala: A little playground for testing pixel logic patterns
I was so confused, expecting this to be some trickery related to the computational-graph-memoization-and-exploration tool "mandala" https://github.com/amakelov/mandala
- Mandala: Notebook memoization on steroids, used by Anthropic
-
Improve Jupyter Notebook Reruns by Caching Cells
This is neat and self-contained! But as someone running experiments with a high degree of interactivity, I often have an orthogonal requirement: add more computations to the same cell without recomputing previous computations done in the cell (or in other cells).
For a concrete example, often in an ML project you want to study how several quantities vary across several parameters. A straightforward workflow for this is: write some nested loops, collect results in python dictionaries, finally put everything together in a dataframe and compare (by plotting or otherwise).
However, after looking at the results, maybe you spot some trend and wonder if it will continue if you tweak one of the parameters by using a new value for it; of course, you also want to look at the previous values and bring everything together in the same plot(s). You now have a problem: either re-run the cell (thus losing previous work, which is annoying even if you have to wait 1 minute - you know it's a wasted minute!), or write the new computation in a new cell, possibly with a lot of redundancy (which over time makes the notebook hard to navigate and keep consistent).
So, this and other considerations eventually convinced me that the function is more natural than the cell as an interface/boundary at which caching should be implemented, at least for my use cases (coming from ML research). I wrote a framework based on this idea, with lots of other features (some quite experimental/unusual) to turn this into a feasible experiment management tool - check it out at https://github.com/amakelov/mandala
P.S.: I notice you use `pickle` for the hashing - `joblib.dump` is faster with objects containing numpy arrays, which covers a lot of useful ML things
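The workflow described above (extending a parameter sweep without recomputing earlier results) can be sketched with plain function-level memoization. This is a minimal illustration under stated assumptions, not mandala's API: the `memoize` decorator, the in-memory `_CACHE` dictionary, and the `run_experiment` and `sweep` functions are all hypothetical names, and the cache key is simply a SHA-256 hash of the pickled function name and arguments.

```python
import functools
import hashlib
import pickle

_CACHE = {}  # hypothetical in-memory store; a real tool would persist results
CALLS = []   # records which argument combinations were actually computed


def memoize(func):
    """Cache results per (function name, arguments) so reruns skip old work."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Build a stable key from the function name and its arguments.
        payload = pickle.dumps((func.__qualname__, args, sorted(kwargs.items())))
        key = hashlib.sha256(payload).hexdigest()
        if key not in _CACHE:
            _CACHE[key] = func(*args, **kwargs)
        return _CACHE[key]
    return wrapper


@memoize
def run_experiment(lr, depth):
    CALLS.append((lr, depth))
    return lr * depth  # placeholder for an expensive training run


def sweep(lrs, depths):
    """Nested loop over parameters; results are collected in a dictionary."""
    return {(lr, d): run_experiment(lr, d) for lr in lrs for d in depths}


first = sweep([0.1, 0.01], [2, 4])           # computes four combinations
second = sweep([0.1, 0.01, 0.001], [2, 4])   # computes only the two new ones
```

Because the cache boundary is the function call rather than the notebook cell, rerunning the same cell with one extra value only pays for the new combinations, and `second` still contains every earlier result for plotting.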
-
ML Experiments Management with Git
Another option is Mandala, which manages versioning of your computational graph and its results and provides extremely elegant, queryable memoization: https://github.com/amakelov/mandala
It is a much simpler and much more magical piece of software that truly expanded how I think about writing, exploring, and experimenting with code. Even if you never use it, you probably would really enjoy reading the blog posts the author wrote about the design of the tool https://amakelov.github.io/blog/pl/
-
Snakemake – A framework for reproducible data analysis
You might like mandala (https://github.com/amakelov/mandala) - it is not a build recipe tool; rather, it is a tool that tracks the history of how your builds / computational graph have changed, and ties it to what the data looked like at each such step.
-
Piper: A proposal for a graphy pipe-based build system
u/rust4yy: I've been building mandala, a Python framework for (among other things) incremental computing. One way to think of it is "a build system for Python objects", except the units of computation are Python functions.
What are some alternatives?
Psake - A build automation tool written in PowerShell
oxen-release - Lightning fast data version control system for structured and unstructured machine learning datasets. We aim to make versioning datasets as easy as versioning code.
metatar - Manipulate tar file metadata, list tar files or convert tar to cpio. For some projects, this can replace fakeroot and cpio, when creating an initrd image that is compatible with the Linux kernel. Used in production in at least one company.
snakemake-wrappers - This is the development home of the Snakemake wrapper repository, see
FlubuCore - A cross platform build and deployment automation system for building projects and executing deployment scripts using C# code.
beaver - Simple, but capable build system and command runner for any project
NUKE - 🏗 The AKEless Build System for C#/.NET
aim - Aim 💫 — An easy-to-use & supercharged open-source experiment tracker.
FAKE - FAKE - F# Make
sdk - Metadata store for Production ML
Cake - 🍰 Cake (C# Make) is a cross platform build automation system.
make-booster - Utility routines to simplify using GNU make and Python