make-booster vs oxen-release

make-booster

Utility routines to simplify using GNU make and Python (by david-a-wheeler)

oxen-release

Lightning fast data version control system for structured and unstructured machine learning datasets. We aim to make versioning datasets as easy as versioning code. (by Oxen-AI)

Artificial intelligence, Data Science, Machine Learning, Python, Rust, Version control

Source Code

oxen.ai

make-booster	Project	oxen-release
3	Mentions	22
8	Stars	837
-	Growth	2.3%
10.0	Activity	9.0
almost 2 years ago	Latest Commit	14 days ago
Makefile	Language	Python
MIT License	License	Apache License 2.0

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

make-booster

Posts with mentions or reviews of make-booster. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-07-15.

Snakemake – A framework for reproducible data analysis
6 projects | news.ycombinator.com | 15 Jul 2023

For a very different approach, check out make-booster:
https://github.com/david-a-wheeler/make-booster
Make-booster provides utility routines intended to greatly simplify data processing (particularly a data pipeline) using GNU make. It includes some mechanisms specifically to help Python, as well as general-purpose mechanisms that can be useful in any system. In particular, it helps reliably reproduce results, and it automatically determines what needs to run and runs only that (producing a significant speedup in most cases). Released as open source software.

A Love Letter to Make
5 projects | news.ycombinator.com | 20 Apr 2023

https://github.com/david-a-wheeler/make-booster
I think a lot of hate on make is due to poor use. If your makefile is complex, refactor it. Auto-generate dependencies (it only takes a few lines in GNU make). And don't use recursive make, that way lies madness. I also think GNU make is the wiser tool; POSIX make lacks too much in many cases.

The Unreasonable Effectiveness of Makefiles
12 projects | news.ycombinator.com | 12 Aug 2022

https://github.com/david-a-wheeler/make-booster
From its readme:
"This project (contained in this directory and below) provides utility routines intended to greatly simplify data processing (particularly a data pipeline) using GNU make. It includes some mechanisms specifically to help Python, as well as general-purpose mechanisms that can be useful in any system. In particular, it helps reliably reproduce results, and it automatically determines what needs to run and runs only that (producing a significant speedup in most cases)."
"For example, imagine that Python file BBB.py says include CC, and file CC.py reads from file F.txt (and CC.py declares its INPUTS= as described below). Now if you modify file F.txt or CC.py, any rule that runs BBB.py will automatically be re-run in the correct order when you use make, even if you didn't directly edit BBB.py."
This is NOT functionality directly provided by Python, and the overhead with >1000 files was 0.07 seconds, which we could live with :-).
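
The BBB.py / CC.py / F.txt example quoted above can be illustrated with a small Python sketch. This is not make-booster's implementation (which is built on GNU make); it is a minimal approximation, assuming the dependency graph is already known and given as a plain dictionary. The names `transitive_inputs`, `needs_rerun`, and `out.txt` are hypothetical and exist only for this illustration.

```python
import os
import tempfile
from pathlib import Path


def transitive_inputs(target, deps):
    """Return every file that `target` depends on, directly or indirectly."""
    seen = set()
    stack = [target]
    while stack:
        node = stack.pop()
        for dep in deps.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def needs_rerun(output, script, deps):
    """True if `output` is missing, or older than `script` or any of its inputs.

    This mirrors make's timestamp comparison; it is an illustrative
    assumption, not make-booster's actual mechanism.
    """
    output = Path(output)
    if not output.exists():
        return True
    out_mtime = output.stat().st_mtime_ns
    sources = {script} | transitive_inputs(script, deps)
    return any(Path(s).stat().st_mtime_ns > out_mtime for s in sources)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        f, cc, bbb, out = (
            os.path.join(tmp, name)
            for name in ("F.txt", "CC.py", "BBB.py", "out.txt")
        )
        # Create the files with fixed timestamps: inputs at t=100, output at t=200.
        for path, mtime in ((f, 100), (cc, 100), (bbb, 100), (out, 200)):
            Path(path).write_text("placeholder")
            os.utime(path, (mtime, mtime))

        # BBB.py imports CC; CC.py reads F.txt.
        deps = {bbb: [cc], cc: [f]}

        print(needs_rerun(out, bbb, deps))  # False: output is newer than all inputs

        # Touch F.txt only; BBB.py itself is unchanged.
        os.utime(f, (300, 300))
        print(needs_rerun(out, bbb, deps))  # True: an indirect input changed
```

The point of the sketch is the transitive walk: a rule that runs BBB.py becomes stale when F.txt changes, even though BBB.py never mentions F.txt directly.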

oxen-release

Posts with mentions or reviews of oxen-release. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-12-10.

Ask HN: Can we do better than Git for version control?
17 projects | news.ycombinator.com | 10 Dec 2023

We've been working on a data version control system called "oxen" optimized for large unstructured datasets that we are seeing more and more with the advent of many of the generative AI techniques.
Many of these datasets contain large numbers of images, videos, audio files, and text, as well as structured tabular data, which git and git-lfs handle poorly.
Would love anyone to kick the tires on it and let us know what you think:
https://github.com/Oxen-AI/oxen-release
The commands are modeled after git's, so it is easy to learn, but it is optimized under the hood for larger datasets.

Snakemake – A framework for reproducible data analysis
6 projects | news.ycombinator.com | 15 Jul 2023

Super cool! Would love to see an integration with Oxen and their data version control https://github.com/Oxen-AI/oxen-release

Ask HN: Data Management for AI Training
3 projects | news.ycombinator.com | 30 Apr 2023

We have been working on a data version control tool called Oxen that is tackling many of your needs. Feel free to check it out here:
https://github.com/Oxen-AI/oxen-release#-oxen
Going down your list of requirements, Oxen has:
* Data versioning, similar paradigm to git, but built from the ground up for large ML datasets

A tale of Phobos – how we almost cracked a ransomware using CUDA
3 projects | news.ycombinator.com | 24 Feb 2023

We've been working on some open source tooling called "oxen" that was built for large datasets of images, video, audio, text etc. We wanted to solve the exact problem you're flagging here with git.
Feel free to check it out here: https://github.com/Oxen-AI/oxen-release#-oxen. We would love any feedback!

Oxen.ai: Fast Unstructured Data Version Control
1 project | /r/patient_hackernews | 21 Feb 2023
1 project | /r/hackernews | 21 Feb 2023

A versioning system for ML data sets
1 project | /r/u_bear007 | 21 Feb 2023

Oxen - Version control for your machine learning datasets
1 project | /r/ArtificialInteligence | 21 Feb 2023
1 project | /r/AIandRobotics | 20 Feb 2023
1 project | /r/artificial | 20 Feb 2023

What are some alternatives?

When comparing make-booster and oxen-release you can also consider the following projects:

tclmake - Partial make clone in pure Tcl

VFSForGit - Virtual File System for Git: Enable Git at Enterprise Scale

checkexec - CLI tool to conditionally execute commands only when files in a dependency list have been updated. Like `make`, but standalone.

gpt-2-output-dataset - Dataset of GPT-2 outputs for research in detection, biases, and more

snakemake-wrappers - This is the development home of the Snakemake wrapper repository, see

dvc - 🦉 ML Experiments and Data Management with Git

mandala - A powerful and easy to use Python framework for experiment tracking and incremental computing

dud - A lightweight CLI tool for versioning data alongside source code and building data pipelines.

dagger - Application Delivery as Code that Runs Anywhere

just - 🤖 Just a command runner

dolt - Dolt – Git for Data
