Top 23 Reproducibility Open-Source Projects
-
wandb
🔥 A tool for visualizing and tracking your machine learning experiments. This repo contains the CLI and Python API.
-
Sacred
Sacred is a tool, developed at IDSIA, to help you configure, organize, log and reproduce experiments.
-
lightning-hydra-template
PyTorch Lightning + Hydra. A very user-friendly template for ML experimentation. ⚡🔥⚡
-
snakemake
This is the development home of the workflow management system Snakemake. For general information, see
-
drake
An R-focused pipeline toolkit for reproducibility and high-performance computing (by ropensci)
-
bingo
Like `go get` but for Go tools! Automates versioning of Go binaries (e.g. for CI) in nested, isolated Go modules.
Collaboration and version control are crucial in AI/ML development projects because model development is iterative and results need to be reproducible. GitHub is a widely used platform for hosting Git repositories, allowing teams to collaborate on code, track issues, and manage project milestones. DVC (Data Version Control) complements Git by handling the large data files, datasets, and machine learning models that Git can't manage effectively, so the data and model files used in AI projects can be versioned alongside the code.
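The core idea behind data versioning tools like DVC can be illustrated with a content-addressed cache: the large file is stored under its hash, and only a small pointer file goes into Git. The following is a simplified, stdlib-only sketch of that idea, not DVC's implementation. DVC itself writes `.dvc` metafiles and manages remote storage; the functions `track` and `checkout`, the `.ptr.json` suffix, and the cache layout here are hypothetical.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets never need to fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_path(cache_dir: Path, md5: str) -> Path:
    # Shard the cache by the first two hex characters of the hash.
    return cache_dir / md5[:2] / md5[2:]


def track(data_file: Path, cache_dir: Path) -> Path:
    """Store data_file in the content-addressed cache and write a small pointer file.

    The pointer file is what would be committed to Git; the cache holds the bytes.
    """
    md5 = file_md5(data_file)
    target = cache_path(cache_dir, md5)
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():  # identical content is stored only once
        shutil.copy2(data_file, target)
    pointer = data_file.with_name(data_file.name + ".ptr.json")
    pointer.write_text(json.dumps({"path": data_file.name, "md5": md5}, indent=2))
    return pointer


def checkout(pointer: Path, cache_dir: Path) -> Path:
    """Restore the file recorded in a pointer file from the cache."""
    meta = json.loads(pointer.read_text())
    dest = pointer.with_name(meta["path"])
    shutil.copy2(cache_path(cache_dir, meta["md5"]), dest)
    return dest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        cache = root / "cache"
        data = root / "train.csv"
        data.write_text("x,y\n1,2\n")
        ptr = track(data, cache)
        data.unlink()  # simulate a fresh clone that has only the pointer file
        print(checkout(ptr, cache).read_text())
```

Because the cache key is the content hash, re-tracking an unchanged file stores nothing new, and checking out an older commit's pointer file restores exactly the bytes it recorded.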
Project mention: A list of SaaS, PaaS and IaaS offerings that have free tiers of interest to devops and infradev | dev.to | 2024-02-05
Weights & Biases: The developer-first MLOps platform. Build better models faster with experiment tracking, dataset versioning, and model management. Free tier for personal projects only, with 100 GB of storage included.
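Experiment trackers such as Weights & Biases and Sacred record each run's configuration and metrics so that runs can be compared and re-run. The sketch below is a hypothetical, stdlib-only illustration of that idea: `RunLogger`, `train`, and the `config.json`/`metrics.jsonl` layout are made up for this example and are not the wandb or Sacred API. The key assumption is that every source of randomness is seeded from the saved config.

```python
import json
import random
import tempfile
import uuid
from pathlib import Path


class RunLogger:
    """Hypothetical minimal tracker: one directory per run holding config and metrics."""

    def __init__(self, root: Path, config: dict):
        self.run_dir = root / f"run-{uuid.uuid4().hex[:8]}"
        self.run_dir.mkdir(parents=True)
        # Save the full config (including the seed) before training starts.
        (self.run_dir / "config.json").write_text(json.dumps(config, indent=2, sort_keys=True))
        self._metrics = (self.run_dir / "metrics.jsonl").open("a")

    def log(self, step: int, **metrics: float) -> None:
        # One JSON object per line keeps the file appendable and easy to diff.
        self._metrics.write(json.dumps({"step": step, **metrics}) + "\n")
        self._metrics.flush()

    def close(self) -> None:
        self._metrics.close()


def train(config: dict, logger: RunLogger) -> float:
    """Toy 'training' loop whose result depends only on the config."""
    rng = random.Random(config["seed"])  # local RNG seeded from the saved config
    loss = 1.0
    for step in range(config["steps"]):
        loss *= 1 - config["lr"] * rng.random()
        logger.log(step, loss=loss)
    return loss


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = {"seed": 42, "lr": 0.1, "steps": 10}
        losses = []
        for _ in range(2):  # two runs with the same config
            logger = RunLogger(Path(tmp), cfg)
            losses.append(train(cfg, logger))
            logger.close()
        print(losses[0] == losses[1])
```

Using a dedicated `random.Random` instance rather than the global RNG keeps the run reproducible even if other code also draws random numbers; real ML code would likewise need to seed NumPy, the deep learning framework, and any data loaders.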
Project mention: Sacred VS cascade - a user suggested alternative | libhunt.com/r/sacred | 2023-12-05
Project mention: User-friendly PyTorch Lightning and Hydra template for ML experimentation | news.ycombinator.com | 2024-02-05
Project mention: Instance segmentation of small objects in grainy drone imagery | /r/computervision | 2023-12-09
Project mention: Dream2nix – Automate reproducible packaging for various language ecosystems | news.ycombinator.com | 2023-10-13
Provide us with the code in minimum working example (MWE) form (typically called a reprex in the R community). If the process of producing this doesn't help you solve the issue, it at least allows us to (a) copy and paste the code to run it, and (b) be far more likely to spot the error(s) than when they're embedded in a lot of code that isn't needed to highlight the problem. Place the code in a code block, either with every line indented 4 spaces or within backticks. (Credit to Mooks79 for this.) More info here: https://www.reddit.com/r/learnpython/wiki/faq#wiki_how_do_i_format_code.3F
The OpenWDL community is pleased to announce the release of Workflow Description Language (WDL) 1.1.1! This post highlights the most important changes in this release.
Check these out https://github.com/the-nix-way/dev-templates
> There aren't good boundaries between Jupyter's own Python environment, and that of your notebooks— if you have a dependency which conflicts with one of Jupyter's dependencies, then good luck.
I believe that you can use https://github.com/tweag/jupyenv for this.
Project mention: Is there a standard file in Golang from which packages could be installed? Yes, I am aware about go.mod, but hear me out. | /r/golang | 2023-05-02
I'm using https://github.com/bwplotka/bingo for build dependencies; it generates env and Makefile includes to install and use the tools on the fly.
Reproducibility related posts
- Git Version Controlled Datasets in S3
- Do the work
- Ask HN: How do your ML teams version datasets and models?
- What is the best way to simplify a ggplot with 10 factors?
- Mega fast startup times
- Evaluate and Track Your LLM Experiments: Introducing TruLens for LLMs
- [D] Is there a tool to keep track of my ML experiments?
Index
What are some of the best open-source Reproducibility projects? This list will help you:
Rank | Project | Stars
---|---|---
1 | dvc | 13,116
2 | wandb | 8,190
3 | Sacred | 4,157
4 | lightning-hydra-template | 3,658
5 | catalyst | 3,223
6 | snakemake | 2,109
7 | garage | 1,812
8 | benchmark_VAE | 1,680
9 | EvalAI | 1,677
10 | mlreef | 1,442
11 | drake | 1,330
12 | torch-fidelity | 870
13 | targets | 866
14 | dream2nix | 856
15 | mach-nix | 829
16 | reprex | 727
17 | wdl | 725
18 | dev-templates | 685
19 | nn-template | 613
20 | jupyenv | 598
21 | framework-reproducibility | 417
22 | bingo | 323
23 | hydra-zen | 277