Top 21 Python Reproducibility Projects
-
Project mention: Why bad scientific code beats code following "best practices" | news.ycombinator.com | 2024-01-06
What you’re describing sounds like DVC (at a higher-ish—80%-solution level).
See pachyderm too.
-
wandb
🔥 A tool for visualizing and tracking your machine learning experiments. This repo contains the CLI and Python API.
Project mention: A list of SaaS, PaaS and IaaS offerings that have free tiers of interest to devops and infradev | dev.to | 2024-02-05
Weights & Biases: The developer-first MLOps platform. Build better models faster with experiment tracking, dataset versioning, and model management. Free tier for personal projects only, with 100 GB of storage included.
-
Sacred
Sacred is a tool to help you configure, organize, log and reproduce experiments developed at IDSIA.
Project mention: Sacred VS cascade - a user suggested alternative | libhunt.com/r/sacred | 2023-12-05
-
lightning-hydra-template
PyTorch Lightning + Hydra. A very user-friendly template for ML experimentation. ⚡🔥⚡
Project mention: User-friendly PyTorch Lightning and Hydra template for ML experimentation | news.ycombinator.com | 2024-02-05
-
Project mention: Instance segmentation of small objects in grainy drone imagery | /r/computervision | 2023-12-09
-
Project mention: [D] A better way to compute the Fréchet Inception Distance (FID) | /r/MachineLearning | 2023-04-10
The Fréchet Inception Distance (FID) is a widely used metric for assessing the quality of the distribution produced by an image generative model (GAN, Stable Diffusion, etc.). The metric is not trivial to implement, as one needs to compute the trace of the square root of a matrix. In all the PyTorch repositories I have seen that implement the FID (https://github.com/mseitzer/pytorch-fid, https://github.com/GaParmar/clean-fid, https://github.com/toshas/torch-fidelity, ...), the authors rely on SciPy's sqrtm to compute the square root of the matrix, which is unstable and slow.
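To make the "trace of the square root" step concrete, here is a minimal NumPy sketch. It assumes the standard Fréchet distance between two Gaussians and gets the trace of the matrix square root from the eigenvalues of the covariance product, so no explicit `sqrtm` call is needed. This is one possible approach and not necessarily the method the linked post proposes. The function name `frechet_distance` and its parameters are hypothetical.

```python
import numpy as np


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between N(mu1, sigma1) and N(mu2, sigma2).

    Hypothetical helper for illustration. Assumes sigma1 and sigma2 are
    symmetric positive semi-definite covariance matrices.
    """
    diff = mu1 - mu2

    # The trace of sqrt(sigma1 @ sigma2) equals the sum of the square roots
    # of the eigenvalues of sigma1 @ sigma2, so the matrix square root
    # itself never has to be formed.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)

    # In exact arithmetic the eigenvalues are real and non-negative;
    # drop tiny imaginary parts and clip small negatives from round-off.
    sqrt_trace = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * sqrt_trace)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(1000, 8))
    y = rng.normal(loc=0.5, size=(1000, 8))
    print(frechet_distance(x.mean(0), np.cov(x, rowvar=False),
                           y.mean(0), np.cov(y, rowvar=False)))
```

In a real FID computation the means and covariances would come from Inception features of real and generated images. The random arrays above are only stand-ins.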
-
Project mention: Tensorflow: I'm getting different results from the same code depending on where I run it. [D] | /r/MachineLearning | 2023-04-05
Even with a fixed seed, there's no guarantee that you'll get exactly the same results, because most floating-point operations are not deterministic when parallelized. You can enable determinism flags in your framework to try to mitigate that, but results may still vary depending on your model and how you're running it.
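The underlying reason can be shown without any framework: floating-point addition is not associative, so a parallel reduction that combines the same numbers in a different order can produce a slightly different result. The sketch below uses only the standard library, and `naive_sum` is a hypothetical helper name. It does not show framework-specific determinism flags.

```python
import math


def naive_sum(values):
    """Add values strictly left to right, as a single worker would."""
    total = 0.0
    for v in values:
        total += v
    return total


values = [0.1, 0.2, 0.3]

# Same numbers, different accumulation order, as could happen when a
# parallel reduction schedules its partial sums differently.
forward = naive_sum(values)
backward = naive_sum(values[::-1])

print(forward == backward)             # the two orders disagree in the last bits
print(math.isclose(forward, backward)) # but they agree to within rounding error

# math.fsum tracks exact partial sums, so its result does not depend on order.
print(math.fsum(values) == math.fsum(values[::-1]))
```

This is why comparisons across machines or runs are usually done with a tolerance (for example `math.isclose`) instead of exact equality.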
-
singularity-hpc
Local filesystem registry for containers (intended for HPC) using Lmod or Environment Modules. Works for users and admins.
-
rna-seq-kallisto-sleuth
A Snakemake workflow for differential expression analysis of RNA-seq data with Kallisto and Sleuth.
-
Project mention: What are some good examples of well-engineered pipelines | /r/ScientificComputing | 2023-04-05
I expanded a bit on them with my own package https://zntrack.readthedocs.io/ - a general framework for building DVC pipelines through Python scripts (and more). This finally brings me to the project I'm actually working on, https://github.com/zincware/IPSuite, which brings all of this together for the specific use case of machine-learned interatomic potentials.
-
SmartPipeline
A framework for rapid development of robust data pipelines following a simple design pattern
Project mention: Show HN: SmartPipeline, robust and light data pipelines in Python | news.ycombinator.com | 2023-05-03
Python Reproducibility related posts
- Git Version Controlled Datasets in S3
- Ask HN: How do your ML teams version datasets and models?
- Evaluate and Track Your LLM Experiments: Introducing TruLens for LLMs
- [D] Is there a tool to keep track of my ML experiments?
- Where do I best store my test data when using github for code?
- Using git to version control experimental data (not code)?
- Introduction to Data Version Control
Index
What are some of the best open-source Reproducibility projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | dvc | 13,032 |
2 | wandb | 8,036 |
3 | Sacred | 4,151 |
4 | lightning-hydra-template | 3,611 |
5 | catalyst | 3,216 |
6 | garage | 1,806 |
7 | EvalAI | 1,656 |
8 | benchmark_VAE | 1,655 |
9 | torch-fidelity | 857 |
10 | mach-nix | 820 |
11 | nn-template | 604 |
12 | framework-reproducibility | 413 |
13 | hydra-zen | 269 |
14 | example-get-started | 167 |
15 | signac | 129 |
16 | singularity-hpc | 101 |
17 | rna-seq-kallisto-sleuth | 62 |
18 | ZnTrack | 41 |
19 | pi-ci | 37 |
20 | SmartPipeline | 21 |
21 | memorization | 5 |