Python Reproducibility

Open-source Python projects categorized as Reproducibility

Top 21 Python Reproducibility Projects

  • dvc

    🦉 ML Experiments and Data Management with Git

    Project mention: Why bad scientific code beats code following "best practices" | news.ycombinator.com | 2024-01-06

    What you’re describing sounds like DVC (at a higher-ish—80%-solution level).

    https://dvc.org/

    See pachyderm too.

  • wandb

    🔥 A tool for visualizing and tracking your machine learning experiments. This repo contains the CLI and Python API.

    Project mention: A list of SaaS, PaaS and IaaS offerings that have free tiers of interest to devops and infradev | dev.to | 2024-02-05

    Weights & Biases — The developer-first MLOps platform. Build better models faster with experiment tracking, dataset versioning, and model management. Free tier for personal projects only, with 100 GB of storage included.

  • Sacred

    Sacred is a tool to help you configure, organize, log and reproduce experiments developed at IDSIA.

    Project mention: Sacred VS cascade - a user suggested alternative | libhunt.com/r/sacred | 2023-12-05

  • lightning-hydra-template

    PyTorch Lightning + Hydra. A very user-friendly template for ML experimentation. ⚡🔥⚡

    Project mention: User-friendly PyTorch Lightning and Hydra template for ML experimentation | news.ycombinator.com | 2024-02-05

  • catalyst

    Accelerated deep learning R&D (by catalyst-team)

    Project mention: Instance segmentation of small objects in grainy drone imagery | /r/computervision | 2023-12-09

  • garage

    A toolkit for reproducible reinforcement learning research.

  • EvalAI

    ☁️ 🚀 📊 📈 Evaluating state of the art in AI

  • benchmark_VAE

    Unifying Variational Autoencoder (VAE) implementations in Pytorch (NeurIPS 2022)

  • torch-fidelity

    High-fidelity performance metrics for generative models in PyTorch

    Project mention: [D] A better way to compute the Fréchet Inception Distance (FID) | /r/MachineLearning | 2023-04-10

    The Fréchet Inception Distance (FID) is a widespread metric to assess the quality of the distribution of an image generative model (GAN, Stable Diffusion, etc.). The metric is not trivial to implement as one needs to compute the trace of the square root of a matrix. In all PyTorch repositories I have seen that implement the FID (https://github.com/mseitzer/pytorch-fid, https://github.com/GaParmar/clean-fid, https://github.com/toshas/torch-fidelity, ...), the authors rely on SciPy's sqrtm to compute the square root of the matrix, which is unstable and slow.

  • mach-nix

    Create highly reproducible python environments

  • nn-template

    Generic template to bootstrap your PyTorch project.

  • framework-reproducibility

    Providing reproducibility in deep learning frameworks

    Project mention: Tensorflow: I'm getting different results from the same code depending on where I run it. [D] | /r/MachineLearning | 2023-04-05

    Even with a fixed seed there's no guarantee that you'll get the exact same results due to the fact that most floating operations are not deterministic when parallelized. You can enable determinism flags in your framework to try and mitigate that, but results may still vary depending on your model and how you're running it.

  • hydra-zen

    Create powerful Hydra applications without the yaml files and boilerplate code.

  • example-get-started

    Get started DVC project

  • signac

    Manage large and heterogeneous data spaces on the file system.

  • singularity-hpc

    Local filesystem registry for containers (intended for HPC) using Lmod or Environment Modules. Works for users and admins.

  • rna-seq-kallisto-sleuth

    A Snakemake workflow for differential expression analysis of RNA-seq data with Kallisto and Sleuth.

  • ZnTrack

    Create, visualize, run & benchmark DVC pipelines in Python & Jupyter notebooks.

    Project mention: What are some good examples of well-engineered pipelines | /r/ScientificComputing | 2023-04-05

    I expanded a bit on them with my own package https://zntrack.readthedocs.io/ - a general framework for building DVC pipelines through Python scripts (and more). This finally brings me to the project I'm actually working on https://github.com/zincware/IPSuite which brings all of this together for the specific use case of machine learned interatomic potentials.

  • pi-ci

    Prepare Raspberry Pi 3, 4 & 5 configurations using a virtual machine.

  • SmartPipeline

    A framework for rapid development of robust data pipelines following a simple design pattern

    Project mention: Show HN: SmartPipeline, robust and light data pipelines in Python | news.ycombinator.com | 2023-05-03

  • memorization

    Code for "On Memorization in Probabilistic Deep Generative Models"

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-02-05.
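The mention quoted under torch-fidelity says that FID needs the trace of a matrix square root. As a hedged sketch (not taken from torch-fidelity or the other repositories named there), that trace can be obtained from eigenvalues alone, assuming both covariance matrices are symmetric positive semi-definite so that their product has real, non-negative eigenvalues. The function names below are illustrative, and NumPy is assumed to be installed.

```python
import numpy as np

def trace_sqrt_product(sigma1, sigma2):
    """Return Tr((sigma1 @ sigma2)^(1/2)) without forming the matrix square root."""
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    # Eigenvalues should be real and >= 0 in theory; drop round-off noise.
    return float(np.sqrt(np.clip(eigvals.real, 0.0, None)).sum())

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Squared Fréchet distance between two Gaussians N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = mu1 - mu2
    return float(
        diff @ diff
        + np.trace(sigma1)
        + np.trace(sigma2)
        - 2.0 * trace_sqrt_product(sigma1, sigma2)
    )

# Two Gaussians with identity covariance whose means differ by (3, 4).
mu_a, mu_b = np.zeros(2), np.array([3.0, 4.0])
identity = np.eye(2)
print(frechet_distance(mu_a, identity, mu_b, identity))
```

With identical covariances the trace terms cancel, so the result reduces to the squared distance between the means.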
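The mention quoted under framework-reproducibility attributes run-to-run differences to parallelized floating-point operations. One underlying cause can be sketched with the standard library alone: floating-point addition is not associative, so a reduction whose order changes (as it may when work is split across threads or devices) can round differently. The helper name `naive_sum` is illustrative and does not come from any listed project.

```python
import math

def naive_sum(values):
    """Add left to right, rounding after every step like a simple reduction loop."""
    total = 0.0
    for v in values:
        total += v
    return total

values = [0.1, 0.2, 0.3]
forward = naive_sum(values)
backward = naive_sum(list(reversed(values)))  # same numbers, different order

print(forward == backward)  # False: the two orders round differently
# math.fsum tracks exact partial sums, so the order does not matter here.
print(math.fsum(values) == math.fsum(reversed(values)))  # True
```

A fixed seed controls which random numbers are drawn, but not the order in which a parallel backend accumulates them, which is why the quoted post also points to framework determinism flags.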

Index

What are some of the best open-source Reproducibility projects in Python? This list will help you:

Rank  Project                    Stars
1     dvc                        13,032
2     wandb                      8,036
3     Sacred                     4,151
4     lightning-hydra-template   3,611
5     catalyst                   3,216
6     garage                     1,806
7     EvalAI                     1,656
8     benchmark_VAE              1,655
9     torch-fidelity             857
10    mach-nix                   820
11    nn-template                604
12    framework-reproducibility  413
13    hydra-zen                  269
14    example-get-started        167
15    signac                     129
16    singularity-hpc            101
17    rna-seq-kallisto-sleuth    62
18    ZnTrack                    41
19    pi-ci                      37
20    SmartPipeline              21
21    memorization               5