Python Reproducibility

Open-source Python projects categorized as Reproducibility

Top 12 Python Reproducibility Projects

  • GitHub repo dvc

    🦉 Data Version Control | Git for Data & Models | ML Experiments Management

    Project mention: Git-annex – Managing large files with Git | news.ycombinator.com | 2022-01-15

  • GitHub repo Sacred

    Sacred is a tool to help you configure, organize, log and reproduce experiments developed at IDSIA.

    Project mention: https://np.reddit.com/r/MachineLearning/comments/pvs8r5/d_facebook_visdom_vs_google_tensorboard_for/hefg131/ | reddit.com/r/backtickbot | 2021-09-26

    I'm using Omniboard (https://github.com/vivekratnavel/omniboard) with Sacred (https://github.com/IDSIA/sacred) for tracking experiments. You can specify custom Observers in Sacred so the model metrics and logs will be saved to a local directory or to a remote DB (e.g., MongoDB). I use a MongoDB database hosted on Atlas. Unlike other suggested options, Sacred and Omniboard are free. Atlas free tier comes with 512MB of free storage which is a huge amount if you're uploading only log files to it.

  • GitHub repo garage

    A toolkit for reproducible reinforcement learning research.

    Project mention: Which python library to pick for RL as a beginner | reddit.com/r/reinforcementlearning | 2021-04-27

  • GitHub repo EvalAI

    ☁️ 🚀 📊 📈 Evaluating state of the art in AI

    Project mention: EvalAI: An Open-Source Alternative to Kaggle | news.ycombinator.com | 2021-06-24

    I agree that the comparison to Kaggle is a bit old and we have removed it (https://github.com/Cloud-CV/EvalAI/pull/3502). :-)

  • GitHub repo lightning-hydra-template

    PyTorch Lightning + Hydra. A feature-rich template for rapid, scalable and reproducible ML experimentation with best practices. ⚡🔥⚡

    Project mention: Our template to kickstart your pytorch projects, with list of best practices. Minimal boilerplate code. Leverages Lightning + Hydra. Focused on scalability, reproducibility and fast experimentation. | reddit.com/r/pytorch | 2021-05-03

    and many more! (check out the #Your Superpowers section of the readme)

  • GitHub repo ck

    Collective Knowledge framework (CK) provides a common set of automation recipes, APIs and meta descriptions to enable collaborative, reproducible and unified benchmarking and optimization of ML Systems across continuously changing models, data sets, software and hardware (by mlcommons)

    Project mention: Research software code is likely to remain a tangled mess | news.ycombinator.com | 2021-02-22

    – Their solution product https://cknowledge.io/ and source code https://github.com/ctuning/ck

    I guess it should be helpful to the research community.

  • GitHub repo mach-nix

    Create highly reproducible python environments

    Project mention: Install a Python package on NixOS but it is not found. | reddit.com/r/NixOS | 2021-11-18

    Another option (especially with packages not available in nixpkgs) is to use mach-nix. Also take a look at nix.dev.

  • GitHub repo nn-template

    Generic template to bootstrap your PyTorch project with PyTorch Lightning, Hydra, W&B, DVC, and Streamlit.

    Project mention: MLOps stack using Pycaret | reddit.com/r/mlops | 2022-01-17

    I would pick a project that interests you, as that'll help you power through. If there's nothing that comes to mind, image classification is fairly standard. PyCaret does a lot, but if you want to understand each of the tools you've listed, I'd recommend tackling each one separately. That being said, I don't think there's anything wrong with starting with a high-level library and diving deeper as the need arises. If you do decide to build it piece by piece, it is sometimes useful to have a library that'll help you start and remove the boilerplate of all of these tools. I came across a template repo which has a bunch of the tools you've listed and could be a good starting point: https://github.com/lucmos/nn-template

  • GitHub repo signac

    Manage large and heterogeneous data spaces on the file system.

    Project mention: Do you use any data-tracking/automation software within your work/research? | reddit.com/r/comp_chem | 2021-08-10

  • GitHub repo example-get-started

    Get started DVC project

    Project mention: Tuning Hyperparameters with Reproducible Experiments | dev.to | 2021-08-03

    We're going to be working with an existing NLP project. You can get the code we're working with in this repo. It already has DVC set up, but you can check out the Get Started docs if you want to know how the DVC pipeline was created.

  • GitHub repo ZnTrack

    Create, visualize, run & benchmark DVC pipelines in Python & Jupyter notebooks.

    Project mention: HPC Rocket - A tool to run Slurm jobs from CI pipelines | reddit.com/r/Python | 2022-01-03

    This looks really interesting! I have a similar scenario but haven't looked into it yet. Have you looked at dvc.org? I'm planning on using it together with Slurm and what they call CML for my projects. In that context I also wrote a tool that makes DVC more Pythonic, https://github.com/zincware/ZnTrack, although I'm currently restructuring it a bit with backwards compatibility in mind.

  • GitHub repo pi-ci

    Prepare Raspberry Pi 3 & 4 configurations using a virtual machine.

    Project mention: Can I run and setup raspberry pi OS + chat server completely virtually from my host for testing so I can then easily just transfer it to physical pi afterwards? | reddit.com/r/linuxquestions | 2021-11-05

    Just found this which seems to cover just what I was after? The user there noted that internet connection is not provided by pi mostly and I want to test the chat server first so would want that. That link says it has internet provided and ready to go via docker so will that do the job?

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2022-01-17.

Index

What are some of the best open-source Reproducibility projects in Python? This list will help you:

Rank  Project                   Stars
1     dvc                       9,134
2     Sacred                    3,695
3     garage                    1,375
4     EvalAI                    1,296
5     lightning-hydra-template  909
6     ck                        452
7     mach-nix                  428
8     nn-template               334
9     signac                    84
10    example-get-started       76
11    ZnTrack                   17
12    pi-ci                     10