Top 12 Python Reproducibility Projects
🦉 Data Version Control | Git for Data & Models | ML Experiments Management
Project mention: Git-annex – Managing large files with Git | news.ycombinator.com | 2022-01-15
Sacred is a tool, developed at IDSIA, to help you configure, organize, log, and reproduce experiments.
Project mention: https://np.reddit.com/r/MachineLearning/comments/pvs8r5/d_facebook_visdom_vs_google_tensorboard_for/hefg131/ | reddit.com/r/backtickbot | 2021-09-26
I'm using Omniboard (https://github.com/vivekratnavel/omniboard) with Sacred (https://github.com/IDSIA/sacred) for tracking experiments. You can specify custom Observers in Sacred so the model metrics and logs will be saved to a local directory or to a remote DB (e.g., MongoDB). I use a MongoDB database hosted on Atlas. Unlike other suggested options, Sacred and Omniboard are free. Atlas free tier comes with 512MB of free storage which is a huge amount if you're uploading only log files to it.
A toolkit for reproducible reinforcement learning research.
Project mention: Which python library to pick for RL as a beginner | reddit.com/r/reinforcementlearning | 2021-04-27
Evaluating state of the art in AI
PyTorch Lightning + Hydra. A feature-rich template for rapid, scalable and reproducible ML experimentation with best practices. ⚡🔥⚡
Project mention: Our template to kickstart your pytorch projects, with list of best practices. Minimal boilerplate code. Leverages Lightning + Hydra. Focused on scalability, reproducibility and fast experimentation. | reddit.com/r/pytorch | 2021-05-03
and many more! (Check out the #Your Superpowers section of the README.)
Collective Knowledge framework (CK) provides a common set of automation recipes, APIs and meta descriptions to enable collaborative, reproducible and unified benchmarking and optimization of ML Systems across continuously changing models, data sets, software and hardware (by mlcommons)
Create highly reproducible Python environments
Project mention: Install a Python package on NixOS but it is not found. | reddit.com/r/NixOS | 2021-11-18
Another option (especially with packages not available in nixpkgs) is to use mach-nix. Also take a look at nix.dev.
Generic template to bootstrap your PyTorch project with PyTorch Lightning, Hydra, W&B, DVC, and Streamlit.
Project mention: MLOps stack using Pycaret | reddit.com/r/mlops | 2022-01-17
I would pick a project that interests you, as that'll help you power through. If nothing comes to mind, image classification is fairly standard. PyCaret does a lot, but if you want to understand each of the tools you've listed, I'd recommend tackling each one separately. That being said, I don't think there's anything wrong with starting with a high-level library and diving deeper as the need arises. If you do decide to build it piece by piece, it's sometimes useful to have a library that'll help you start and remove the boilerplate of all of these tools. I came across a template repo that has a bunch of the tools you've listed and could be a good starting point: https://github.com/lucmos/nn-template
Manage large and heterogeneous data spaces on the file system.
Project mention: Do you use any data-tracking/automation software within your work/research? | reddit.com/r/comp_chem | 2021-08-10
Get started DVC project
Project mention: Tuning Hyperparameters with Reproducible Experiments | dev.to | 2021-08-03
We're going to be working with an existing NLP project. You can get the code we're working with in this repo. It already has DVC set up, but you can check out the Get Started docs if you want to know how the DVC pipeline was created.
Create, visualize, run & benchmark DVC pipelines in Python & Jupyter notebooks.
Project mention: HPC Rocket - A tool to run Slurm jobs from CI pipelines | reddit.com/r/Python | 2022-01-03
This looks really interesting! I have a similar scenario but haven't looked into it yet. Have you looked at dvc.org? I'm planning on using it together with Slurm and what they call CML for my projects. In that context I also wrote a tool that makes DVC more pythonic, https://github.com/zincware/ZnTrack, although I'm currently restructuring it a bit while keeping backwards compatibility in mind.
Prepare Raspberry Pi 3 & 4 configurations using a virtual machine.
Project mention: Can I run and setup raspberry pi OS + chat server completely virtually from my host for testing so I can then easily just transfer it to physical pi afterwards? | reddit.com/r/linuxquestions | 2021-11-05
Just found this which seems to cover just what I was after? The user there noted that internet connection is not provided by pi mostly and I want to test the chat server first so would want that. That link says it has internet provided and ready to go via docker so will that do the job?
Python Reproducibility related posts
Git-annex – Managing large files with Git
2 projects | news.ycombinator.com | 15 Jan 2022
IPFS for a shitty cause
3 projects | reddit.com/r/DataHoarder | 6 Jan 2022
HPC Rocket - A tool to run Slurm jobs from CI pipelines
4 projects | reddit.com/r/Python | 3 Jan 2022
Running Collaborative Machine Learning Experiments with DVC and Git - Tutorial
1 project | reddit.com/r/GitOps | 13 Dec 2021
Don't Just Track Your ML Experiments, Version Them - Managing Machine Learning Experiments as Code with Git and DVC Open Source Tools
1 project | reddit.com/r/opensource | 10 Dec 2021
Managing Your Machine Learning Experiments as Code with Git and DVC
1 project | reddit.com/r/github | 9 Dec 2021
DVC (DataVersionControl) - Managing Machine Learning Experiments as Code with Git and DVC
1 project | reddit.com/r/githubprojects | 9 Dec 2021
What are some of the best open-source Reproducibility projects in Python? This list will help you: