[D] How to be more productive while doing Deep Learning experiments?

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning

  • aim

    Aim 💫 — An easy-to-use & supercharged open-source experiment tracker.

  • Log everything, literally everything: hyperparameters, command-line arguments, environment variables, outputs, checkpoints, resource usage, etc. Decent high-level ML frameworks provide this out of the box. Configure a callback on your trainer to send a notification through Slack. To track and compare your experiments, use tools beyond a plain TensorBoard; Aim is a fantastic tool for getting insights from hundreds of experiments.
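    The "log everything" idea can be sketched with the standard library alone; a real tracker like Aim records most of this automatically, and the `start_run` helper and hyperparameter values below are made up for illustration:

    ```python
    # Minimal stdlib sketch of "log everything": snapshot hyperparameters,
    # command-line arguments, and (GPU-related) environment variables into
    # a per-run directory. A tracker like Aim does this for you.
    import json
    import os
    import sys
    import time
    from pathlib import Path

    def start_run(hparams, log_dir="runs"):
        """Create a run directory and persist the run's metadata."""
        run_id = time.strftime("%Y%m%d-%H%M%S")
        run_dir = Path(log_dir) / run_id
        run_dir.mkdir(parents=True, exist_ok=True)
        record = {
            "hparams": hparams,
            "argv": sys.argv,
            "env": {k: v for k, v in os.environ.items() if k.startswith("CUDA")},
        }
        (run_dir / "meta.json").write_text(json.dumps(record, indent=2))
        return run_dir

    run_dir = start_run({"lr": 3e-4, "batch_size": 32})
    print(run_dir / "meta.json")
    ```

    Checkpoints, metrics, and resource usage would be appended to the same directory, so every run stays self-describing.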

  • coddx-alpha

    Todo Kanban Board manages tasks and saves them as TODO.md — a simple plain-text file.

  • Yes, for deciding the order of experiments I also like a Kanban board, as the other commenter suggested. There is a VSCode plugin that displays the contents of a TODO.md as a Kanban board: https://github.com/coddx-hq/coddx-alpha
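    Because the board is just a plain-text TODO.md, it is easy to script against. A hypothetical sketch (the exact file layout is an assumption — here columns are `### Heading` sections with `- item` lines):

    ```python
    # Parse a plain-text TODO.md Kanban file into named columns.
    # The "### Column" / "- task" layout is assumed for illustration.
    def parse_board(text):
        columns, current = {}, None
        for line in text.splitlines():
            if line.startswith("### "):
                current = line[4:].strip()
                columns[current] = []
            elif line.startswith("- ") and current is not None:
                columns[current].append(line[2:].strip())
        return columns

    board = parse_board("""### Todo
    - run lr sweep
    ### Done
    - baseline model
    """)
    print(board["Todo"])  # ['run lr sweep']
    ```

    Keeping the queue of planned experiments in version control alongside the code means the board's history doubles as a lab notebook.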

  • guildai

    Experiment tracking, ML developer tools

  • There are a number of experiment tracking systems out there: mlflow, wandb, Guild AI, etc. (disclaimer: I developed Guild). I would look at adopting one of those. While you could roll your own experiment tracking tool, there's just no point, IMO.

  • detectron2

    Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.

  • http://karpathy.github.io/2019/04/25/recipe/ I sense that your experiments are not very organised. I would recommend a configuration approach, where each experiment is described by a config such as https://github.com/facebookresearch/detectron2/blob/master/detectron2/config/config.py (see https://github.com/facebookresearch/detectron2/tree/master/configs for examples of usage). Most experiments should only require changing parameters in the main config. For experiments that require code changes, try them on a git branch, and if they are successful, implement them as config keys.
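    The config-first workflow can be sketched with the standard library, in the spirit of detectron2's config system (the `BASE` dictionary and override keys below are invented for illustration):

    ```python
    # Sketch of a config-first workflow: every experiment is the base
    # config plus a small dictionary of overrides, so most experiments
    # require no code changes at all.
    from copy import deepcopy

    BASE = {
        "model": {"depth": 50},
        "solver": {"lr": 0.01, "max_iter": 90_000},
    }

    def merge(base, overrides):
        """Return base with nested overrides applied (experiment = base + diff)."""
        out = deepcopy(base)
        for key, value in overrides.items():
            if isinstance(value, dict) and isinstance(out.get(key), dict):
                out[key] = merge(out[key], value)
            else:
                out[key] = value
        return out

    # One experiment is just a tiny, reviewable diff against the base.
    exp = merge(BASE, {"solver": {"lr": 0.02}})
    print(exp["solver"])  # {'lr': 0.02, 'max_iter': 90000}
    ```

    Because each experiment is a small diff against a shared base, comparing two runs reduces to diffing two config files.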

  • Sacred

    Sacred is a tool to help you configure, organize, log and reproduce experiments developed at IDSIA.

  • For 1, set up an experiment tracking framework. I found Sacred to be helpful: https://github.com/IDSIA/sacred.

  • metaflow

    :rocket: Build and manage real-life ML, AI, and data science projects with ease!

  • For building experiments as a DAG, I suggest Metaflow from Netflix. I like the ability to resume a run if I make a mistake. Make sure you tag your runs so you can always filter out runs that had a flaw in them.
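    The resume idea can be sketched without Metaflow: each step persists its output, and a rerun skips any step whose artifact already exists. The step names and artifact directory below are illustrative, not Metaflow's API:

    ```python
    # Stdlib sketch of resumable DAG steps: a completed step's output is
    # cached on disk, so rerunning the pipeline skips finished work.
    import json
    from pathlib import Path

    def run_step(name, fn, artifact_dir=Path("artifacts")):
        artifact_dir.mkdir(exist_ok=True)
        path = artifact_dir / f"{name}.json"
        if path.exists():              # resume: reuse the cached result
            return json.loads(path.read_text())
        result = fn()                  # first run: compute and persist
        path.write_text(json.dumps(result))
        return result

    data = run_step("prepare", lambda: {"rows": 1000})
    model = run_step("train", lambda: {"acc": 0.91, "rows": data["rows"]})
    print(model["acc"])  # 0.91
    ```

    A real workflow engine adds versioned runs, tags, and per-step isolation on top of this caching idea.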

  • nvidia-gpu-scheduler

    NVIDIA GPU compute task scheduling utility

  • Sure. No, a simple bash script is not enough. In my case, we have several machines shared in the department, some with GPUs, some without. What I have is a Python script that takes a list of jobs and schedules each one on the first available machine (according to memory/CPU/GPU availability). Unfortunately, what I have is tightly entangled with our computing platform (Docker-based, with a shared filesystem) and not easy to release as a standalone project (that's why I said "know your infrastructure"). The most similar thing I could find online is this project. There are also HPC tools that could be useful (e.g. Slurm), but that's way too much for what we need.
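    The core of such a scheduler can be sketched in a few lines; the machine specs and job list below are made up, and a real version would query actual resource availability (e.g. via nvidia-smi over SSH) instead of a static table:

    ```python
    # Toy greedy scheduler: assign each queued job to the first machine
    # with enough free GPUs; jobs that don't fit wait in a pending list.
    def schedule(jobs, machines):
        """jobs: [(name, gpus_needed)]; machines: {host: free_gpu_count}."""
        assignments = {}
        free = dict(machines)
        pending = []
        for name, gpus in jobs:
            host = next((m for m, g in free.items() if g >= gpus), None)
            if host is None:
                pending.append(name)   # no slot yet; retry when one frees up
            else:
                free[host] -= gpus
                assignments[name] = host
        return assignments, pending

    assignments, pending = schedule(
        [("sweep-a", 1), ("sweep-b", 2)],
        {"node1": 2, "node2": 2},
    )
    print(assignments)  # {'sweep-a': 'node1', 'sweep-b': 'node2'}
    ```

    The hard part in practice is not this loop but the infrastructure around it: discovering live resource usage, launching jobs remotely, and handling failures — which is why the commenter's version is entangled with their platform.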

  • tmux

    tmux source code

  • Try to avoid Jupyter notebooks; use them only for very preliminary experiments to save time. For the long run, decent IDEs (VSCode, PyCharm) can easily keep you away from stupid bugs. PyCharm has stunning Python language support, while the open-source VSCode (Insiders channel) makes it very easy to code, run, and debug remotely. Use Mosh or Eternal Terminal to survive disconnections even when your computer is asleep or offline, and use tmux to keep tasks running while you're away. You can even use your smartphone to reattach to the same tmux session and monitor the training.

  • pytorch-lightning

    Discontinued Build high-performance AI models with PyTorch Lightning (organized PyTorch). Deploy models with Lightning Apps (organized Python to build end-to-end ML systems). [Moved to: https://github.com/Lightning-AI/lightning] (by PyTorchLightning)

  • First of all, use high-level ML frameworks (AllenNLP, PyTorch Lightning). There is no need to write boilerplate code and implement standard ML approaches from scratch. Here are some suggestions (though more NLP-focused) that I feel improved my research coding experience a lot.

NOTE: The number of mentions on this list counts mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
