[D] How to be more productive while doing Deep Learning experiments?

This page summarizes the projects mentioned and recommended in the original post on reddit.com/r/MachineLearning

  • GitHub repo aim

    Aim — an easy-to-use and performant open-source experiment tracker.

    Log everything, literally everything: hyperparameters, command-line arguments, environment variables, outputs, checkpoints, resource usage, etc. Decent high-level ML frameworks provide this out of the box. Configure a callback on your trainer to send notifications through Slack. To track and compare your experiments, use tools beyond plain TensorBoard. Aim is a fantastic tool for getting insights from hundreds of experiments.
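    As a rough sketch of what that logging looks like with Aim (assuming `pip install aim`; the hyperparameter names and metric values below are illustrative, not from the original post):

    ```python
    # Minimal Aim logging sketch: record hyperparameters once, then
    # track a metric over training steps. Values here are stand-ins.
    from aim import Run

    run = Run(experiment="baseline")                 # opens/creates an .aim repo
    run["hparams"] = {"lr": 3e-4, "batch_size": 32}  # log hyperparameters

    losses = [0.9, 0.7, 0.5]                         # stand-in training loop
    for step, loss in enumerate(losses):
        run.track(loss, name="loss", step=step, context={"subset": "train"})

    run.close()
    ```

    Runs logged this way can then be compared side by side in the Aim UI (`aim up`).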

  • GitHub repo coddx-alpha

    Todo Kanban Board manages tasks and saves them as TODO.md, a simple plain-text file.

    Yes, for deciding the order of experiments I also like a Kanban board, as the other commenter suggested. There is a VSCode plugin that displays the contents of a TODO.md as a Kanban board: https://github.com/coddx-hq/coddx-alpha
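    A minimal TODO.md for such a board might look like the sketch below (the exact column headings the plugin expects are an assumption here; check the repo's README):

    ```markdown
    ### Todo
    - [ ] sweep learning rate 1e-2 to 1e-4
    ### In Progress
    - [ ] retrain baseline with augmentation
    ### Done
    - [x] fix data-loader seed
    ```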

  • GitHub repo guildai

    Experiment tracking, ML developer tools

    There are a number of experiment tracking systems out there: mlflow, wandb, Guild AI, etc. (disclaimer: I developed Guild). I would look at adopting one of them. While you can roll your own experiment tracking tool, there's just no point, IMO.

  • GitHub repo detectron2

    Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.

    http://karpathy.github.io/2019/04/25/recipe/ I sense that your experiments are not very organised. I would recommend a configuration-driven approach, where each experiment is fully described by a config, such as https://github.com/facebookresearch/detectron2/blob/master/detectron2/config/config.py; see https://github.com/facebookresearch/detectron2/tree/master/configs for examples of usage. Most experiments should only require changing parameters in the main config. For experiments that require code changes, try them on git branches and, if they are successful, implement them as config keys.
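    The idea can be sketched without detectron2 itself (this is not detectron2's actual API, just the pattern it implements): keep one base config and describe each experiment as a small set of overrides on top of it.

    ```python
    # Config-driven experiments: a frozen base config, with each
    # experiment expressed purely as overrides. Field names are examples.
    from dataclasses import dataclass, replace

    @dataclass(frozen=True)
    class Config:
        lr: float = 0.1
        batch_size: int = 32
        backbone: str = "resnet50"

    base = Config()
    exp_low_lr = replace(base, lr=0.01)                # "lower LR" experiment
    exp_big_net = replace(base, backbone="resnet101")  # "bigger backbone" experiment
    ```

    Because each experiment is just its diff from the base, the configs double as a readable log of what was tried.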

  • GitHub repo Sacred

    Sacred is a tool to help you configure, organize, log and reproduce experiments developed at IDSIA.

    For 1, set up an experiment tracking framework. I found Sacred to be helpful: https://github.com/IDSIA/sacred.
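    A bare-bones Sacred experiment looks roughly like this (assuming `pip install sacred`; the experiment name and hyperparameters are placeholders):

    ```python
    # Sacred sketch: config values declared in a @config function are
    # captured, injected into main(), and can be overridden from the CLI.
    from sacred import Experiment

    ex = Experiment("baseline")

    @ex.config
    def config():
        lr = 0.01      # recorded with every run
        epochs = 10    # overridable, e.g. `with epochs=20`

    @ex.main
    def main(lr, epochs):
        # real training would go here; Sacred records config, seed, metadata
        return f"trained {epochs} epochs at lr={lr}"

    # run from the CLI with e.g.: python exp.py with lr=0.001
    ```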

  • GitHub repo metaflow

    :rocket: Build and manage real-life data science projects with ease!

    For building experiments as a DAG, I suggest Metaflow from Netflix. I like the ability to resume a run if I make a mistake. Make sure you tag your runs so you can always filter out runs that had a flaw in them.
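    An experiment DAG in Metaflow might be sketched as follows (assuming `pip install metaflow`; the flow name, the learning-rate sweep, and the scoring are illustrative):

    ```python
    # Metaflow sketch: fan out over learning rates with foreach,
    # then join the branches and pick the best result.
    from metaflow import FlowSpec, step

    class TrainFlow(FlowSpec):

        @step
        def start(self):
            self.lrs = [0.1, 0.01]                  # fan out over learning rates
            self.next(self.train, foreach="lrs")

        @step
        def train(self):
            self.lr = self.input                    # one branch per learning rate
            self.score = 1.0 / self.lr              # stand-in for a real training run
            self.next(self.join)

        @step
        def join(self, inputs):
            self.best = max(i.score for i in inputs)
            self.next(self.end)

        @step
        def end(self):
            print("best score:", self.best)

    # run with:    python train_flow.py run
    # resume with: python train_flow.py resume   (after fixing a failed step)
    ```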

  • GitHub repo nvidia-gpu-scheduler

    NVIDIA GPU compute task scheduling utility

    Sure. No, a simple bash script is not enough. In my case we have several machines shared in the department, some with GPUs, some without. What I have is a Python script that takes a list of jobs and schedules each one on the first available machine (according to memory/CPU/GPU availability). Unfortunately, what I have is really entangled with our computing platform (Docker-based with a shared filesystem) and not easy to release as a standalone project (that's why I said "know your infrastructure"). The most similar thing that I could find online is this project. There are also some HPC tools that could be useful (e.g. Slurm), but that's way too much for what we need.
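    The core idea, stripped of any infrastructure, is just a job queue drained by a fixed number of worker slots. This toy sketch (not the commenter's actual script) dispatches each job to the first free slot; a real version would gate the slots on GPU/CPU/memory availability, e.g. by parsing `nvidia-smi`:

    ```python
    # Toy job scheduler: n_slots workers pull jobs from a shared queue,
    # so each job starts on the first slot that frees up.
    import queue
    import threading

    def run_jobs(jobs, n_slots=2):
        q = queue.Queue()
        for job in jobs:
            q.put(job)

        results = []
        lock = threading.Lock()

        def worker():
            while True:
                try:
                    job = q.get_nowait()
                except queue.Empty:
                    return                      # no jobs left: slot shuts down
                out = job()                     # launch the experiment
                with lock:
                    results.append(out)
                q.task_done()

        threads = [threading.Thread(target=worker) for _ in range(n_slots)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return results
    ```

    For example, `run_jobs([lambda: train("exp1"), lambda: train("exp2")], n_slots=2)` would run both hypothetical `train` calls concurrently, one per slot.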

  • GitHub repo tmux

    tmux source code

    Try to avoid Jupyter notebooks; use them only for very preliminary experiments to save time. For the long run, decent IDEs (VSCode, PyCharm) can easily help you stay away from stupid bugs. PyCharm has stunning Python language support, while the open-source VSCode (Insiders channel) makes it very easy to code, run, and debug remotely. Use Mosh or Eternal Terminal to prevent disconnection even if your computer is asleep or offline, and use tmux to keep tasks running while you're away. You can use your smartphone to stay connected to the same tmux session and monitor training.

  • GitHub repo pytorch-lightning

    The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate.

    First of all, use high-level ML frameworks (AllenNLP, PyTorch Lightning). There is no need to write boilerplate code and implement standard ML approaches from scratch. Here are some suggestions (though more NLP-focused) that I feel improved my research coding experience a lot.
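    To give a feel for how much boilerplate such a framework absorbs, here is a minimal PyTorch Lightning module sketch (assuming `torch` and `pytorch-lightning` are installed; the model, data shapes, and learning rate are placeholders):

    ```python
    # Lightning sketch: the module declares the training step and optimizer;
    # the Trainer handles loops, devices, checkpointing, and logging.
    import torch
    import pytorch_lightning as pl

    class LitClassifier(pl.LightningModule):
        def __init__(self, lr=1e-3):
            super().__init__()
            self.save_hyperparameters()               # lr is logged automatically
            self.net = torch.nn.Linear(28 * 28, 10)   # placeholder model

        def training_step(self, batch, batch_idx):
            x, y = batch
            loss = torch.nn.functional.cross_entropy(self.net(x.flatten(1)), y)
            self.log("train_loss", loss)              # picked up by any logger
            return loss

        def configure_optimizers(self):
            return torch.optim.Adam(self.parameters(), lr=self.hparams.lr)

    # pl.Trainer(max_epochs=5).fit(LitClassifier(), train_dataloader)
    ```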

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
