Strange results from training with Google Cloud TPUs, seem to be very inefficient?

This page summarizes the projects mentioned and recommended in the original post on /r/learnmachinelearning

  • seed_rl

    SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference. Implements IMPALA and R2D2 algorithms in TF2 with SEED's architecture. (Discontinued)

  • I've been running tests to find the most efficient configuration for training on Google Cloud AI Platform. The results are here (note that a "step" in this case is a single sample/observation/frame from a single environment, and an "iteration" is one run of the minimization function on a single batch). The results are a bit strange. I had assumed that TPUs would be one of the most efficient ways to train, but they turn out to be the least efficient by a wide margin. I'm using Google Research's SEED RL codebase, so I'm assuming there are no bugs in my code.

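The step/iteration distinction in the post above matters when comparing hardware, because throughput measured in iterations per second is not comparable across batch sizes. Below is a minimal Python sketch of the conversion. It assumes each batch holds `batch_size` trajectories of `unroll_length` steps; both function names and all parameters are hypothetical and are not taken from the SEED RL codebase or from the poster's results.

```python
def steps_per_iteration(batch_size: int, unroll_length: int) -> int:
    """Environment steps consumed by one training iteration.

    Assumption: a batch is `batch_size` trajectories, each
    `unroll_length` environment steps long.
    """
    return batch_size * unroll_length


def cost_per_million_steps(steps_per_second: float, hourly_price: float) -> float:
    """Cost of processing one million environment steps.

    `hourly_price` is whatever the machine configuration costs per hour,
    in any currency; the result is in the same currency.
    """
    if steps_per_second <= 0:
        raise ValueError("steps_per_second must be positive")
    seconds_needed = 1_000_000 / steps_per_second
    return hourly_price * seconds_needed / 3600.0
```

Comparing configurations by cost per million steps, rather than by raw iterations per second, puts a CPU, GPU, and TPU run on the same footing even when their batch sizes differ.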
NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • [Q]Official seed_rl repo is archived.. any alternative seed_rl style drl repo??

    1 project | /r/reinforcementlearning | 17 Dec 2022
  • Need some help understanding what steps to take to debug a RL agent

    1 project | /r/learnmachinelearning | 17 Jul 2021
  • Strange training results: why is a batch size of 1 more efficient than larger batch sizes, despite using a GPU/TPU?

    1 project | /r/learnmachinelearning | 14 Jul 2021
  • Having trouble passing custom flags with AI Platform

    1 project | /r/googlecloud | 29 Jun 2021
  • New to Linux, trying to understand why a variable isn't getting assigned in an .sh file

    1 project | /r/linuxquestions | 20 Jun 2021