-
DeepLearningExamples
State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
Disclosure: I used to work on Google Cloud.
I dunno, their A100 results took about 20-30 minutes on 8xA100 [1]. 8xA100 is like $24/hr on GCP at on-demand rates.
The efficiency was okay but not linear, so if you were more cost constrained you might go with 1xA100 for $3/hr and have ~2.5hr training times.
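To put rough numbers on that trade-off, here's a back-of-the-envelope sketch in Python. The rates and times are just the approximate figures quoted above, not measured values, and the helper names are made up for illustration.

```python
# Back-of-the-envelope cost comparison using the approximate figures above.
# All inputs are rough, on-demand numbers from the comment, not benchmarks.

def run_cost_usd(hourly_rate_usd: float, minutes: float) -> float:
    """Cost of one training run at a flat hourly rate."""
    return hourly_rate_usd * minutes / 60


def scaling_efficiency(single_gpu_minutes: float, multi_gpu_minutes: float, n_gpus: int) -> float:
    """Fraction of ideal linear speedup achieved (1.0 means perfectly linear)."""
    return single_gpu_minutes / (n_gpus * multi_gpu_minutes)


# 8xA100 at ~$24/hr, 20-30 minutes per run
cost_8gpu_fast = run_cost_usd(24, 20)
cost_8gpu_slow = run_cost_usd(24, 30)

# 1xA100 at ~$3/hr, ~2.5 hours (150 minutes) per run
cost_1gpu = run_cost_usd(3, 150)

# Implied scaling efficiency of the 8-GPU run relative to the 1-GPU estimate
eff_if_20min = scaling_efficiency(150, 20, 8)
eff_if_30min = scaling_efficiency(150, 30, 8)

print(f"8xA100: ${cost_8gpu_fast:.2f} to ${cost_8gpu_slow:.2f} per run")
print(f"1xA100: ${cost_1gpu:.2f} per run")
print(f"implied scaling efficiency: {eff_if_30min:.0%} to {eff_if_20min:.0%}")
```

So under these assumptions the single-GPU run is a bit cheaper per run, and you pay for it in wall-clock time.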
Getting that performance out of a GPU is more challenging than getting access to the GPUs. All the major cloud providers offer them.
(Nit: GCP deployed the 40 GiB cards rather than the later 80 GiB parts, but that often doesn't matter, so let's ignore it.)
[1] https://github.com/NVIDIA/DeepLearningExamples/tree/master/P...