AdaptDL monitors training job performance in real time and elastically re-scales resources (GPUs, compute instances) while jobs are running. For each training job, AdaptDL automatically tunes the batch size, learning rate, and gradient accumulation. In the cloud (e.g. AWS), AdaptDL can auto-scale the number of provisioned Spot Instances. We’ve seen shared-cluster training jobs at Petuum and our partners complete 2–3x faster on average, at 3x lower cost on AWS using Spot Instances!
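To make the link between batch size, learning rate, and gradient accumulation concrete, here is a minimal sketch of how the three might be recomputed when the number of GPUs changes mid-job. It is not AdaptDL's algorithm or API: the `plan_for` function, the `Plan` class, and all parameter names are hypothetical, and the learning rate is adjusted with the simple linear scaling rule, swapped in here purely for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Plan:
    per_gpu_batch: int       # micro-batch size that each GPU processes at once
    accumulation_steps: int  # micro-batches accumulated before an optimizer step
    effective_batch: int     # samples contributing to one optimizer step
    learning_rate: float


def plan_for(num_gpus, target_batch, max_per_gpu_batch, base_batch, base_lr):
    """Hypothetical helper: fit a target global batch onto num_gpus GPUs."""
    # Samples each GPU must contribute to one optimizer step.
    per_gpu_total = math.ceil(target_batch / num_gpus)
    # Split that share into micro-batches small enough to fit in GPU memory.
    accumulation_steps = math.ceil(per_gpu_total / max_per_gpu_batch)
    per_gpu_batch = math.ceil(per_gpu_total / accumulation_steps)
    effective_batch = per_gpu_batch * accumulation_steps * num_gpus
    # Linear scaling rule (an assumption): LR grows in proportion to batch size.
    learning_rate = base_lr * effective_batch / base_batch
    return Plan(per_gpu_batch, accumulation_steps, effective_batch, learning_rate)


if __name__ == "__main__":
    # The same job before and after losing two of its four GPUs.
    for gpus in (4, 2):
        print(gpus, plan_for(gpus, target_batch=1024, max_per_gpu_batch=128,
                             base_batch=256, base_lr=0.1))
```

In this sketch, dropping from four GPUs to two leaves the effective batch size and learning rate unchanged and doubles the number of accumulation steps, which is the kind of trade-off an elastic scheduler has to manage as resources come and go.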
NOTE:
The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives.
Hence, a higher number means a more popular project.
Related posts
- [Discussion] Open source scheduler and queuing system for model training/inferencing tasks?
- How we were able to achieve hyper-parameter tuning (HPT) for deep learning workflows at 1.5x faster in our clusters and 3x cheaper on AWS
- [D] Anyone deploy DL models with AWS Lambda? Trying to estimate costs
- SB-1047 will stifle open-source AI and decrease safety
- Getting Started with Gemma Models