Disclosure: I work for Neural Magic.
Hi deepnotderp, as noted by others, the speeds listed here compare throughput numbers for GPU from Ultralytics against latency numbers for GPU from Neural Magic. We did also include throughput measurements, though, where YOLOv5s was around 3 ms per image on a V100 at fp16 in our testing. All benchmarks were run on AWS instances for repeatability and availability, which is likely where the 2 ms vs 3 ms discrepancy comes from (slower memory transfer on the AWS machine vs the one Ultralytics used). Note, though, that a slower overall machine will affect the CPU results as well.
We benchmarked using the available PyTorch APIs, mimicking what was done for the Ultralytics benchmarks. The code is open source for viewing and use here: https://github.com/neuralmagic/deepsparse/blob/main/examples...
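For context, latency-style benchmarking times a single batch end to end, while throughput divides total images processed by wall time. A minimal, framework-agnostic sketch of that pattern (the `infer` stub is a placeholder standing in for the real model call, not the DeepSparse or Ultralytics code; real GPU timing would also need `torch.cuda.synchronize()` before reading the clock):

```python
import time

def infer(batch):
    # Placeholder for model(batch); sleeps to simulate ~3 ms per image.
    time.sleep(0.003 * len(batch))
    return batch

def benchmark(batch_size, iters=20):
    batch = list(range(batch_size))
    infer(batch)  # warmup run to exclude one-time setup costs
    start = time.perf_counter()
    for _ in range(iters):
        infer(batch)
    elapsed = time.perf_counter() - start
    latency_ms = elapsed / iters * 1000          # ms per batch
    throughput = batch_size * iters / elapsed    # images per second
    return latency_ms, throughput

lat, thr = benchmark(batch_size=1)
print(f"latency: {lat:.1f} ms/batch, throughput: {thr:.1f} img/s")
```

At batch size 1 the two numbers are just reciprocals of each other; they diverge once batching hides per-image latency behind parallelism, which is why comparing one vendor's latency to another's throughput misleads.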
I'm pretty sure this isn't using the Tensor cores on the GPU.
As you can see here (https://github.com/ultralytics/yolov5/blob/master/README.md), the speed of inference on a V100 for YOLOv5s should be 2 ms per image, or 500 img/s, not the 44.6 img/s being reported here.
This is important as it is more than an order of magnitude off.
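The arithmetic behind that gap is easy to check: at batch size 1, throughput is just the reciprocal of per-image latency, so 2 ms per image implies 500 img/s, while 44.6 img/s implies roughly 22 ms per image:

```python
def imgs_per_sec(latency_ms):
    # Throughput implied by a per-image latency, assuming batch size 1
    return 1000.0 / latency_ms

def latency_ms(imgs_per_second):
    # Per-image latency implied by a throughput, assuming batch size 1
    return 1000.0 / imgs_per_second

print(imgs_per_sec(2.0))     # 2 ms/image  -> 500.0 img/s
print(latency_ms(44.6))      # 44.6 img/s  -> ~22.4 ms/image
```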
Related posts
- Easily classify dog and cat breeds with YOLOv5
- [Tutorial] "Fine Tuning" Stable Diffusion using only 5 Images Using Textual Inversion.
- Fast Llama 2 on CPUs with Sparse Fine-Tuning and DeepSparse
- How would I go about having YOLO v5 return me a list, from left to right, of all detected objects in an image?
- Building a Drowsiness Detection Web App from scratch - pt2