Disclosure: I work for Neural Magic.
Hi deepnotderp, as noted by others, the speeds listed here compare throughput numbers for GPU from Ultralytics against latency numbers for GPU from Neural Magic. We did also include throughput measurements, though, where YOLOv5s was around 3 ms per image on a V100 at fp16 in our testing. All benchmarks were run on AWS instances for repeatability and availability, which is likely where the 2 ms vs 3 ms discrepancy comes from (slower memory transfer on the AWS machine vs the one Ultralytics used). Note, though, that a slower overall machine will affect the CPU results as well.
We benchmarked using the available PyTorch APIs, mimicking what was done for the Ultralytics benchmarks. The code is open source for viewing and use here: https://github.com/neuralmagic/deepsparse/blob/main/examples...
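For context, latency-style benchmarking times a single batch end to end, while throughput divides total images processed by wall time. A minimal, framework-agnostic sketch of that pattern (the `infer` stub is a placeholder standing in for the real model call, not the DeepSparse or Ultralytics code; real GPU timing would also need `torch.cuda.synchronize()` before reading the clock):

```python
import time

def infer(batch):
    # Placeholder for model(batch); sleeps to simulate ~3 ms per image.
    time.sleep(0.003 * len(batch))
    return batch

def benchmark(batch_size, iters=20):
    batch = list(range(batch_size))
    infer(batch)  # warmup run to exclude one-time setup costs
    start = time.perf_counter()
    for _ in range(iters):
        infer(batch)
    elapsed = time.perf_counter() - start
    latency_ms = elapsed / iters * 1000          # ms per batch
    throughput = batch_size * iters / elapsed    # images per second
    return latency_ms, throughput

lat, thr = benchmark(batch_size=1)
print(f"latency: {lat:.1f} ms/batch, throughput: {thr:.1f} img/s")
```

At batch size 1 the two numbers are just reciprocals of each other; they diverge once batching hides per-image latency behind parallelism, which is why comparing one vendor's latency to another's throughput misleads.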
I'm pretty sure this isn't using the Tensor cores on the GPU.
As you can see here (https://github.com/ultralytics/yolov5/blob/master/README.md), the speed of inference on a V100 for YOLOv5s should be 2 ms per image, or 500 img/s, not the 44.6 img/s being reported here.
This is important as it is more than an order of magnitude off.
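The arithmetic behind that gap is easy to check: at batch size 1, throughput is just the reciprocal of per-image latency, so 2 ms per image implies 500 img/s, while 44.6 img/s implies roughly 22 ms per image:

```python
def imgs_per_sec(latency_ms):
    # Throughput implied by a per-image latency, assuming batch size 1
    return 1000.0 / latency_ms

def latency_ms(imgs_per_second):
    # Per-image latency implied by a throughput, assuming batch size 1
    return 1000.0 / imgs_per_second

print(imgs_per_sec(2.0))     # 2 ms/image  -> 500.0 img/s
print(latency_ms(44.6))      # 44.6 img/s  -> ~22.4 ms/image
```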
Related posts
- Easily classify dog and cat breeds with YOLOv5
- [Tutorial] "Fine Tuning" Stable Diffusion using only 5 Images Using Textual Inversion.
- Fast Llama 2 on CPUs with Sparse Fine-Tuning and DeepSparse
- How would I go about having YOLO v5 return me a list, from left to right, of all detected objects in an image?
- Building a Drowsiness Detection Web App from scratch - pt2