YOLOv5 on CPUs: Sparsifying to Achieve GPU-Level Performance

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • deepsparse

    Sparsity-aware deep learning inference runtime for CPUs

  • Disclosure: I work for Neural Magic.

    Hi deepnotderp, as others have noted, the speeds listed here compare Ultralytics' GPU throughput numbers with Neural Magic's GPU latency numbers. We did also include throughput measurements, though: in our testing, YOLOv5s ran at around 3 ms per image on a V100 at FP16. All benchmarks were run on AWS instances for repeatability and availability. That is likely the source of the 2 ms vs 3 ms discrepancy, since memory transfer on the AWS machine is slower than on the one Ultralytics used. Note, though, that a slower machine overall affects the CPU results as well.

    We benchmarked using the available PyTorch APIs, mimicking the Ultralytics benchmarking setup. The code is open source and available here: https://github.com/neuralmagic/deepsparse/blob/main/examples...

  • yolov5

    YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite

  • I'm pretty sure this isn't using the Tensor Cores on the GPU.

    According to the YOLOv5 README (https://github.com/ultralytics/yolov5/blob/master/README.md), YOLOv5s inference on a V100 should take about 2 ms per image, or 500 img/s, not the 44.6 img/s reported here.

    This matters because the reported figure is more than an order of magnitude off (500 / 44.6 is roughly 11x).

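The thread hinges on two points:

- Latency and throughput are different measures. Latency is the time for one input, typically at batch size 1. Throughput is images per second, often measured on batched input.
- Per-image times convert directly to rates. 2 ms per image is 500 img/s, 3 ms is about 333 img/s, and 500 img/s is roughly 11 times the 44.6 img/s that was reported.

The Python sketch below is a minimal illustration of both points. It is not the benchmark code used by Neural Magic or Ultralytics. `fake_model` is a hypothetical stand-in whose sleep-based costs are assumed for illustration. A real benchmark would call the actual model (for example, through PyTorch) instead.

```python
import time
from typing import Callable, Sequence, Tuple


def throughput_from_latency_ms(latency_ms: float, batch_size: int = 1) -> float:
    """Images per second implied by `latency_ms` per batch of `batch_size` images."""
    return batch_size * 1000.0 / latency_ms


def slowdown_factor(expected_ips: float, reported_ips: float) -> float:
    """How many times lower the reported rate is than the expected one."""
    return expected_ips / reported_ips


def fake_model(batch: Sequence[int]) -> None:
    """Hypothetical model: a fixed per-call overhead plus a per-image cost."""
    time.sleep(0.002 + 0.0005 * len(batch))


def benchmark(run_batch: Callable[[Sequence[int]], object],
              batch_size: int, iterations: int = 20) -> Tuple[float, float]:
    """Return (mean latency per batch in ms, throughput in images/s)."""
    batch = list(range(batch_size))  # placeholder inputs
    run_batch(batch)  # warm-up call, excluded from timing
    start = time.perf_counter()
    for _ in range(iterations):
        run_batch(batch)
    elapsed = time.perf_counter() - start
    latency_ms = elapsed / iterations * 1000.0
    images_per_s = batch_size * iterations / elapsed
    return latency_ms, images_per_s


if __name__ == "__main__":
    # Unit conversions for the figures quoted in the thread.
    expected = throughput_from_latency_ms(2.0)  # 500.0 img/s
    print(f"2 ms/image -> {expected:.0f} img/s")
    print(f"3 ms/image -> {throughput_from_latency_ms(3.0):.0f} img/s")
    print(f"500 vs 44.6 img/s: {slowdown_factor(expected, 44.6):.1f}x")

    # Latency (batch 1) vs throughput (batch 32) on the hypothetical model.
    for bs in (1, 32):
        latency, ips = benchmark(fake_model, batch_size=bs)
        print(f"batch={bs:>2}: {latency:.1f} ms/batch, {ips:.0f} img/s")
```

With a fixed per-call overhead, batching spreads that overhead across more images. As a result, batch 32 reports a higher img/s than batch 1 even though each call takes longer. This is why a per-image time derived from batched runs and a batch-size-1 latency should not be compared directly.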