[D] patterns for scaling video inference service?

This page summarizes the projects mentioned and recommended in the original post on reddit.com/r/MachineLearning

  • triton-inference-server/server

    The Triton Inference Server provides an optimized cloud and edge inferencing solution. (by triton-inference-server)

    If you are working at the infrastructure level, I would use ECS together with the NVIDIA Triton Inference Server. Triton handles the multi-model paradigm through its ensemble feature (a bit of a misnomer, since it is really just a DAG of data flowing through your models, though you can add an actual ensembling step at the end if desired). It also provides a clean HTTP or gRPC interface. With ECS you can put an Application Load Balancer in front to scale further, but how you set that up depends heavily on whether or not your models are stateful.
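    To make the ensemble-as-DAG idea concrete, here is a minimal sketch of a Triton ensemble model configuration (`config.pbtxt`). The model and tensor names (`video_pipeline`, `preprocess`, `classifier`, `RAW_FRAMES`, `LABELS`) are hypothetical; the point is that the ensemble wires the output of one model into the input of the next, and Triton executes the whole DAG per request.

    ```
    # Hypothetical ensemble config chaining a preprocessing model into a
    # classifier. All names here are illustrative, not from the post.
    name: "video_pipeline"
    platform: "ensemble"
    max_batch_size: 8
    input [
      { name: "RAW_FRAMES", data_type: TYPE_UINT8, dims: [ -1, -1, 3 ] }
    ]
    output [
      { name: "LABELS", data_type: TYPE_FP32, dims: [ 1000 ] }
    ]
    ensemble_scheduling {
      step [
        {
          # First node of the DAG: decode/normalize frames.
          model_name: "preprocess"
          model_version: -1
          input_map  { key: "INPUT",  value: "RAW_FRAMES" }
          output_map { key: "OUTPUT", value: "preprocessed" }
        },
        {
          # Second node: consume the intermediate tensor and classify.
          model_name: "classifier"
          model_version: -1
          input_map  { key: "INPUT",  value: "preprocessed" }
          output_map { key: "OUTPUT", value: "LABELS" }
        }
      ]
    }
    ```

    A client then sends a single HTTP or gRPC inference request to `video_pipeline`; the intermediate tensor (`preprocessed`) never leaves the server, which is what makes the ensemble approach attractive for multi-stage video pipelines.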


