server
The Triton Inference Server provides an optimized cloud and edge inferencing solution. (by triton-inference-server)
If you are working at the infrastructure level, I would use ECS with the NVIDIA Triton Inference Server. It can handle the multi-model paradigm through its ensemble feature (a bit of a misnomer, since it's really just a DAG of data flow through your models, though you can add an ensembling step at the end if desired). It also provides a nice HTTP or gRPC interface. With ECS you can also put an Application Load Balancer in front to scale further, but how you set that up will depend heavily on whether your models are stateful.
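For a rough idea of what calling the HTTP interface looks like, here is a minimal sketch using only the Python standard library. It assumes Triton is reachable at `http://localhost:8000` (its default HTTP port) and that an ensemble is deployed under the hypothetical name `my_ensemble` with a single FP32 input tensor called `INPUT0`; those names and the shape are placeholders, so substitute whatever your own model configuration declares.

```python
import json
import urllib.request


def build_infer_request(input_name, shape, datatype, data):
    """Build a JSON body in the KServe v2 inference format used by Triton's HTTP endpoint."""
    return {
        "inputs": [
            {
                "name": input_name,      # must match an input name in the model config
                "shape": list(shape),    # tensor shape, including the batch dimension
                "datatype": datatype,    # e.g. "FP32", "INT64", "BYTES"
                "data": list(data),      # tensor values, flattened in row-major order
            }
        ]
    }


def infer_url(base_url, model_name):
    """Return the inference URL for a model (an ensemble is addressed like any other model)."""
    return f"{base_url.rstrip('/')}/v2/models/{model_name}/infer"


def infer(base_url, model_name, body, timeout=5.0):
    """POST the request body to the server and return the decoded JSON response."""
    request = urllib.request.Request(
        infer_url(base_url, model_name),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical model and tensor names; nothing is sent over the network here.
    body = build_infer_request("INPUT0", [1, 4], "FP32", [0.1, 0.2, 0.3, 0.4])
    print(infer_url("http://localhost:8000", "my_ensemble"))
    print(json.dumps(body))
```

With a server actually running, `infer("http://localhost:8000", "my_ensemble", body)` would send the request. Behind an Application Load Balancer you would point `base_url` at the balancer's address instead.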
NOTE:
The number of mentions on this list indicates mentions on common posts plus user suggested alternatives.
Hence, a higher number means a more popular project.