-
server
The Triton Inference Server provides an optimized cloud and edge inferencing solution. (by triton-inference-server)
If you are working at the infrastructure level, I would use ECS with the NVIDIA Triton Inference Server. It can handle the multi-model paradigm through its ensemble feature (a bit of a misnomer, since it's really just a DAG of data flow through your models, though you can add an ensembling step at the end if desired). It also provides a nice HTTP or gRPC interface. With ECS you can also put an Application Load Balancer in front to scale further, but how you set that up will depend heavily on whether your models are stateful.
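To make the HTTP interface mentioned above a bit more concrete, here is a minimal sketch of what a request to an ensemble might look like from Python, using only the standard library. The server address, the model name `my_ensemble`, the input name `INPUT0`, and the shape and values are all hypothetical placeholders. The URL path and JSON layout follow the KServe v2 inference protocol that Triton's HTTP endpoint implements. The request is only built here, not sent, so it runs without a server.

```python
import json
from urllib import request


def build_infer_request(base_url, model_name, input_name, shape, datatype, data):
    """Build (but do not send) a POST request for the v2 inference endpoint."""
    body = {
        "inputs": [
            {
                "name": input_name,      # must match an input declared in the model config
                "shape": list(shape),
                "datatype": datatype,    # e.g. "FP32", "INT64"
                "data": list(data),      # flattened tensor values
            }
        ]
    }
    url = f"{base_url}/v2/models/{model_name}/infer"
    return request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical names and values, for illustration only.
req = build_infer_request(
    "http://localhost:8000",  # 8000 is Triton's default HTTP port
    "my_ensemble",
    "INPUT0",
    [1, 4],
    "FP32",
    [0.1, 0.2, 0.3, 0.4],
)

# To actually send it, a running server with that model loaded is required:
# with request.urlopen(req) as resp:
#     result = json.loads(resp.read())
```

From the client's side, an ensemble is addressed like any single model: the DAG of intermediate models runs server-side, and only the ensemble's declared inputs and outputs appear in the request and response.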