Python datacenter Projects
-
server
The Triton Inference Server provides an optimized cloud and edge inferencing solution. (by triton-inference-server)
Project mention: Everything you need to know about Python 3.13 – JIT and GIL went up the hill | news.ycombinator.com | 2024-09-28
As always, it depends a lot on what you're doing, and a lot of people are using Python for AI.
One of the drawbacks of multi-processing versus multi-threading is that you cannot share memory (easily, cheaply) between processes. During model training, and even during inference, this becomes a problem.
For example, imagine a high-volume, low-latency, synchronous computer vision inference service. If you're handling each request in a different process, then you're going to have to jump through a bunch of hoops to make it performant. For example, you'll need to use shared memory to move data around, because images are large and sockets are slow (see the sketch below). Another issue is that each process will need its own copy of the model in GPU memory, which is a problem in a world where GPU memory is at a premium. You could of course have a single process for the GPU part of your model and automatically batch inputs into it (and people do), but all of this is just working around the lack of proper threading support in Python.
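For a sense of what those hoops look like, here is a minimal sketch of handing one frame to a worker process through Python's `multiprocessing.shared_memory` instead of pickling it over a socket. The frame shape, dtype, and pixel values are made up for illustration; a real service would also need to manage a pool of these blocks and synchronize access.

```python
# Minimal sketch: passing a large image to a worker process via shared memory
# instead of serializing it over a socket. Shapes and values are illustrative.
import numpy as np
from multiprocessing import Process, shared_memory

IMG_SHAPE = (1080, 1920, 3)  # hypothetical decoded frame size
IMG_DTYPE = np.uint8

def worker(shm_name: str) -> None:
    # Attach to the existing shared block and view it as an ndarray (no copy).
    shm = shared_memory.SharedMemory(name=shm_name)
    frame = np.ndarray(IMG_SHAPE, dtype=IMG_DTYPE, buffer=shm.buf)
    print("worker sees mean pixel value:", frame.mean())
    shm.close()

if __name__ == "__main__":
    # Allocate a shared block and copy the frame into it once.
    shm = shared_memory.SharedMemory(create=True, size=int(np.prod(IMG_SHAPE)))
    src = np.ndarray(IMG_SHAPE, dtype=IMG_DTYPE, buffer=shm.buf)
    src[:] = 128  # stand-in for a decoded camera frame
    p = Process(target=worker, args=(shm.name,))
    p.start()
    p.join()
    shm.close()
    shm.unlink()
```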
By the way, if anyone is struggling with these challenges today, I recommend taking a peek at NVIDIA's [triton](https://github.com/triton-inference-server/server) inference server, which handles a lot of these details for you. It supports things like zero-copy sharing of tensors between parts of your model running in different processes/threads, and it does auto-batching between requests as well. The auto-batching in particular gave us a big throughput increase with only a minor latency penalty!
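As a rough illustration, a client call against a Triton model can look like the sketch below, using the `tritonclient` HTTP API. The model name, tensor names, and shapes here are assumptions and must match whatever the deployed model's config.pbtxt declares; the auto-batching mentioned above is enabled server-side (via `dynamic_batching` in that config), so the client code itself stays simple.

```python
# Minimal sketch of a Triton HTTP client call; "resnet50", "input__0" and
# "output__0" are hypothetical names that must match the model's config.pbtxt.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Fake image batch standing in for a preprocessed frame.
batch = np.random.rand(1, 3, 224, 224).astype(np.float32)

inp = httpclient.InferInput("input__0", list(batch.shape), "FP32")
inp.set_data_from_numpy(batch)
out = httpclient.InferRequestedOutput("output__0")

# With dynamic_batching enabled on the server, concurrent calls like this one
# get coalesced into larger batches before they hit the GPU.
result = client.infer(model_name="resnet50", inputs=[inp], outputs=[out])
print(result.as_numpy("output__0").shape)
```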
-