Multiprocessing works well, but you probably need an abstraction on top to make it reliable. For starters, it's best to use a pool of processes, because creating new ones is expensive. You also need to ensure that errors in the sub-processes are correctly surfaced in the main process; otherwise it becomes frustrating. And sometimes sub-processes get stuck, so you have to monitor them. I implemented something that takes care of all of that for a project I'm working on; it'll give you an idea of what it looks like (of course, you can use the framework as well, which lets you parallelize functions and notebooks).
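A minimal stdlib sketch of those three pieces (a reused pool, surfacing child errors in the parent, and a timeout to notice stuck workers). The names `safe_call` and `run_all` are illustrative, not from the commenter's project:

```python
import concurrent.futures
import traceback


def safe_call(fn, *args):
    """Run fn in the child; return (ok, value_or_formatted_traceback).

    The traceback is formatted to a string inside the child because
    traceback objects themselves don't pickle across the process boundary.
    """
    try:
        return True, fn(*args)
    except Exception:
        return False, traceback.format_exc()


def run_all(fn, inputs, max_workers=4, timeout=60):
    """Map fn over inputs on a single reused process pool.

    Raises RuntimeError carrying the child's traceback if a task fails,
    and concurrent.futures.TimeoutError if a task hangs past `timeout`
    seconds -- a crude way to detect stuck sub-processes.
    """
    with concurrent.futures.ProcessPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(safe_call, fn, x) for x in inputs]
        results = []
        for fut in futures:
            ok, value = fut.result(timeout=timeout)
            if not ok:
                raise RuntimeError(f"worker failed:\n{value}")
            results.append(value)
        return results
```

Call `run_all` from under an `if __name__ == "__main__":` guard (required on platforms that spawn rather than fork), e.g. `run_all(abs, [-1, -2, 3])`.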
Finally, debugging. If you're running code in sub-processes, debugging becomes a real pain because, out of the box, you can't start a debugger inside a sub-process. Furthermore, more than one of them may fail. One solution is to dump the traceback whenever a sub-process fails, so you can start a debugging session afterward; look at this project for an example.
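One way to sketch that dump-on-failure idea with just the stdlib (the linked project does more): wrap the task so that, on failure, the formatted traceback is written to a file you can read after the run. `dump_failure` and `guarded` are hypothetical names for illustration:

```python
import traceback
from pathlib import Path


def dump_failure(path, exc):
    """Persist a failed task's traceback so it can be inspected
    post-mortem; plain-text tracebacks survive the process boundary,
    whereas live traceback objects don't pickle."""
    tb = traceback.TracebackException.from_exception(exc)
    Path(path).write_text("".join(tb.format()))


def guarded(fn, dump_path, *args):
    """Run inside the sub-process: on failure, dump the traceback to
    dump_path, then re-raise so the pool still reports the error."""
    try:
        return fn(*args)
    except Exception as exc:
        dump_failure(dump_path, exc)
        raise
```

You would submit `guarded(task, f"task{i}.crash", ...)` to the pool instead of `task` itself, then open any `.crash` files after the run to see exactly where each worker died.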
https://github.com/Slimmer-AI/mpire is a nice lib, with better performance than multiprocessing.
We automatically provide container level parallelism in Orchest: https://github.com/orchest/orchest