How do you deal with parallelising parts of an ML pipeline, especially in Python?

ploomber

121 3,369 7.8 Python

The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️

Multiprocessing works well, but you probably need an abstraction on top to make it work reliably. For starters, it's best to use a pool of processes, because creating new ones is expensive. You also need to ensure that errors in the sub-processes are correctly displayed in the main process; otherwise, it becomes frustrating. Sub-processes can also get stuck, so you have to monitor them. I implemented something that takes care of all of that for a project I'm working on; it'll give you an idea of what it looks like (of course, you can use the framework as well, which lets you parallelize functions and notebooks).
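To give a rough idea of the pool-plus-error-reporting part, here is a minimal sketch using only the standard library's `concurrent.futures`. It is not ploomber's implementation; the names `run_in_pool` and `_call_and_capture` are made up for illustration, and the sketch does not try to detect or kill stuck workers.

```python
import traceback
from concurrent.futures import ProcessPoolExecutor, as_completed


def _call_and_capture(func, item):
    """Run func(item) inside the worker and report the outcome.

    Returns ("ok", result) on success, or ("error", traceback_text) so the
    full traceback from the sub-process reaches the main process as a string.
    """
    try:
        return ("ok", func(item))
    except Exception:
        return ("error", traceback.format_exc())


def run_in_pool(func, items, max_workers=2, executor_cls=ProcessPoolExecutor):
    """Apply func to every item using a single pool of reusable workers.

    Returns (results, errors), two dicts keyed by the item's position.
    executor_cls is a hypothetical hook for swapping in another executor.
    """
    results, errors = {}, {}
    with executor_cls(max_workers=max_workers) as pool:
        # Submit everything up front; workers are reused across tasks
        futures = {
            pool.submit(_call_and_capture, func, item): index
            for index, item in enumerate(items)
        }
        for future in as_completed(futures):
            status, payload = future.result()
            target = results if status == "ok" else errors
            target[futures[future]] = payload
    return results, errors
```

With the default `ProcessPoolExecutor`, `func` has to be picklable (for example, defined at module level), and the call should sit under an `if __name__ == "__main__":` guard on platforms that spawn new interpreters. After the call, printing each value in `errors` shows the sub-process traceback in the main process.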

debuglater

8 51 3.8 Python

Store Python traceback for later debugging. 🐛

Finally, debugging. If you're running code in sub-processes, debugging becomes a real pain because, out of the box, you won't be able to start a debugger in the sub-processes. Furthermore, there's a chance that more than one fails. One solution is to dump the traceback when any sub-process fails, so you can start a debugging session afterward; look at this project for an example.
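As a rough illustration of the "dump on failure" idea, the sketch below uses only the standard library and writes the traceback text plus the innermost frame's local variables to a JSON file. This is a simplified stand-in, not debuglater's mechanism or API; `dump_failure`, `run_task`, `task_id`, and the `dumps` directory are hypothetical names.

```python
import json
import sys
import traceback
from pathlib import Path


def dump_failure(path):
    """Write the active exception's traceback and innermost locals to JSON.

    Must be called from inside an `except` block.
    """
    exc_type, exc, tb = sys.exc_info()
    # Walk to the innermost frame, which is where the error was raised
    innermost = tb
    while innermost.tb_next is not None:
        innermost = innermost.tb_next
    report = {
        "error": f"{exc_type.__name__}: {exc}",
        "traceback": traceback.format_exception(exc_type, exc, tb),
        # repr() keeps the dump serializable for arbitrary objects
        "locals": {
            name: repr(value)
            for name, value in innermost.tb_frame.f_locals.items()
        },
    }
    Path(path).write_text(json.dumps(report, indent=2))


def run_task(func, arg, task_id, dump_dir="dumps"):
    """Call func(arg); on failure, save a dump named after task_id and re-raise."""
    try:
        return func(arg)
    except Exception:
        Path(dump_dir).mkdir(parents=True, exist_ok=True)
        dump_failure(Path(dump_dir) / f"{task_id}.json")
        raise
```

Running each unit of work through `run_task` inside the sub-process leaves one file per failed task, so several failures can be inspected one by one after the run. Unlike a real post-mortem debugger session, this only preserves string representations of the variables.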

mpire

8 1,898 7.6 Python

A Python package for easy multiprocessing, but faster than multiprocessing

https://github.com/Slimmer-AI/mpire is a nice lib, with better performance than multiprocessing.

orchest

44 4,020 4.5 TypeScript

Build data pipelines, the easy way 🛠️

We automatically provide container-level parallelism in Orchest: https://github.com/orchest/orchest

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
