Running Jupyter notebooks in parallel

african_microbiome_portal_data

1 0 4.3 Jupyter Notebook

Raw and corrected data with correction python notebook

Here we share the results of testing and evaluating some of these tools. To keep the comparison fair, every run executes the same code, and execution time is measured with Python's time module. The notebooks used for benchmarking come from the african_microbiome_portal_data repository. Serial execution (each notebook run one after another) is evaluated first, followed by parallel notebook execution.
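
The timing approach described above could be sketched roughly as follows. This is a minimal illustration, not the benchmark code itself: `timed`, `run_serially`, and the `fake_run` stand-in are hypothetical names, and `time.perf_counter` is just one of the clocks the time module offers.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def run_serially(run_one, notebook_paths):
    """Run every notebook one after another with the given runner."""
    return [run_one(path) for path in notebook_paths]


if __name__ == "__main__":
    # Stand-in for a real notebook runner (hypothetical): sleeps briefly.
    def fake_run(path):
        time.sleep(0.01)
        return path

    _, elapsed = timed(run_serially, fake_run, ["a.ipynb", "b.ipynb"])
    print(f"serial run took {elapsed:.3f} s")
```

Passing the same `run_one` callable to every tool under test keeps the measured work identical, so only the execution strategy changes between runs.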

papermill

26 5,623 7.9 Python

📚 Parameterize, execute, and analyze notebooks

As a first option, we will use Papermill, whose Python API lets us execute notebooks programmatically.
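
A serial run with Papermill might look like the sketch below. It relies on `papermill.execute_notebook(input_path, output_path)`; the helper names, the `notebooks/` and `output/` directories, and the lazy import are assumptions made for illustration, not the code used in the benchmark.

```python
from pathlib import Path


def output_path_for(notebook, output_dir):
    """Map e.g. notebooks/a.ipynb to <output_dir>/a.ipynb."""
    return Path(output_dir) / Path(notebook).name


def run_notebooks_serially(notebooks, output_dir="output"):
    """Execute each notebook in turn, saving the executed copies."""
    # Imported here so the helpers above work without papermill installed.
    import papermill as pm  # third-party: pip install papermill

    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for notebook in notebooks:
        pm.execute_notebook(
            str(notebook),
            str(output_path_for(notebook, output_dir)),
        )


if __name__ == "__main__":
    # Assumed layout: the benchmark notebooks live in ./notebooks
    run_notebooks_serially(sorted(Path("notebooks").glob("*.ipynb")))
```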

ploomber

121 3,369 7.8 Python

The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️

As a second option, we will use Ploomber with serial execution. Ploomber also has a Python API, which lets us execute different notebooks through its NotebookRunner task class.
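
One way to wire this up is to register one NotebookRunner task per notebook in a DAG that uses the Serial executor. The sketch below assumes Ploomber's `DAG`, `Serial`, `File`, and `NotebookRunner` classes; `plan_tasks`, `build_serial_dag`, and the directory names are hypothetical.

```python
from pathlib import Path


def plan_tasks(notebooks, output_dir="output"):
    """Return (task_name, source_path, product_path) for each notebook."""
    return [
        (nb.stem, nb, Path(output_dir) / nb.name)
        for nb in map(Path, notebooks)
    ]


def build_serial_dag(notebooks, output_dir="output"):
    """Create a DAG with one NotebookRunner task per notebook."""
    # Imported here so plan_tasks works without ploomber installed.
    from ploomber import DAG
    from ploomber.executors import Serial
    from ploomber.products import File
    from ploomber.tasks import NotebookRunner

    Path(output_dir).mkdir(parents=True, exist_ok=True)
    dag = DAG(executor=Serial())  # tasks run one at a time
    for name, source, product in plan_tasks(notebooks, output_dir):
        NotebookRunner(source, File(str(product)), dag=dag, name=name)
    return dag


if __name__ == "__main__":
    # Assumed layout: the benchmark notebooks live in ./notebooks
    dag = build_serial_dag(sorted(Path("notebooks").glob("*.ipynb")))
    dag.build()
```

Because the executor is chosen when the DAG is constructed, the same task definitions could presumably be reused for the parallel case by passing a different executor.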

ploomber-engine

2 59 7.0 Python

A toolbox 🧰 for Jupyter notebooks 📙: testing, experiment tracking, debugging, profiling, and more!

As a third option, we will use Papermill again, this time with ploomber-engine, which adds debugging and profiling features to Papermill.
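
Papermill selects its execution engine through the `engine_name` argument of `execute_notebook`, so switching engines could look like the sketch below. The engine name `"embedded"` is an assumption about what ploomber-engine registers once installed (check its documentation for the names your version provides), and both helper functions are hypothetical.

```python
def engine_kwargs(notebook, output, engine_name="embedded"):
    """Build the keyword arguments passed to papermill.execute_notebook.

    engine_name="embedded" is an assumed default, not a confirmed value.
    """
    return {
        "input_path": str(notebook),
        "output_path": str(output),
        "engine_name": engine_name,
    }


def run_with_engine(notebook, output, engine_name="embedded"):
    """Execute one notebook with Papermill using an alternative engine."""
    # Imported here so engine_kwargs works without papermill installed.
    # ploomber-engine must also be installed for its engines to be found.
    import papermill as pm

    return pm.execute_notebook(**engine_kwargs(notebook, output, engine_name))
```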

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
