Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
At Wonderflow we do a lot of ML / NLP in Python, and we have recently been enjoying writing data pipelines with Spotify's Luigi.
💫 Industrial-strength Natural Language Processing (NLP) in Python
We have tasks that require many different spaCy language models to be loaded at once, and we load them in many processes at the same time.
Parallel computing with task scheduling
To do that efficiently, we use Dask, creating on-demand local (or remote) clusters inside the task's run() method.
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Moreover, configuring and deploying Luigi's scheduler on a server or pod for production use is easy, which may not be the case for similar tools such as Apache Airflow.
Distributed computing in python??
2 projects | reddit.com/r/learnpython | 9 Nov 2021
Unable to login into airflow webserver account
1 project | reddit.com/r/apache_airflow | 18 May 2023
PyJaws: A Pythonic Way to Define Databricks Jobs and Workflows
1 project | reddit.com/r/software | 13 May 2023
We all have tough days at the office... so why can't data logos? :)
1 project | reddit.com/r/dataengineering | 27 Apr 2023
My experience with optimizing machine learning workflow
1 project | reddit.com/r/learnprogramming | 21 Apr 2023