Data pipelines with Luigi

This page summarizes the projects mentioned and recommended in the original post on dev.to.

  • luigi

    Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.

    At Wonderflow we're doing a lot of ML / NLP using Python, and we have recently been enjoying writing data pipelines with Spotify's Luigi.

  • spaCy

    💫 Industrial-strength Natural Language Processing (NLP) in Python

    We have tasks that require many different spaCy language models to be loaded at once, and we load them in many processes at the same time.

  • Dask

    Parallel computing with task scheduling

    To do that, we use Dask, creating on-demand local (or remote) clusters inside each task's run() method.

  • Airflow

    Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

    Moreover, configuring and deploying Luigi's scheduler on a server / pod for production use is easy, which may not be the case for similar tools such as Apache Airflow.

NOTE: The number of mentions on this list indicates mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
