luigi
Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, etc. It also comes with Hadoop support built in.
At Wonderflow we do a lot of ML / NLP in Python, and recently we have been enjoying writing data pipelines with Spotify's Luigi.
Some of our tasks need many different spaCy language models loaded at once, and we load them across many processes at the same time.
To do that efficiently, we use Dask, creating on-demand local (or remote) clusters inside the task's run() method.
Moreover, configuring and deploying Luigi's scheduler on a server or pod for production use is easy, which may not be the case for similar tools such as Apache Airflow.