Ask HN: What is the correct way to deal with pipelines?

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • Huginn

    Create agents that monitor and act on your behalf. Your agents are standing by!

  • "correct" is a value judgement that depends on lots of different things. Only you can decide which tool is correct. Here are some ideas:

    - https://camel.apache.org/

    - https://www.windmill.dev/

    - https://github.com/huginn/huginn

    Your idea about a queue (in Redis, Postgres, SQLite, etc.) is also totally valid. The off-the-shelf tools I listed probably wouldn't give you a huge advantage, IMO.
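
    To make the queue idea concrete, here is a minimal sketch of a job queue kept in SQLite, using only Python's standard library. The table name `jobs`, the status values, and the helper names (`open_queue`, `enqueue`, `claim_next`, `finish`) are all assumptions made for illustration; nothing in the thread prescribes this schema.

```python
import sqlite3


def open_queue(path=":memory:"):
    """Open (or create) the queue database. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending')"
    )
    conn.commit()
    return conn


def enqueue(conn, payload):
    """Add a pending job and return its id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))
    return cur.lastrowid


def claim_next(conn):
    """Claim the oldest pending job; return (id, payload), or None if empty."""
    while True:
        row = conn.execute(
            "SELECT id, payload FROM jobs"
            " WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        with conn:
            # The status guard means only one worker can win this job.
            cur = conn.execute(
                "UPDATE jobs SET status = 'running'"
                " WHERE id = ? AND status = 'pending'",
                (row[0],),
            )
        if cur.rowcount == 1:
            return row
        # Another worker claimed it first; look for the next pending job.


def finish(conn, job_id, ok=True):
    """Record the outcome of a claimed job."""
    with conn:
        conn.execute(
            "UPDATE jobs SET status = ? WHERE id = ?",
            ("done" if ok else "failed", job_id),
        )
```

    A worker would loop on `claim_next`, run its pipeline step on the payload, and call `finish`. Jobs left in `running` after a crash are not retried in this sketch; that would need a timestamp column and a requeue rule.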

  • Apache Camel

    Apache Camel is an open source integration framework that empowers you to quickly and easily integrate various systems consuming or producing data.


  • Airflow

    Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

  • I agree there are many options in this space. Two others to consider:

    - https://airflow.apache.org/

    - https://github.com/spotify/luigi

    There are also many Kubernetes-based options out there. For the specific use case you specified, you might even consider a plain old Makefile and incrond if you expect these all to run on a single host and be triggered by a new file showing up in a directory…
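
    The single-host idea can be approximated in a few lines of Python. This is not a Makefile or an incrond configuration, just a rough sketch of the same rule Make applies: rebuild an output only when it is missing or older than its input. The directory layout, the `.out` suffix, and the function names are assumptions, and the sketch is meant to be run on a schedule or from a file-watch trigger rather than watching the directory itself.

```python
from pathlib import Path


def stale_inputs(in_dir, out_dir, suffix=".out"):
    """Return input files whose output is missing or older than the input."""
    stale = []
    for src in sorted(Path(in_dir).iterdir()):
        if not src.is_file():
            continue
        dst = Path(out_dir) / (src.name + suffix)
        if not dst.exists() or dst.stat().st_mtime < src.stat().st_mtime:
            stale.append(src)
    return stale


def run_once(in_dir, out_dir, step, suffix=".out"):
    """Apply step(text) -> text to each stale input; return the names processed."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    processed = []
    for src in stale_inputs(in_dir, out_dir, suffix):
        dst = Path(out_dir) / (src.name + suffix)
        tmp = dst.with_name(dst.name + ".tmp")
        tmp.write_text(step(src.read_text()))
        tmp.replace(dst)  # rename into place so readers never see a partial file
        processed.append(src.name)
    return processed
```

    Calling `run_once("incoming", "processed", str.upper)` a second time with no new files does nothing, which is the property that makes this style of pipeline safe to re-run after a failure.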

  • luigi

    Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.


