StackStorm
Airflow
Metric | StackStorm | Airflow
---|---|---
Mentions | 25 | 169
Stars | 5,870 | 33,953
Growth | 0.8% | 2.2%
Activity | 9.5 | 10.0
Last commit | 7 days ago | 7 days ago
Language | Python | Python
License | Apache License 2.0 | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
StackStorm
- Ask HN: What are some unpopular technologies you wish people knew more about?
- StackStorm – IFTTT for Ops
- Small app using a DB?
  Stackstorm
- We built Activepieces to replace Zapier + learnings from last post
  What differentiates this from things like n8n, node red, and stackstorm? (which sort of occupy a zapier replacement, IoT automation, and infra automation niche, respectively)
- SRE: What tool do you use for Incident Response Runbook/Playbook
- Hacker News top posts: Nov 24, 2022
  StackStorm: Event-driven automation (17 comments)
- StackStorm (a.k.a. “IFTTT for Ops”) is event-driven automation
- What Open Source Projects Do You Use In Your District?
  StackStorm -- "IFTTT For Ops" I am investigating the different integrations to see if it can help automate some things.
- free-for.dev
  stackstorm.com - Event-driven automation for apps, services and workflows, free without flow, access control, LDAP,...
- Event-driven Ansible looks awsome
  Very cool! This reminds me of some of the event-driven automation tools like Stackstorm, letting you define triggers/events to subscribe to, and then take actions based on a defined rule set.
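
The last mention above describes the event-driven pattern as subscribing to triggers and then taking actions based on a rule set. A minimal sketch of that idea in plain Python is shown here. It is not StackStorm's rule format or API; the `Rule` class, the `dispatch` function, and the event names are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable

Event = dict[str, Any]


@dataclass
class Rule:
    """Hypothetical rule: a trigger to subscribe to, a condition, and an action."""
    name: str
    trigger: str                        # event type this rule subscribes to
    criteria: Callable[[Event], bool]   # extra condition on the event payload
    action: Callable[[Event], str]      # what to run when the rule matches


def dispatch(event_type: str, payload: Event, rules: list[Rule]) -> list[str]:
    """Run the action of every rule whose trigger and criteria both match."""
    results = []
    for rule in rules:
        if rule.trigger == event_type and rule.criteria(payload):
            results.append(rule.action(payload))
    return results


# Illustrative rule set (made-up event types and thresholds).
rules = [
    Rule(
        name="restart_on_high_cpu",
        trigger="monitoring.cpu",
        criteria=lambda e: e["percent"] > 90,
        action=lambda e: f"restart {e['host']}",
    ),
    Rule(
        name="notify_on_disk_full",
        trigger="monitoring.disk",
        criteria=lambda e: e["free_gb"] < 5,
        action=lambda e: f"notify ops about {e['host']}",
    ),
]

fired = dispatch("monitoring.cpu", {"host": "web-1", "percent": 97}, rules)
print(fired)
```

Only the CPU rule fires here, because the disk rule subscribes to a different trigger. Keeping the trigger, the criteria, and the action as separate fields is what lets new automations be added as data rather than as new control flow.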
Airflow
- Airflow VS quix-streams - a user suggested alternative
  2 projects | 7 Dec 2023
- Simplifying Data Transformation in Redshift: An Approach with DBT and Airflow
  Airflow is the most widely used and well-known tool for orchestrating data workflows. It allows for efficient pipeline construction, scheduling, and monitoring.
- Ask HN: What is the correct way to deal with pipelines?
  I agree there are many options in this space. Two others to consider:
  - https://github.com/spotify/luigi

  There are also many Kubernetes based options out there. For the specific use case you specified, you might even consider a plain old Makefile and incrond if you expect these all to run on a single host and be triggered by a new file showing up in a directory…
- How to build your own data platform. From zero to hero.
- Is it impossible to contribute to open source as a data engineer?
  You can try and contribute some new connectors/operators for workflow managers like Airflow or Airbyte
- Exploring MLOps Tools and Frameworks: Enhancing Machine Learning Operations
  Apache Airflow:
- Python task scheduler with a web UI
  Looks interesting as a light-weight alternative to https://www.prefect.io/ (which itself is a lighter-weight / more modern alternative to https://airflow.apache.org/ ).
- Working with Managed Workflows for Apache Airflow (MWAA) and Amazon Redshift
  You can actually set up and delete new Redshift clusters using Apache Airflow. The example_dags include a DAG that does a complete setup and delete of a Redshift cluster. There are a few things to think about, however.
- .NET Modern Task Scheduler
  A few years ago, I opened a GitHub issue with Microsoft telling them that I think the .NET ecosystem needs its own equivalent of Apache Airflow or Prefect. Fast forward 'til now, and I still don't think we have anything close to these frameworks.
- How do you decide when to keep a project in a single python file vs break it up into multiple files?
  Check out taskinstance.py in the Airflow project, it's a well targeted file, it has only one main class TaskInstance and a few small supporting classes and functions. It is ~3000 lines long: https://github.com/apache/airflow/blob/main/airflow/models/taskinstance.py
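
Several of the mentions above describe workflow orchestration in terms of pipeline construction and scheduling. The core idea, running tasks in an order that respects their dependencies, can be sketched with the Python standard library alone. This is not Airflow's API; the task names and the `run_pipeline` helper are hypothetical, and a real orchestrator adds scheduling, retries, and monitoring on top.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}


def run_pipeline(deps, tasks):
    """Run every task once, after all of the tasks it depends on."""
    order = list(TopologicalSorter(deps).static_order())
    log = [tasks[name]() for name in order]
    return order, log


# Stand-in task bodies; each one just reports that it ran.
tasks = {name: (lambda n=name: f"ran {n}") for name in dependencies}

order, log = run_pipeline(dependencies, tasks)
print(order)
print(log)
```

Because this pipeline is a simple chain, the only valid order is extract, transform, load, report. With branching dependencies, `static_order()` returns one of possibly several valid orders.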
What are some alternatives?
Kedro - Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
dagster - An orchestration platform for the development, production, and observation of data assets.
Rundeck - Enable Self-Service Operations: Give specific users access to your existing tools, services, and scripts
n8n - Free and source-available fair-code licensed workflow automation tool. Easily automate tasks across different services.
luigi - Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
Apache Spark - A unified analytics engine for large-scale data processing
Dask - Parallel computing with task scheduling
Pandas - Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
Apache Camel - Apache Camel is an open source integration framework that empowers you to quickly and easily integrate various systems consuming or producing data.
airbyte - The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Apache Arrow - Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing