Airflow
Cronicle
| | Airflow | Cronicle |
|---|---|---|
| Mentions | 169 | 22 |
| Stars | 34,099 | 3,207 |
| Growth | 2.2% | - |
| Activity | 10.0 | 7.5 |
| Latest commit | about 15 hours ago | 10 days ago |
| Language | Python | JavaScript |
| License | Apache License 2.0 | GNU General Public License v3.0 or later |
Stars - the number of stars a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Airflow
- Airflow VS quix-streams - a user suggested alternative
2 projects | 7 Dec 2023
- Simplifying Data Transformation in Redshift: An Approach with DBT and Airflow
Airflow is the most widely used and well-known tool for orchestrating data workflows. It allows for efficient pipeline construction, scheduling, and monitoring.
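For readers unfamiliar with it, here is a minimal sketch of an Airflow DAG; the DAG id, schedule, and task callables are illustrative assumptions, not from the post above:

```python
# Minimal sketch of an Airflow DAG: two Python tasks with a dependency,
# scheduled daily. Names and callables are illustrative.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    print("pulling source data")  # placeholder for real extraction logic


def transform():
    print("transforming data")    # placeholder for real transformation logic


with DAG(
    dag_id="example_etl",             # hypothetical DAG id
    start_date=datetime(2023, 1, 1),
    schedule="@daily",                # `schedule` is the Airflow 2.4+ name
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)

    extract_task >> transform_task    # transform runs after extract succeeds
```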
- Ask HN: What is the correct way to deal with pipelines?
I agree there are many options in this space. Two others to consider:
- https://github.com/spotify/luigi
There are also many Kubernetes based options out there. For the specific use case you specified, you might even consider a plain old Makefile and incrond if you expect these all to run on a single host and be triggered by a new file showing up in a directory…
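As a rough Python stand-in for that Makefile-plus-incrond idea, here is a sketch that polls a directory and hands each new file to a pipeline step; the directory and make target are hypothetical:

```python
# Rough stand-in for the Makefile + incrond approach: poll a directory
# and kick off a make target for each new file that appears.
import subprocess
import time
from pathlib import Path

WATCH_DIR = Path("/data/incoming")  # hypothetical drop directory


def watch(poll_seconds: float = 5.0) -> None:
    seen = {p.name for p in WATCH_DIR.iterdir()}
    while True:
        time.sleep(poll_seconds)
        for path in WATCH_DIR.iterdir():
            if path.name not in seen:
                seen.add(path.name)
                # Hand the new file to the pipeline, here a make target.
                subprocess.run(["make", "process", f"FILE={path}"], check=False)


if __name__ == "__main__":
    watch()
```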
- How to build your own data platform. From zero to hero.
- Is it impossible to contribute to open source as a data engineer?
You can try and contribute some new connectors/operators for workflow managers like Airflow or Airbyte
- Exploring MLOps Tools and Frameworks: Enhancing Machine Learning Operations
Apache Airflow:
- Python task scheduler with a web UI
Looks interesting as a light-weight alternative to https://www.prefect.io/ (which itself is a lighter-weight / more modern alternative to https://airflow.apache.org/ ).
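For a sense of what that lighter-weight option looks like, here is a minimal sketch of a Prefect 2-style flow; the flow and task names are illustrative:

```python
# Minimal Prefect 2-style flow: tasks are plain functions wrapped in
# decorators; calling the flow runs it. Task bodies are illustrative.
from prefect import flow, task


@task
def fetch_numbers():
    return [1, 2, 3]


@task
def report(values):
    print(f"sum = {sum(values)}")


@flow
def daily_report():
    report(fetch_numbers())


if __name__ == "__main__":
    daily_report()  # runs locally; scheduling and the UI come from a Prefect server
```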
- Working with Managed Workflows for Apache Airflow (MWAA) and Amazon Redshift
You can actually set up and delete new Redshift clusters using Apache Airflow. In the example_dags we can see a DAG that does a complete setup and teardown of a Redshift cluster. There are a few things to think about, however.
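A hedged sketch of that pattern, using the cluster operators from Airflow's Amazon provider package (apache-airflow-providers-amazon); the cluster settings and credentials below are placeholder assumptions:

```python
# Sketch: create and delete a Redshift cluster from a DAG using the
# Amazon provider's operators. All cluster settings are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.redshift_cluster import (
    RedshiftCreateClusterOperator,
    RedshiftDeleteClusterOperator,
)

with DAG(
    dag_id="redshift_lifecycle",  # hypothetical DAG id
    start_date=datetime(2023, 1, 1),
    schedule=None,                # trigger manually
    catchup=False,
) as dag:
    create = RedshiftCreateClusterOperator(
        task_id="create_cluster",
        cluster_identifier="demo-cluster",   # assumption
        node_type="dc2.large",
        master_username="admin",
        master_user_password="change-me",    # use a secrets backend in practice
    )
    delete = RedshiftDeleteClusterOperator(
        task_id="delete_cluster",
        cluster_identifier="demo-cluster",
        skip_final_cluster_snapshot=True,
    )
    # In a real pipeline, the actual work (and a wait for cluster
    # availability) would sit between these two steps.
    create >> delete
```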
- .NET Modern Task Scheduler
A few years ago, I opened a GitHub issue with Microsoft telling them that I think the .NET ecosystem needs its own equivalent of Apache Airflow or Prefect. Fast forward 'til now, and I still don't think we have anything close to these frameworks.
- How do you decide when to keep a project in a single python file vs break it up into multiple files?
Check out taskinstance.py in the Airflow project; it's a well-targeted file, with only one main class, TaskInstance, and a few small supporting classes and functions. It is ~3000 lines long: https://github.com/apache/airflow/blob/main/airflow/models/taskinstance.py
Cronicle
- Executing Cron Scripts Reliably at Scale
Wasn't it simpler to use Cronicle (https://github.com/jhuckaby/Cronicle)?
- Is there a Docker container, or self-hosted app to create and monitor cron jobs?
You can give Cronicle a try. It has a web-based UI and some good stats.
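Beyond the UI, Cronicle also exposes a JSON API for managing events. A rough sketch of creating a scheduled event with it; the endpoint and field names follow Cronicle's API docs, while host/port, the API key, and the category/target ids below are placeholder assumptions:

```python
# Sketch: create a Cronicle event via its JSON API (stdlib only).
import json
import urllib.request

payload = {
    "title": "nightly-backup",                 # hypothetical event name
    "enabled": 1,
    "category": "general",                     # assumed category id
    "plugin": "shellplug",                     # built-in shell plugin
    "target": "allgrp",                        # assumed default server group
    "params": {"script": "#!/bin/sh\n/usr/local/bin/backup.sh"},
    "timing": {"hours": [2], "minutes": [0]},  # daily at 02:00
    "api_key": "YOUR_API_KEY",
}

req = urllib.request.Request(
    "http://localhost:3012/api/app/create_event/v1",  # default Cronicle port
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(req).read().decode())
```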
- Cronjobs UI Service / CLI
- Good Cron GUI
Have a look at Cronicle ( https://github.com/jhuckaby/Cronicle )
- How to setup a containerized python environment? Function as a Service or an alternative solution for a Python execution environment.
First, I tried Rundeck and Apache Airflow. They are complete overkill for what I want to do. Then I found Cronicle, which is light enough; besides, it can pull double duty as a general-purpose scheduler.
- Selfhosted CRON Server + Webapp
Also check out http://cronicle.net/
- Centralised web GUI for task scheduling?
Thought I'd post here before taking a dive into this and see if anyone has any practical experience. I'm looking for centralising scheduled tasks for multiple servers, preferably with a management GUI for friendliness. I found Crontab-UI which seems to only interact with the single host's crontab. Then I stumbled upon Cronicle which looks feature rich but looks like multi-server is handled by deploying the GUI to each host. A central server + agents on each host would be nicer. I'm wondering if anyone uses Cronicle, or another solution? Thanks!
- What are your Most Used Self Hosted Applications?
Cronicle (best graphical cron-like task scheduler I've ever used, free or paid)
- Anything better than cronjobs? (or jenkins, systemd)
- My extended version of Codeserver - a browser-based VS-code
Codeserver is VS Code in the browser. In this docker image https://github.com/bluxmit/alnoda-workspaces/tree/main/workspaces/codeserver-workspace I added to Codeserver a full-screen browser-based terminal, Cronicle (a visual scheduler), and Filebrowser.
What are some alternatives?
Kedro - Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
dagster - An orchestration platform for the development, production, and observation of data assets.
n8n - Free and source-available fair-code licensed workflow automation tool. Easily automate tasks across different services.
luigi - Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
Apache Spark - A unified analytics engine for large-scale data processing
Dask - Parallel computing with task scheduling
Pandas - Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
Apache Camel - Apache Camel is an open source integration framework that empowers you to quickly and easily integrate various systems consuming or producing data.
airbyte - The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Apache Arrow - Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
argo - Workflow Engine for Kubernetes
node-cron - Cron for NodeJS.