workflows-samples vs Airflow
| | workflows-samples | Airflow |
|---|---|---|
| Mentions | 14 | 170 |
| Stars | 67 | 34,627 |
| Growth | - | 1.5% |
| Activity | 6.0 | 10.0 |
| Latest commit | 6 days ago | 6 days ago |
| Language | Shell | Python |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
workflows-samples
-
Event-driven architects: how to handle event state through multiple services?
Have you seen this? https://cloud.google.com/workflows
-
Job Scheduling on Google Cloud Platform
Cloud Workflows: A serverless workflow orchestration service
-
Trigger Cloud Run job execution
Workflows can trigger actions, like Cloud Run Jobs, in a sequence of steps. The Workflows product waits for the job to complete, fail, or time out before it moves on to the next step. It uses polling to check on the job, so there may be a delay between the job finishing and the next step.
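For a sense of what that trigger-and-poll pattern looks like outside of Workflows, here is a minimal Python sketch using the Cloud Run Admin API client (google-cloud-run). The project, region, and job names are placeholders; Workflows itself does the equivalent through its Cloud Run connector.

```python
# Hypothetical sketch of the trigger-and-wait pattern using the Cloud Run
# Admin API Python client (pip install google-cloud-run). Workflows does
# this for you via its connector, polling until the job settles.
from google.cloud import run_v2

client = run_v2.JobsClient()

# Placeholder resource name; substitute your project, region, and job.
job_name = "projects/my-project/locations/us-central1/jobs/my-job"

# run_job returns a long-running operation immediately...
operation = client.run_job(request=run_v2.RunJobRequest(name=job_name))

# ...and result() polls until the execution completes, fails, or the
# timeout elapses -- mirroring the delay the polling can introduce.
execution = operation.result(timeout=600)
print(f"Execution finished: {execution.name}")
```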
-
GCP Workflows
Has anyone integrated firestore/realtime database together with GCP workflows? What was your use case? How was your experience with it? Why have you decided to go that way?
-
Handy Yaml Tricks!
In the past few years, YAML (http://yaml.org) has become an essential part of software, particularly for infrastructure-as-code tools. YAML is at the heart of Kubernetes configuration, Kubernetes-inspired APIs like Google's Config Connector, and a number of workflow systems like Google Cloud Workflows and GitHub Actions.
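As a concrete example of the kind of trick the post refers to, anchors and merge keys let one block of YAML be reused and overridden elsewhere. A small sketch using PyYAML to show the effect (the document and its keys are illustrative):

```python
# Two handy YAML tricks -- anchors/aliases and merge keys -- loaded with
# PyYAML (pip install pyyaml) to show the resulting data.
import yaml

doc = """
defaults: &defaults          # anchor a reusable mapping
  retries: 3
  timeout: 30

jobs:
  build:
    <<: *defaults            # merge key: inherit the defaults
    timeout: 120             # local values override merged ones
  test:
    <<: *defaults
"""

data = yaml.safe_load(doc)
print(data["jobs"]["build"])  # retries: 3, timeout: 120
print(data["jobs"]["test"])   # retries: 3, timeout: 30
```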
-
Newbie to Google Cloud, but I was wondering if there was a way to set up a routine to run a code snippet daily?
If your routine is just a bunch of API calls, you can also replace steps 1-2 with Workflows.
-
Kubernetes Reinvented Virtual Machines (in a good sense)
I have come at this problem from a bit of a different angle, by asking how close I can possibly get to the hypothetical dream state where everything is automated, autoscaling, and so on, on the absolute smallest budget in terms of not only actual costs but time as well.
I only know the GCP ecosystem kind of well, so I don't fully know to what extent these things exist in AWS and Azure, but I think there is a really nice path you can get on with the serverless route that skips K8s entirely while keeping you well aligned in case you ever need to "upgrade" or get out of the GCP ecosystem.
I write very stock-standard gRPC services, put them onto Cloud Run (which has a very Heroku-like workflow), and stick https://cloud.google.com/api-gateway in front of things, and now my API is running on the exact same setup as any other service Google is running in production. Huge amounts of logic get moved out of my code base as a result.
If you are also willing to write your APIs in a fairly particular way (https://google.aip.dev/), it becomes trivial to integrate other things like https://cloud.google.com/workflows, https://cloud.google.com/pubsub, and https://cloud.google.com/tasks, which is where a lot of the "state" and weirdly complicated logic previously lived in my code. I'm now not really writing any of that.
Now it's all declarative: I just say what I want to happen, and I don't have to think about much else beyond that, because it too uses that same internal GCP infrastructure to handle all the complicated parts around what to do when things go wrong.
But to me these services are all heavily aligned with the K8s path, so the lock-in certainly doesn't feel as scary.
-
A Brief Comparison of Apache DolphinScheduler With Other Alternatives
Google Workflows combines Google Cloud services and APIs to help developers build reliable large-scale applications, automate processes, and deploy machine learning and data pipelines.
-
Associate with parent Cloud Workflows logs and child APIs logs using structured logs
Lately, I built a system using Cloud Workflows, which can combine Google Cloud services such as Cloud Functions and Cloud Run. Sometimes I wanted a more efficient way to examine logs in Cloud Logging when debugging or doing daily monitoring.
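The post's own code isn't quoted here, but one common way to make parent and child logs filterable together is to emit structured JSON logs that carry a shared identifier, such as the workflow execution ID. A hedged Python sketch of that pattern (the label name and the way the ID reaches the service are assumptions, not taken from the post):

```python
# Hedged sketch: emit JSON-structured logs from a Cloud Run / Cloud
# Functions service so Cloud Logging indexes them. Passing the parent
# workflow's execution ID through (e.g. as a header or argument --
# assumed here) lets you filter parent and child entries together.
import json
import sys

def log(message, execution_id, severity="INFO"):
    # Cloud Logging treats a JSON line on stdout as a structured entry;
    # "severity" and "logging.googleapis.com/labels" are special fields.
    entry = {
        "severity": severity,
        "message": message,
        "logging.googleapis.com/labels": {"workflow_execution_id": execution_id},
    }
    print(json.dumps(entry), file=sys.stdout)

# The execution ID would come from the calling workflow; placeholder here.
log("child service starting", execution_id="abc123-example")
```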
-
My GCP feature requests for 2022
Look at Cloud Workflows for simple workflows
Airflow
-
Building in Public: Leveraging Tublian's AI Copilot for My Open Source Contributions
Contributing to Apache Airflow's open-source project immersed me in collaborative coding. Experienced maintainers rigorously reviewed my contributions, providing constructive feedback. This ongoing dialogue refined the codebase and honed my understanding of best practices.
-
Navigating Week Two: Insights and Experiences from My Tublian Internship Journey
In week two, I contributed to the Apache Airflow repository.
-
Airflow VS quix-streams - a user suggested alternative
2 projects | 7 Dec 2023
-
Best ETL Tools And Why To Choose
Apache Airflow is an open-source platform to programmatically author, schedule, and monitor workflows. The platform features a web-based user interface and a command-line interface for managing and triggering workflows.
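For illustration, a minimal DAG shows what "programmatically author and schedule" means in practice. This sketch assumes Airflow 2.4+ and uses invented task names:

```python
# A minimal Airflow DAG: two tasks defined in code, scheduled daily,
# and triggerable/monitorable from the web UI or CLI.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull data from the source")

def load():
    print("write data to the warehouse")

with DAG(
    dag_id="example_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> load_task  # run extract before load
```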
-
Simplifying Data Transformation in Redshift: An Approach with DBT and Airflow
Airflow is the most widely used and well-known tool for orchestrating data workflows. It allows for efficient pipeline construction, scheduling, and monitoring.
-
Share Your favorite python related software!
AIRFLOW: This is more of a library in my opinion, but Airflow has become an essential tool for scheduling in my work. All our ML training pipelines are ordered and scheduled with Airflow, and it works seamlessly. The dashboard it provides is also fantastic!
-
Ask HN: What is the correct way to deal with pipelines?
I agree there are many options in this space. Two others to consider:
- https://airflow.apache.org/
- https://github.com/spotify/luigi
There are also many Kubernetes based options out there. For the specific use case you specified, you might even consider a plain old Makefile and incrond if you expect these all to run on a single host and be triggered by a new file showing up in a directory…
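incrond is Linux-specific; as a rough Python equivalent of the same "new file triggers a pipeline step" idea, here is a sketch using the third-party watchdog library (the make target and directory are placeholders):

```python
# Not incrond itself, but the same idea in Python with the watchdog
# library (pip install watchdog): react to a new file appearing in a
# directory by kicking off a pipeline step.
import subprocess

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

class NewFileHandler(FileSystemEventHandler):
    def on_created(self, event):
        if not event.is_directory:
            # "make process" is a placeholder pipeline step.
            subprocess.run(["make", "process", f"INPUT={event.src_path}"])

observer = Observer()
observer.schedule(NewFileHandler(), path="./incoming", recursive=False)
observer.start()
try:
    observer.join()  # block until interrupted
except KeyboardInterrupt:
    observer.stop()
```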
- "Você veio protestar para ter acesso ao código fonte da urnas. O que é o código fonte?" "Não sei" 🤡
- How to build your own data platform. From zero to hero.
-
Is it impossible to contribute to open source as a data engineer?
You can try and contribute some new connectors/operators for workflow managers like Airflow or Airbyte
What are some alternatives?
incubator-dolphinscheduler - Apache DolphinScheduler is a modern data orchestration platform, with an agile, low-code approach to creating high-performance workflows.
Kedro - Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
specification - Serverless Workflow Specification
dagster - An orchestration platform for the development, production, and observation of data assets.
professional-services - Common solutions and tools developed by Google Cloud's Professional Services team. This repository and its contents are not an officially supported Google product.
n8n - Free and source-available fair-code licensed workflow automation tool. Easily automate tasks across different services.
Windows-Containers - Welcome to our Windows Containers GitHub community! Ask questions, report bugs, and suggest features -- let's work together.
luigi - Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
Apache Spark - Apache Spark - A unified analytics engine for large-scale data processing
Dask - Parallel computing with task scheduling
Pandas - Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
Apache Camel - Apache Camel is an open source integration framework that empowers you to quickly and easily integrate various systems consuming or producing data.