kestra
typhoon-orchestrator
| | kestra | typhoon-orchestrator |
|---|---|---|
| Mentions | 32 | 14 |
| Stars | 6,340 | 29 |
| Growth | 14.7% | - |
| Activity | 9.9 | 0.0 |
| Latest commit | 5 days ago | over 1 year ago |
| Language | Java | Python |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
kestra
-
A High-Performance, Java-Based Orchestration Platform
Kestra's communication is asynchronous and based on a queuing mechanism. It leverages the Micronaut framework and offers two runners: one that uses a database (JDBC) for both the message queue and resource storage, and another that uses Kafka as the message queue and Elasticsearch as the resource storage. The platform is fully extensible and plugin-based, providing a rich set of plugins for various workflow tasks, triggers, and data storage options. For those interested, the GitHub repository is available here: https://github.com/kestra-io/kestra
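The YAML-based, plugin-driven model described above can be sketched with a minimal flow containing a single task. This is an illustrative assumption, not copied from the repository: the exact task `type` identifier follows Kestra's plugin naming and may differ between versions.

```yaml
# A minimal Kestra flow definition. Every flow has an id, a namespace,
# and a list of tasks; each task is provided by a plugin and referenced
# by its fully qualified type name.
id: hello-world
namespace: demo
tasks:
  - id: say-hello
    type: io.kestra.core.tasks.log.Log   # Log task from the core plugin set
    message: Hello from Kestra
```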
- Kestra is an open-source data orchestration platform for complex workflows
- YAML-based data orchestrator
- Kestra
-
Introduction to Kestra, the open source data orchestration and scheduling platform
For everyone wondering https://github.com/kestra-io/kestra/discussions/468
-
Snowflake data pipeline with Kestra
If you need any guidance with your Snowflake deployment, our experts at Kestra would love to hear from you. Let us know if you would like us to add more plugins to the list. Or start building your custom Kestra plugin today and send it our way. We always welcome contributions!
-
Airflow's Problem
But I totally agree that a large static DAG is not appropriate in today's data world of data mesh and domain responsibility.
[0] https://github.com/kestra-io/kestra
-
Ask HN: Open-source with Kafka as a dependency, is this an instant turn-off?
- We have plans to add another option that will replace both dependencies with JDBC (https://github.com/kestra-io/kestra/pull/368). Would these dependencies be more comfortable for you?
-
ELT vs ETL: Why not both?
With Kestra's innate flexibility and many integrations, you are not locked into one ingestion method or the other. Complex workflows can be developed, in parallel or sequentially, to deliver both ELT and ETL processes. Simple, descriptive YAML is used to connect plugins and create flows. Because workflows created in Kestra are represented visually, and issues can be seen in relation to individual tasks, there is no need to fear complexity. Trouble can be traced to its source in an instant, letting you try new things and arrive at a solution without fear. Give it a try, and let us know what you come up with!
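As a sketch of the parallel-or-sequential composition described above, here is a hedged example of a flow that runs two extraction tasks concurrently before a load step. The task type names and shell scripts are assumptions for illustration, based on Kestra's core flow and script plugins, and may vary by version.

```yaml
id: elt-and-etl
namespace: demo
tasks:
  - id: extract-in-parallel
    type: io.kestra.core.tasks.flows.Parallel   # runs its child tasks concurrently
    tasks:
      - id: extract-orders
        type: io.kestra.core.tasks.scripts.Bash
        commands:
          - ./extract_orders.sh        # placeholder script
      - id: extract-customers
        type: io.kestra.core.tasks.scripts.Bash
        commands:
          - ./extract_customers.sh     # placeholder script
  - id: load
    type: io.kestra.core.tasks.scripts.Bash      # runs after the parallel block
    commands:
      - ./load_warehouse.sh            # placeholder script
```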
-
Debezium Change Data Capture without Kafka Connect
Kestra is an orchestration and scheduling platform that is designed to simplify the building, running, scheduling, and monitoring of complex data pipelines. Data pipelines can be built in real-time, no matter how complex the workflow, and can connect to multiple resources as needed (including Debezium).
typhoon-orchestrator
- After Airflow. Where next for DE?
- New OSS Orchestrator - Where should we go next?
-
Airflow's Problem
I have my own opinion on Airflow's pain points and created Typhoon Orchestrator (https://github.com/typhoon-data-org/typhoon-orchestrator) to solve them. It doesn't have many stars yet but I've used it to create some pipelines for medium sized companies in a few days, and they've been running for over a year without issues.
In particular, I transpile to Airflow code (it can also deploy to Lambda) because I think Airflow is still the most robust and well-supported "runtime"; I just don't think the developer experience is that good.
-
Data Engineering for very small businesses. Any experiences?
Typhoon Orchestrator is a framework that I designed to help fix some of the pain points of Airflow so that I could build, test, and deploy pipelines faster. You could skip this step, but if you want more info check here.
-
CSV data library to database
I am also collaborating on an open source tool called Typhoon Orchestrator (repo). It aims to make composing Airflow data pipelines simple and quick, putting pipeline steps together like Lego.
-
Recommendations for simple ETL (Postgres to Snowflake)
The project (https://github.com/typhoon-data-org/typhoon-orchestrator) doesn't have many stars yet, but I have deployed it at a medium-sized hotel chain for several data sources with a use case similar to yours, and it's been working for over a year with no intervention. If you decide to pursue this option I'd be willing to provide some support free of charge (feel free to PM me).
-
Impress your friends! Make a serverless bot that sends daily jokes to a Telegram Group
Typhoon Orchestrator is a great way to deploy ETL workflows on AWS Lambda. In this tutorial we show how easy to use and versatile it is by deploying code to Lambda that gets a random joke from https://jokeapi.dev once a day and sends it to your Telegram group.
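The pipeline behind that tutorial is small enough to sketch in plain Python. This is not Typhoon code, just an illustration of what the deployed Lambda would do; the bot token and chat id are placeholders, and only the standard library is used.

```python
import json
import urllib.parse
import urllib.request

# JokeAPI endpoint; safe-mode filters out offensive jokes.
JOKE_URL = "https://v2.jokeapi.dev/joke/Any?safe-mode"

def format_joke(payload):
    """Turn a JokeAPI response into a single message string.

    JokeAPI returns either a one-liner ("single") or a
    setup/delivery pair ("twopart").
    """
    if payload.get("type") == "single":
        return payload["joke"]
    return f'{payload["setup"]}\n{payload["delivery"]}'

def handler(event, context):
    """AWS Lambda entry point: fetch a joke and post it to Telegram."""
    with urllib.request.urlopen(JOKE_URL) as resp:
        joke = format_joke(json.load(resp))
    token = "<BOT_TOKEN>"   # placeholder: your Telegram bot token
    chat_id = "<CHAT_ID>"   # placeholder: your group's chat id
    data = urllib.parse.urlencode({"chat_id": chat_id, "text": joke}).encode()
    # Telegram Bot API: POST /bot<token>/sendMessage with chat_id and text.
    urllib.request.urlopen(f"https://api.telegram.org/bot{token}/sendMessage", data)
```

Scheduling the handler once a day is then a matter of wiring it to an EventBridge (CloudWatch) schedule, which is what the Lambda deployment handles.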
-
My Thirty Years of Dodging Repetitive Work with Automation Tools
I think there's space for an open source library that can help with what you described. We originally created https://github.com/typhoon-data-org/typhoon-orchestrator to orchestrate ETL workflows, which would be a superset of the use cases you described. Our next goal is to allow deployment to AWS Lambda, which can be a good compromise between getting locked in with SaaS and hosting your own infrastructure.
Also check out Zappa's scheduled tasks, which have a similar goal and inspired our library.
- Airflow, you complete me! Compose YAML DAGs for Airflow with auto-complete, using Typhoon (Open Source).
- Use Airflow? Composable, elegant YAML DAGs that transpile to Airflow. Zero risk and no migration.
What are some alternatives?
conductor - Conductor is a microservices orchestration engine.
JokeAPI - REST API that serves uniformly and well formatted jokes in JSON, XML, YAML or plain text format that also offers a great variety of filtering methods
zeebe - Distributed Workflow Engine for Microservices Orchestration
Mage - 🧙 The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai
kogito-runtimes - This repository is a fork of apache/incubator-kie-kogito-runtimes. Please use upstream repository for development.
astro - Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow. [Moved to: https://github.com/astronomer/astro-sdk]
debezium - Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ.
astro-sdk - Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.
akhq - Kafka GUI for Apache Kafka to manage topics, topics data, consumers group, schema registry, connect and more...
pachyderm - Data-Centric Pipelines and Data Versioning
flyte - Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
getting-started - This repository is a getting started guide to Singer.