| | Airflow | terraform |
|---|---|---|
| Mentions | 171 | 512 |
| Stars | 35,036 | 41,496 |
| Growth | 1.6% | 0.8% |
| Activity | 10.0 | 9.9 |
| Latest commit | 3 days ago | 2 days ago |
| Language | Python | Go |
| License | Apache License 2.0 | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Airflow
-
10 Open Source Tools for Building MLOps Pipelines
An integral part of an ML project is data acquisition and data transformation into the required format. This involves creating ETL (extract, transform, load) pipelines and running them periodically. Airflow is an open source platform that helps engineers create and manage complex data pipelines. Furthermore, the support for Python programming language makes it easy for ML teams to adopt Airflow.
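The extract/transform/load flow described above can be sketched in plain Python; the data and function names are illustrative, and in Airflow each function would typically become its own task:

```python
# Minimal ETL sketch of the kind an Airflow DAG would orchestrate.
# The records and field names here are made up for illustration.

def extract():
    # In a real pipeline this would pull rows from an API or database.
    return [{"id": 1, "value": "10"}, {"id": 2, "value": "20"}]

def transform(rows):
    # Cast string fields to the types the downstream store expects.
    return [{"id": r["id"], "value": int(r["value"])} for r in rows]

def load(rows):
    # Stand-in for a warehouse write; here we just return a summary.
    return {"loaded": len(rows), "total": sum(r["value"] for r in rows)}

result = load(transform(extract()))
print(result)  # {'loaded': 2, 'total': 30}
```

An orchestrator's job is then to run these steps on a schedule, retry failures, and record each run.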
-
AI Strategy Guide: How to Scale AI Across Your Business
Level 1 of MLOps is when you've put each lifecycle stage and their interfaces in an automated pipeline. The pipeline could be a Python or Bash script, or it could be a directed acyclic graph run by some orchestration framework like Airflow, Dagster, or one of the cloud-provider offerings. AI- or data-specific platforms like MLflow, ClearML, and DVC also feature pipeline capabilities.
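The directed-acyclic-graph idea can be illustrated with Python's standard-library `graphlib`; the task names are made up, and real orchestrators like Airflow or Dagster layer scheduling, retries, and monitoring on top of this ordering:

```python
# A pipeline expressed as a DAG: each key lists the tasks it depends on.
from graphlib import TopologicalSorter

dag = {
    "train":    {"features"},
    "features": {"clean"},
    "clean":    {"ingest"},
    "evaluate": {"train"},
}

# static_order() yields tasks so that every dependency runs first.
order = list(TopologicalSorter(dag).static_order())
print(order)  # ['ingest', 'clean', 'features', 'train', 'evaluate']
```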
-
Building in Public: Leveraging Tublian's AI Copilot for My Open Source Contributions
Contributing to Apache Airflow's open-source project immersed me in collaborative coding. Experienced maintainers rigorously reviewed my contributions, providing constructive feedback. This ongoing dialogue refined the codebase and honed my understanding of best practices.
-
Navigating Week Two: Insights and Experiences from My Tublian Internship Journey
In Week Two, I contributed to the Apache Airflow repository.
-
Airflow VS quix-streams - a user suggested alternative
2 projects | 7 Dec 2023
-
Best ETL Tools And Why To Choose
Apache Airflow is an open-source platform to programmatically author, schedule, and monitor workflows. The platform features a web-based user interface and a command-line interface for managing and triggering workflows.
-
Simplifying Data Transformation in Redshift: An Approach with DBT and Airflow
Airflow is the most widely used and well-known tool for orchestrating data workflows. It allows for efficient pipeline construction, scheduling, and monitoring.
-
Share Your favorite python related software!
Airflow: This is more of a library in my opinion, but Airflow has become an essential tool for scheduling in my work. All our ML training pipelines are ordered and scheduled with Airflow, and it works seamlessly. The dashboard provided is also fantastic!
-
Ask HN: What is the correct way to deal with pipelines?
I agree there are many options in this space. Two others to consider:
- https://airflow.apache.org/
- https://github.com/spotify/luigi
There are also many Kubernetes based options out there. For the specific use case you specified, you might even consider a plain old Makefile and incrond if you expect these all to run on a single host and be triggered by a new file showing up in a directory…
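The "new file in a directory triggers the pipeline" suggestion can be sketched with stdlib polling in place of incrond; the function name and callback are assumptions for illustration:

```python
# One poll iteration of a minimal directory watcher: invoke the
# callback for any file not seen before. A real deployment would run
# this in a loop with time.sleep() between iterations.
import os

def watch_once(directory, seen, on_new_file):
    """Call on_new_file(path) for each file in directory not yet in seen."""
    for name in sorted(os.listdir(directory)):
        if name not in seen:
            seen.add(name)
            on_new_file(os.path.join(directory, name))

# In place of the callback one would shell out to the build tool, e.g.:
#   subprocess.run(["make", "process", f"INPUT={path}"])
```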
- "Você veio protestar para ter acesso ao código fonte da urnas. O que é o código fonte?" "Não sei" 🤡
terraform
-
26 Top Kubernetes Tools
Terraform is a leading Infrastructure as Code (IaC) tool that allows you to automate cloud provisioning and management activities.
-
Terraform - Let's keep the quality up!
The terraform test command and the options for mocking resources and data sources enable a lot more than we have tried out in this blog post. I highly recommend taking a closer look at the documentation and the blog post referenced before, and playing around with them. Be aware that this is quite young functionality, so you may stumble over issues or find some features missing. If that happens, you should definitely open an issue in the corresponding repository.
-
Getting my feet wet with Kubernetes
I decided to use Terraform to manage my K8s resources. I know that there are probably better ways of doing this (like Argo CD or Flux CD), but I ended up settling on Terraform as I was already familiar with the tool, and it allowed me to achieve the goal of trying out K8s without getting bogged down too much in the deployment process.
-
HashiCorp Vault Quickstart
It uses HashiCorp Terraform to provision the PKI and secrets so that they can be quickly and easily rotated.
- Golang REST API boilerplate
-
Cloud Resume Challenge Chunk 2
I used the AWS console at first to get reacquainted with DynamoDB, Lambda, and API Gateway. After getting everything to work, I used Terraform to deploy all of the infrastructure pieces. The GitHub repo can be found here.
-
Cloud Resume Challenge Chunk 1
Rather than point and click in the AWS console, I decided to start with IaC using Terraform. I also decided to use GitHub Actions (https://docs.github.com/en/actions) for CI/CD to get familiar with them. I had only used GitLab CI/CD and runners previously, which are very similar to GitHub Actions.
-
EC2 real network bandwidth
To implement this, we need to create a pair of EC2 instances along with their corresponding resources, such as roles and security groups, in our AWS account. Doing this manually for every EC2 instance type we need to measure could be tedious, so we'll use Terraform for this task.
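A hypothetical Terraform sketch of such a pair of instances follows; the AMI ID, instance type, port, and CIDR block are placeholders, not values from the article:

```hcl
# Illustrative only: a security group plus two identical EC2 instances
# for an instance-to-instance bandwidth test.
resource "aws_security_group" "bandwidth_test" {
  name = "bandwidth-test"

  ingress {
    from_port   = 5201        # iperf3 default port (placeholder choice)
    to_port     = 5201
    protocol    = "tcp"
    cidr_blocks = ["10.0.0.0/16"]
  }
}

resource "aws_instance" "node" {
  count                  = 2
  ami                    = "ami-0123456789abcdef0"  # placeholder AMI
  instance_type          = "c5.large"               # type under test
  vpc_security_group_ids = [aws_security_group.bandwidth_test.id]

  tags = {
    Name = "bandwidth-test-${count.index}"
  }
}
```

Because the instance type is a single variable here, re-running the measurement for another type is a one-line change followed by `terraform apply`.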
-
Clusters Are Cattle Until You Deploy Ingress
Dan: The entire deployment workflow for Kubernetes revolves around Argo CD. When I set up a cluster, some might default to using kubectl apply, or if they're using Terraform, they might opt for the Helm provider to install various Helm charts. However, with Argo CD, I have precise control over deployment processes.
-
How to deploy your own website on AWS
Terraform/OpenTofu installed. We use Terraform in this article.
What are some alternatives?
Kedro - Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
terragrunt - Terragrunt is a flexible orchestration tool that allows Infrastructure as Code written in OpenTofu/Terraform to scale.
dagster - An orchestration platform for the development, production, and observation of data assets.
Docker Compose - Define and run multi-container applications with Docker
n8n - Free and source-available fair-code licensed workflow automation tool. Easily automate tasks across different services.
terraform-provider-restapi - A terraform provider to manage objects in a RESTful API
luigi - Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
crossplane - The Cloud Native Control Plane
Apache Spark - Apache Spark - A unified analytics engine for large-scale data processing
boto3 - AWS SDK for Python
Pandas - Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
nvim-lspconfig - Quickstart configs for Nvim LSP