Airflow
grouparoo
Airflow | grouparoo | |
---|---|---|
169 | 27 | |
34,570 | 607 | |
1.4% | - | |
10.0 | 9.9 | |
6 days ago | about 2 years ago | |
Python | JavaScript | |
Apache License 2.0 | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Airflow
-
Building in Public: Leveraging Tublian's AI Copilot for My Open Source Contributions
Contributing to Apache Airflow's open-source project immersed me in collaborative coding. Experienced maintainers rigorously reviewed my contributions, providing constructive feedback. This ongoing dialogue refined the codebase and honed my understanding of best practices.
-
Navigating Week Two: Insights and Experiences from My Tublian Internship Journey
In week Two, I contributed to the Apache Airflow repository.
-
Airflow VS quix-streams - a user suggested alternative
2 projects | 7 Dec 2023
-
Best ETL Tools And Why To Choose
Apache Airflow is an open-source platform to programmatically author, schedule, and monitor workflows. The platform features a web-based user interface and a command-line interface for managing and triggering workflows.
-
Simplifying Data Transformation in Redshift: An Approach with DBT and Airflow
Airflow is the most widely used and well-known tool for orchestrating data workflows. It allows for efficient pipeline construction, scheduling, and monitoring.
-
Share Your favorite python related software!
AIRFLOW This is more of a library in my opinion, but Airflow has become an essential tool for scheduling in my work. All our ML training pipelines are ordered and scheduled with Airflow and it works seamlessly. The dashboard provided is also fantastic!
-
Ask HN: What is the correct way to deal with pipelines?
I agree there are many options in this space. Two others to consider:
- https://airflow.apache.org/
- https://github.com/spotify/luigi
There are also many Kubernetes based options out there. For the specific use case you specified, you might even consider a plain old Makefile and incrond if you expect these all to run on a single host and be triggered by a new file showing up in a directory…
- "Você veio protestar para ter acesso ao código fonte da urnas. O que é o código fonte?" "Não sei" 🤡
- Cómo construir tu propia data platform. From zero to hero.
-
Is it impossible to contribute to open source as a data engineer?
You can try and contribute some new connectors/operators for workflow managers like Airflow or Airbyte
grouparoo
-
Reverse ETL recommendations?
Reverse ETL is on AirByte's roadmap under the "Future / Not prioritized" section. I wanted to use Grouparoo as a short term solution, but the repo was archived and I think they stopped taking new cloud customers (unknown if this is wrong/outdated).
-
Reference Data Stack for Data-Driven Startups
There are other tools that we will have to adopt in the future but haven’t yet due to lack of necessity. Specifically, one category that is popular in modern data stacks is Reverse ETL (Hightouch, Census, or Grouparoo). We currently don’t have a usecase for piping data back into 3rd party tools but it will definitely come up in the future.
-
Data pipeline suggestions
Reverse ETL: Grouparoo, Castled
-
Is Reverse ETL a new product or a new ETL/ELT feature?
Grouparoo, the open source Reverse ETL tool we are building, does all of these things. https://www.grouparoo.com
-
Where can I find free data engineering ( big data) projects online?
Ingestion / ETL: Airbyte, Singer, Jitsu Transformation: dbt Orchestration: Airflow, Dagster Testing: GreatExpectations Observability: Monosi Reverse ETL: Grouparoo, Castled Visualization: Lightdash, Superset
- Invite your company
-
Ask HN: Who is hiring? (December 2021)
Grouparoo | Remote (US) | Remote-OK | https://www.grouparoo.com
Grouparoo is a venture-backed software company building open source data tools that make data reliable, accessible, and actionable. We’re empowering teams to make great customer experiences, driven by data. While engineering teams have gotten good at storing and generating data about their customers, it’s rare that this data is used to its full potential in external applications. Grouparoo makes these integrations easy by providing a framework for defining your customer data and reliably syncing it to external tools.
To learn more about who we are, our engineering culture, and whether this is the right place for you, read our Key Values profile: https://www.keyvalues.com/grouparoo
Here are our open roles:
- Senior Backend / Lead Engineer: https://jobs.lever.co/grouparoo/6ba485d1-a5a4-41f0-9fa5-920a...
- Developer Advocate: https://jobs.lever.co/grouparoo/5e1531b4-7ec8-4c10-8e52-fc23...
Tech Stack: TypeScript / Javascript / Node.js, ActionHero, React + Next.js, Postgres & Redis, and whole lot of third-party APIs!
-
Launch HN: Hightouch (YC S19) – Sync data from data warehouses to SaaS tools
Congrats on the launch! Hightouch looks great and this need is real. Things seem to be going well, so I don't think I'm taking too much away by mentioning that we have been been working on Grouparoo, an open source alternative that solves similar pain points.
A few differences: git developer workflow focused (branches, CI, PRs, etc), ability to self host, segmentation in destinations (tagging people in mailchimp based on rules, for example)
https://www.grouparoo.com
-
Reverse ETL
We are building Grouparoo. Obviously, it's a biased sample but we are seeing a few trends at play.
-
What software or coding tools are you trying to get your company to invest in?
Has anyone heard or used of reverse etl or Hightouch or open sourced Grouparoo?
What are some alternatives?
Kedro - Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
rotki - A portfolio tracking, analytics, accounting and management application that protects your privacy
dagster - An orchestration platform for the development, production, and observation of data assets.
airbyte - The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
n8n - Free and source-available fair-code licensed workflow automation tool. Easily automate tasks across different services.
TileDB - The Universal Storage Engine
luigi - Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
meltano
Apache Spark - Apache Spark - A unified analytics engine for large-scale data processing
streamlit - Streamlit — A faster way to build and share data apps.
Dask - Parallel computing with task scheduling
PostHog - 🦔 PostHog provides open-source product analytics, session recording, feature flagging and A/B testing that you can self-host.