dagster
airbyte
| | dagster | airbyte |
|---|---|---|
| Mentions | 46 | 139 |
| Stars | 9,939 | 13,646 |
| Growth | 4.7% | 5.2% |
| Activity | 10.0 | 10.0 |
| Latest commit | 6 days ago | 5 days ago |
| Language | Python | Python |
| License | Apache License 2.0 | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
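As a rough illustration of the metrics described above, the sketch below computes month-over-month star growth and a recency-weighted activity value, then maps it onto a 0-10 scale by percentile rank. The exact formulas used for this page are not stated, so the exponential half-life weighting, the percentile mapping, and all function names here are assumptions made for illustration only.

```python
from __future__ import annotations

from datetime import date


def month_over_month_growth(stars_last_month: int, stars_now: int) -> float:
    """Percentage growth in stars from last month to now."""
    if stars_last_month <= 0:
        raise ValueError("stars_last_month must be positive")
    return (stars_now - stars_last_month) / stars_last_month * 100.0


def weighted_activity(commit_dates: list[date], today: date,
                      half_life_days: float = 30.0) -> float:
    """Sum of commit weights; a commit's weight halves every half_life_days.

    The half-life decay is an assumed weighting, chosen only so that
    recent commits count more than older ones.
    """
    return sum(0.5 ** ((today - d).days / half_life_days) for d in commit_dates)


def activity_score(raw: float, all_raw: list[float]) -> float:
    """Map a raw activity value to 0-10 by the share of projects below it."""
    below = sum(1 for r in all_raw if r < raw)
    return round(10.0 * below / len(all_raw), 1)
```

Under this mapping, a project whose raw value exceeds that of 90% of the tracked projects scores 9.0, which matches the "top 10%" reading given above.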
dagster
-
The Dagster Master Plan
I found this example that helped me - https://github.com/dagster-io/dagster/tree/master/examples/project_fully_featured/project_fully_featured
In the meantime, we're collecting solutions and use cases in our GitHub Discussions, and you're welcome to ask any specific questions in there!
-
What are some open-source ML pipeline managers that are easy to use?
I would recommend the following: https://www.mage.ai/, https://dagster.io/, https://www.prefect.io/, https://metaflow.org/, https://zenml.io/home
-
Best Orchestration Tool to run dbt projects?
Dagster seemed really cool when I looked into it as an alternative to Airflow. I especially like the software-defined assets and built-in lineage, which I haven't seen in any other tool. However, it does not seem to support RBAC, which is a pretty big issue if you want a self-service type of architecture; see https://github.com/dagster-io/dagster/issues/2219. RBAC does seem to be available in their hosted version, but I wanted to run it myself on k8s.
-
dbt Cloud Alternatives?
Dagster? https://dagster.io
-
What's the best thing/library you learned this year?
One that I haven't seen on here yet: dagster
- Can we take a moment to appreciate how much of dataengineering is open source?
-
Dagger Python SDK: Develop Your CI/CD Pipelines as Code
I wondered how it related to https://dagster.io/
-
Data Engineer Github Profile?
You can find all current, closed, and resolved issues in the “Issues” section and explore them using filters, e.g. issues for dagster. Look into some of the issues and feel free to ask a question or post your idea: it’s much less toxic here (compared to SO, for example).
-
[D] Should I go with Prefect, Argo or Flyte for Model Training and ML workflow orchestration?
You could also consider Dagster, which aims to address Apache Airflow's shortcomings. Also, take a look at MyMLOps, where you can get a quick overview of open-source orchestration tools.
airbyte
-
Who's hiring developer advocates? (October 2023)
- All the ways to capture changes in Postgres
-
Is it impossible to contribute to open source as a data engineer?
You can try to contribute some new connectors/operators for workflow managers like Airflow or Airbyte.
-
airbyte VS cloudquery - a user suggested alternative
2 projects | 2 Jun 2023
-
New age ETL products every data team needs to know
- https://airbyte.com/
2. Reverse ETL:
-
Is it safe to update docker/docker-compose?
Here's the docker-compose file https://github.com/airbytehq/airbyte/blob/master/docker-compose.yaml
I'm trying to install https://airbyte.com/, which is a great self-hosted ELT platform. In plain terms, it's an app that can access all kinds of APIs to scrape data and put it in a database. I really like the idea of being able to own my data and run all kinds of analyses on it.
-
Top 10 Best Open Source GitHub repos for Developers 2023
Airbyte GitHub: https://github.com/airbytehq/airbyte
What are some alternatives?
Prefect - The easiest way to build, run, and monitor data pipelines at scale.
Airflow - Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Mage - 🧙 The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai
MLflow - Open source platform for the machine learning lifecycle
meltano
jitsu - Jitsu is an open-source Segment alternative. Fully scriptable data ingestion engine for modern data teams. Set up a real-time data pipeline in minutes, not days
spark-rapids - Spark RAPIDS plugin - accelerate Apache Spark with GPUs
dbt-core - dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
OpenLineage - An Open Standard for lineage metadata collection