dagster
buildkit
dagster | buildkit | |
---|---|---|
46 | 54 | |
10,274 | 7,697 | |
2.7% | 1.2% | |
10.0 | 9.8 | |
7 days ago | 1 day ago | |
Python | Go | |
Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
dagster
- Experience with Dagster.io?
-
Dagster tutorials
My recommendation is to continue on with the tutorial, then look at one of the larger example projects especially the ones named “project_”, and you should understand most of it. Of what you don't understand and you're curious about, look into the relevant concept page for the functions in the docs.
-
The Dagster Master Plan
I found this example that helped me - https://github.com/dagster-io/dagster/tree/master/examples/project_fully_featured/project_fully_featured
-
What are some open-source ML pipeline managers that are easy to use?
I would recommend the following: - https://www.mage.ai/ - https://dagster.io/ - https://www.prefect.io/ - https://metaflow.org/ - https://zenml.io/home
-
The Why and How of Dagster User Code Deployment Automation
In Helm terms: there are 2 charts, namely the system: dagster/dagster (values.yaml), and the user code: dagster/dagster-user-deployments (values.yaml). Note that you have to set dagster-user-deployments.enabled: true in the dagster/dagster values-yaml to enable this.
-
Best Orchestration Tool to run dbt projects?
Dagster seemed really cool when I looked into it as an alternative to airflow. I especially like the software defined assets and built-in lineage which I haven't seen in any other tool. However it seems it does not support RBAC which is a pretty big issue if you want a self-service type of architecture, see https://github.com/dagster-io/dagster/issues/2219. It does seem like it's available in their hosted version, but I wanted to run it myself on k8s.
-
dbt Cloud Alternatives?
Dagster? https://dagster.io
-
What's the best thing/library you learned this year ?
One that I haven't seen on here yet: dagster
- Anyone have an example of a project where a handful of the more popular Python tools are used? (E.g. airbyte, airflow, dbt, and pandas)
- Can we take a moment to appreciate how much of dataengineering is open source?
buildkit
-
Caching PNPM Modules in Docker Builds in GitHub Actions
The currently proposed solution is to allow Docker to bind the cache directory in the build to a directory on the host. This way the cache could be persisted externally. However, this issue has been opened for almost 4 years (May 27, 2020) with no clear answer as to whether it'll be implemented any time soon.
- ARM vs x86 em Docker
-
The worst thing about Jenkins is that it works
> We are uding docker-in-docker at the moment
You can also run a "less privileged" container with all the features of Docker by using rootless buildkit in Kubernetes. Here are some examples:
https://github.com/moby/buildkit/tree/master/examples/kubern...
https://github.com/moby/buildkit/blob/master/examples/kubern...
It's also possible to run dedicated buildkitd workers and connect to them remotely.
-
Show HN: Dockerfile Explorer
- BuildOp evaluates its input as additional LLB operations to add to the graph to allow for dynamic build graphs (also unused in the Dockerfile frontend)
With the Dockerfile Explorer, we run the Dockerfile frontend[1] that BuildKit uses inside of WASM to parse and produce the LLB output locally in your browser. We then embed the Monaco Editor so that you can change your Dockerfile to see how it impacts the LLB output that BuildKit will use to build your Docker image.
You can see a quick video and read more details on how it all works here: https://depot.dev/blog/dockerfile-explorer.
We'd love any feedback or ideas folks would like around this type of tool!
[0] https://github.com/moby/buildkit#exploring-llb
- macOS Containers v0.0.1
-
Jenkins Agents On Kubernetes
Now since Kubernetes works off of containerd I'll be taking a different approach on handling container builds by using nerdctl and the buildkit that comes bundled with it. I'll do this on the amd64 control plane node since it's beefier than my Raspberry Pi workers for handling builds and build related services. Go ahead and download and unpack the latest nerdctl release as of writing (make sure to check the release page in case there's a new one):
-
Frequent Docker BuildKit cache misses with w/ multi-stage and docker-container
There's a 2-year-old moby/buildkit GitHub issue about frequent build cache misses when using the BuildKit docker-container driver and multi-stage builds. Anyone else in this sub run into this problem and/or have reasonable workarounds? It seems like something that should come up pretty often.
-
A Panic in BuildKit: an Open Source Journey
A couple months ago I encountered a bug in buildkit - when enabling OpenTelemetry tracing, we got occasional panics. With a bit of investigation, we found the cause, fixed and tested in our fork and internal deployments, and pushed to upstream.
-
Is it possible to copy files from a manifest in Dockerfile?
I do some search in the internet and there seems to be no good solution, so I just create a feature request: https://github.com/moby/buildkit/issues/3859
-
Cicada - CI/CD platform written with Rust
Yeah, only Linux containers at the moment, BuildKit is the way we are constructing pipelines and doing caching. Split on if we will support non-linux hosts, but definitely want to find a good solution to not doing Docker-in-Docker.
What are some alternatives?
Prefect - The easiest way to build, run, and monitor data pipelines at scale.
buildah - A tool that facilitates building OCI images.
Airflow - Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
kaniko - Build Container Images In Kubernetes
Mage - 🧙 The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai
jib - 🏗 Build container images for your Java applications.
airbyte - The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
buildx - Docker CLI plugin for extended build capabilities with BuildKit
MLflow - Open source platform for the machine learning lifecycle
podman - Podman: A tool for managing OCI containers and pods.
meltano
nerdctl - contaiNERD CTL - Docker-compatible CLI for containerd, with support for Compose, Rootless, eStargz, OCIcrypt, IPFS, ...