lakeFS
Concourse
Metric | lakeFS | Concourse
---|---|---
Mentions | 48 | 47
Stars | 4,058 | 7,165
Growth | 2.3% | 0.7%
Activity | 9.8 | 9.0
Latest commit | 5 days ago | 7 days ago
Language | Go | Go
License | Apache License 2.0 | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
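Assuming "month over month growth" is the usual relative change in star count (the page does not state LibHunt's exact formula), with S_t the current star count and S_{t-1} the count one month earlier, it would read:

```latex
\text{Growth} = \frac{S_t - S_{t-1}}{S_{t-1}} \times 100\%
```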
lakeFS
- A Step-by-Step Guide to Implementing Data Version Control
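```shell
# Download a lakeFS release archive for your platform from
# https://github.com/treeverse/lakeFS/releases/latest (file names include the version)
tar -xzf lakeFS_<version>_Linux_x86_64.tar.gz
# Configure S3 as the storage backend (blockstore) via environment variables
export LAKEFS_BLOCKSTORE_TYPE=s3
export LAKEFS_BLOCKSTORE_S3_ENDPOINT=<s3-endpoint>
export LAKEFS_BLOCKSTORE_S3_REGION=<region>
export LAKEFS_BLOCKSTORE_S3_FORCE_PATH_STYLE=true
export LAKEFS_BLOCKSTORE_S3_CREDENTIALS_ACCESS_KEY_ID=<access-key>
export LAKEFS_BLOCKSTORE_S3_CREDENTIALS_SECRET_ACCESS_KEY=<secret-key>
export LAKEFS_DATABASE_TYPE=local
export LAKEFS_AUTH_ENCRYPT_SECRET_KEY=<random-secret>
# Start the lakeFS server
./lakefs run
```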
- Jujutsu: A Git-compatible DVCS that is both simple and powerful
Might want to look at purpose-built tools for that, such as lakeFS (https://github.com/treeverse/lakeFS/)
* Disclaimer: I'm one of the creators/maintainers of the project.
- Data diffs: Algorithms for explaining what changed in a dataset (2022)
Might want to check out lakeFS: https://github.com/treeverse/lakeFS
(full disclosure: I'm one of the creators)
- Transactions in Spark / Delta lake?
Take a look at https://github.com/treeverse/lakeFS
- LakeFS – Version Control for Big Data
- DuckDB <3 LakeFS
- We built an open-source project (3.1K stars on GitHub) for data version control
- How are you incrementally testing your data pipelines as you develop them?
I mean, if you're ready to adopt a new framework into your ecosystem, this is one of the major use cases for LakeFS.
- Git-for-Data
- LakeFS: Git-like versioning for object stores
Concourse
- Elm 2023, a year in review
Ableton ⬩ Acima ⬩ ACKO ⬩ ActiveState ⬩ Adrima ⬩ AJR International ⬩ Alma ⬩ Astrosat ⬩ Ava ⬩ Avetta ⬩ Azara ⬩ Barmenia ⬩ Basiq ⬩ Beautiful Destinations ⬩ BEC Systems ⬩ Bekk ⬩ Bellroy ⬩ Bendyworks ⬩ Bernoulli Finance ⬩ Blue Fog Training ⬩ BravoTran ⬩ Brilliant ⬩ Budapest School ⬩ Buildr ⬩ Cachix ⬩ CalculoJuridico ⬩ CareRev ⬩ CARFAX ⬩ Caribou ⬩ carwow ⬩ CBANC ⬩ CircuitHub ⬩ CN Group CZ ⬩ CoinTracking ⬩ Concourse CI ⬩ Consensys ⬩ Cornell Tech ⬩ Corvus ⬩ Crowdstrike ⬩ Culture Amp ⬩ Day One ⬩ Deepgram ⬩ diesdas.digital ⬩ Dividat ⬩ Driebit ⬩ Drip ⬩ Emirates ⬩ eSpark ⬩ EXR ⬩ Featurespace ⬩ Field 33 ⬩ Fission ⬩ Flint ⬩ Folq ⬩ Ford ⬩ Forsikring ⬩ Foxhound Systems ⬩ Futurice ⬩ FörsäkringsGirot ⬩ Generative ⬩ Genesys ⬩ Geora ⬩ Gizra ⬩ GWI ⬩ HAMBS ⬩ Hatch ⬩ Hearken ⬩ hello RSE ⬩ HubTran ⬩ IBM ⬩ Idein ⬩ Illuminate ⬩ Improbable ⬩ Innovation through understanding ⬩ Insurello ⬩ iwantmyname ⬩ jambit ⬩ Jobvite ⬩ KOVnet ⬩ Kulkul ⬩ Logistically ⬩ Luko ⬩ Metronome Growth Systems ⬩ Microsoft ⬩ MidwayUSA ⬩ Mimo ⬩ Mind Gym ⬩ MindGym ⬩ Next DLP ⬩ NLX ⬩ Nomalab ⬩ Nomi ⬩ NoRedInk ⬩ Novabench ⬩ NZ Herald ⬩ Permutive ⬩ Phrase ⬩ PINATA ⬩ PinMeTo ⬩ Pivotal Tracker ⬩ PowerReviews ⬩ Practle ⬩ Prima ⬩ Rakuten ⬩ Roompact ⬩ SAVR ⬩ Scoville ⬩ Scrive ⬩ Scrivito ⬩ Serenytics ⬩ Smallbrooks ⬩ Snapview ⬩ SoPost ⬩ Splink ⬩ Spottt ⬩ Stax ⬩ Stowga ⬩ StructionSite ⬩ Studyplus For School ⬩ Symbaloo ⬩ Talend ⬩ Tallink & Silja Line ⬩ Test Double ⬩ thoughtbot ⬩ Travel Perk ⬩ TruQu ⬩ TWave ⬩ Tyler ⬩ Uncover ⬩ Unison ⬩ Veeva ⬩ Vendr ⬩ Verity ⬩ Vnator ⬩ Vy ⬩ W&W Interaction Solutions ⬩ Watermark ⬩ Webbhuset ⬩ Wejoinin ⬩ Zalora ⬩ ZEIT.IO ⬩ Zettle
- The worst thing about Jenkins is that it works
- Show HN: Togomak – declarative pipeline orchestrator based on HCL and Terraform
- GitHub Actions could be so much better
> Why bother, when Dagger caches everything automatically?
My concern with needing to run `npm ci` (or better, `pnpm install`) before running Dagger is the time that step takes. Sure, in the early days, when you're trying out toy examples and the only dependencies come from Dagger upstream, it takes very little time at all. But what happens when I start pulling more and more dependencies from the Node ecosystem to build the Dagger pipeline? Your documentation includes examples like pulling in `@google-cloud/run` as a dependency: https://docs.dagger.io/620941/github-google-cloud#step-3-cre... and similar for Azure: https://docs.dagger.io/620301/azure-pipelines-container-inst... . The more dependencies brought in, the longer `npm ci` is going to take on GitHub Actions. And it's predictable that, in a complicated pipeline, the list of dependencies is going to get pretty big: at least one dependency per infrastructure provider we use, plus all the random Node dependencies that inevitably work their way into any Node project, like eslint, dotenv, prettier, testing dependencies... I think it's a reasonable fear that `npm ci` just for the Dagger pipeline will take multiple minutes, and then developers who expect linting and similar short-running jobs to finish within 30 seconds are going to wonder why they're dealing with this overhead.
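The cost described here comes from reinstalling dependencies on every run. A common mitigation, and roughly the dependency caching the commenter later argues Dagger should own, is to key a dependency cache on a hash of the lockfile so the install only reruns when the lockfile changes. A minimal Go sketch, assuming an in-memory map as a stand-in for a real persistent store; `cacheKey`, `depCache` and `ensureDeps` are hypothetical names, not part of Dagger, GitHub Actions or npm:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// cacheKey hashes the lockfile contents: an unchanged lockfile always yields
// the same key, so a previous install can be reused.
func cacheKey(lockfile []byte) string {
	sum := sha256.Sum256(lockfile)
	return "deps-" + hex.EncodeToString(sum[:])
}

// depCache is a hypothetical in-memory stand-in for a persistent
// dependency store (e.g. a pnpm store kept between builds).
type depCache struct {
	entries  map[string]bool
	installs int // how many times the expensive install actually ran
}

// ensureDeps runs install only when there is no entry for the lockfile's key
// and reports whether the cache was hit.
func (c *depCache) ensureDeps(lockfile []byte, install func()) bool {
	key := cacheKey(lockfile)
	if c.entries[key] {
		return true
	}
	install() // stands in for `npm ci` / `pnpm install`
	c.installs++
	c.entries[key] = true
	return false
}

func main() {
	c := &depCache{entries: map[string]bool{}}
	lock := []byte("lockfileVersion: '6.0'\n")
	fmt.Println(c.ensureDeps(lock, func() {})) // false: first run installs
	fmt.Println(c.ensureDeps(lock, func() {})) // true: unchanged lockfile, cache hit
	fmt.Println(c.installs)                    // 1
}
```

The design choice is that the lockfile, not the source tree, determines the key: code edits that leave dependencies untouched never pay the install cost again.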
It's worth noting that one of Concourse's problems was that, even with webhooks set up for GitHub to notify Concourse to begin a build, Concourse's design required it to discard the contents of the webhook and query the GitHub API for the same information (whether there were new commits) before starting a pipeline and cloning the repository (see: https://github.com/concourse/concourse/issues/2240 ). And that was a CI/CD system where, for all YAML's faults, one of its strengths is that it doesn't require running `npm ci`, with all its associated slowness. So please take it on faith that if even a relatively small source of latency like that was felt in Concourse, the latency from running `npm ci` will certainly be felt, and Dagger's users (DevOps) will be put in the uncomfortable position of defending the choice of Dagger to their users (developers), who go home and build a toy example on AlternateCI that runs what they need much faster.
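The extra round trip described above can be modeled roughly like this: the webhook only acts as a "check now" signal, the check queries the API again, and a build starts only if a new version turns up. A simplified Go sketch of that flow, not Concourse's actual code; `VersionSource`, `Checker` and `fakeGitHub` are hypothetical names, and the linked issue remains the authority on the real behavior:

```go
package main

import "fmt"

// VersionSource is a hypothetical stand-in for the GitHub API: it reports the
// latest commit SHA, costing one network round trip per call.
type VersionSource interface {
	LatestCommit() (string, error)
}

// Checker remembers the last version it saw and starts a build only when the
// source reports something new.
type Checker struct {
	src     VersionSource
	last    string
	apiHits int
	builds  []string
}

// OnWebhook ignores the payload (even though it already names the commit)
// and re-queries the source, mirroring the behavior the comment describes.
func (c *Checker) OnWebhook(payload []byte) error {
	_ = payload // discarded: the webhook is treated only as a trigger
	sha, err := c.src.LatestCommit()
	c.apiHits++
	if err != nil {
		return err
	}
	if sha != c.last {
		c.last = sha
		c.builds = append(c.builds, sha) // build starts only after the extra API call
	}
	return nil
}

// fakeGitHub is a test double that always reports a fixed SHA.
type fakeGitHub struct{ sha string }

func (f *fakeGitHub) LatestCommit() (string, error) { return f.sha, nil }

func main() {
	gh := &fakeGitHub{sha: "abc123"}
	c := &Checker{src: gh}
	_ = c.OnWebhook([]byte(`{"after":"abc123"}`))
	_ = c.OnWebhook([]byte(`{"after":"abc123"}`)) // duplicate delivery: no new build
	fmt.Println(c.builds, c.apiHits)                 // [abc123] 2
}
```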
> I will concede that Dagger’s clustering capabilities are not great yet
Herein lies my argument. It's not that I'm unconvinced that building pipelines in a general-purpose programming language is a better approach than YAML; it's that building pipelines is tightly coupled with the infrastructure that runs them. One aspect of that is scaling up compute to meet the requirements dictated by the pipeline. Another is that `npm ci` should be run not before submitting the pipeline code to Dagger, but after. Dagger should be responsible for running `npm ci`, just as Concourse was responsible for all the interpolation of the `((var))` syntax (i.e. you didn't need to run any kind of templating before submitting the YAML to Concourse). If Dagger is responsible for running `npm ci` (really, `pnpm install`), then it can maintain its own local pnpm store / pipeline dependency cache, which would be much faster and would overcome any shortcomings in the caching system of GitHub Actions or whatever else is triggering it.
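For readers unfamiliar with the `((var))` syntax mentioned above: placeholders in the pipeline YAML are filled in by the CI system itself, so the user never runs a separate templating step. A minimal Go sketch of that kind of substitution, assuming only plain `((name))` placeholders and a flat map of values; `interpolate` is a hypothetical helper, not Concourse's implementation (which also resolves values from credential managers and other var sources):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// varPattern matches plain ((name)) placeholders; richer forms such as
// ((source:path.field)) are out of scope for this sketch.
var varPattern = regexp.MustCompile(`\(\(([^()]+)\)\)`)

// interpolate replaces every ((name)) in tmpl with vars[name] and returns an
// error listing any names that had no value, instead of leaving them in place.
func interpolate(tmpl string, vars map[string]string) (string, error) {
	var missing []string
	out := varPattern.ReplaceAllStringFunc(tmpl, func(m string) string {
		name := varPattern.FindStringSubmatch(m)[1]
		if v, ok := vars[name]; ok {
			return v
		}
		missing = append(missing, name)
		return m
	})
	if len(missing) > 0 {
		return "", fmt.Errorf("undefined vars: %s", strings.Join(missing, ", "))
	}
	return out, nil
}

func main() {
	pipeline := "uri: ((repo-uri))\nbranch: ((branch))\n"
	out, err := interpolate(pipeline, map[string]string{
		"repo-uri": "https://github.com/concourse/concourse",
		"branch":   "master",
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

Failing loudly on undefined vars is a deliberate choice here: silently leaving `((name))` in the YAML tends to surface later as a confusing runtime error.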
- We built the fastest CI in the world. It failed
> Imagine you live in a world where no part of the build has to repeat unless the changes actually impacted it. A world in which all builds happened with automatic parallelism. A world in which you could reproduce very reliably any part of the build on your laptop.
That sounds similar to https://concourse-ci.org/
I quite like it, but it never seemed to gain traction outside of Cloud Foundry.
- Ask HN: What do you use to run background jobs?
I used Concourse[0] for a while. No real complaints, the visibility is nice but the functionality isn't anything new.
[0] https://concourse-ci.org/
- How to host React/Next "Cheaply" with a global audience? (NGO needs help)
We run https://concourse-ci.org/ on our own hardware at our office. (As a side note, when you run your own hardware, you realise just how abysmally slow most cloud servers are.)
- What are some good self-hosted CI/CD tools where pipeline steps run in docker containers?
Concourse: https://concourse-ci.org
- JSON vs XML
- Cicada - Build CI pipelines using TypeScript
We use https://concourse-ci.org/ at the moment and have been reasonably happy with it; however, it currently supports only Linux containers, not Windows containers. (macOS doesn't have a container primitive yet, unfortunately.)
What are some alternatives?
dvc - 🦉 ML Experiments and Data Management with Git
drone - Gitness is an Open Source developer platform with Source Control management, Continuous Integration and Continuous Delivery. [Moved to: https://github.com/harness/gitness]
delta - An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
GitlabCi
git-lfs - Git extension for versioning large files
woodpecker - Woodpecker is a simple yet powerful CI/CD engine with great extensibility.
Ory Kratos - Next-gen identity server replacing your Auth0, Okta, Firebase with hardened security and PassKeys, SMS, OIDC, Social Sign In, MFA, FIDO, TOTP and OTP, WebAuthn, passwordless and much more. Golang, headless, API-first. Available as a worry-free SaaS with the fairest pricing on the market!
Jenkins - A static site for the Jenkins automation server
MLflow - Open source platform for the machine learning lifecycle
Jenkins - Jenkins automation server
duf - Disk Usage/Free Utility - a better 'df' alternative
Buildbot - Python-based continuous integration testing framework; your pull requests are more than welcome!