bb-remote-execution vs BeanstalkD
Metric | bb-remote-execution | BeanstalkD
---|---|---
Mentions | 3 | 14
Stars | 104 | 6,477
Growth | 2.9% | 0.3%
Activity | 8.2 | 0.0
Latest commit | about 1 month ago | 3 days ago
Language | Go | C
License | Apache License 2.0 | GNU General Public License v3.0 or later
Stars: the number of stars that a project has on GitHub. Growth: month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
bb-remote-execution
-
Write Your Own Task Queue
Though it obviously depends on the case at hand, I sort of agree with this.
For a distributed build cluster that I maintain (Buildbarn, https://github.com/buildbarn/bb-remote-execution/), I also had to implement a scheduler process that would queue compilation/test actions, so that they can be executed on workers later on.
Initially I looked into using some conventional queueing system, but eventually settled on implementing my own as part of the scheduler process. So far I'm really happy with this choice, as it has allowed me to implement the following features, and more:
- In-flight deduplication of identical compilation actions. If identical actions are scheduled with different priorities, the highest priority is used.
- Multi-level scheduling fairness between groups, users in a group, builds run by the same user, etc. The fairness cooperates well with priorities.
- Automatic removal of queued actions that are no longer associated with any running build.
- Stickiness, where workers prefer picking up actions that are similar to the one they ran previously, for reducing network utilisation.
- Facilities for draining workers.
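The first feature in this list, in-flight deduplication with priority escalation, can be sketched in Go as a priority queue plus a map from action digest to queue entry. Enqueuing an action that is already queued merges it and raises its priority instead of adding a second copy. This is not Buildbarn's implementation: the type and function names are hypothetical, larger numbers are treated as more urgent (an assumption, since schedulers differ on this convention), and only *queued* actions are deduplicated. A full in-flight scheme would also cover actions already executing on a worker.

```go
package main

import (
	"container/heap"
	"fmt"
)

// queuedAction is a hypothetical queued build action, identified by a digest
// of its contents. ASSUMPTION: a larger priority value is more urgent.
type queuedAction struct {
	digest   string
	priority int
	index    int // position in actionHeap, kept up to date by Swap/Push/Pop
}

// actionHeap implements heap.Interface as a max-heap on priority.
type actionHeap []*queuedAction

func (h actionHeap) Len() int           { return len(h) }
func (h actionHeap) Less(i, j int) bool { return h[i].priority > h[j].priority }
func (h actionHeap) Swap(i, j int) {
	h[i], h[j] = h[j], h[i]
	h[i].index = i
	h[j].index = j
}

func (h *actionHeap) Push(x any) {
	a := x.(*queuedAction)
	a.index = len(*h)
	*h = append(*h, a)
}

func (h *actionHeap) Pop() any {
	old := *h
	n := len(old)
	a := old[n-1]
	old[n-1] = nil // avoid retaining the popped entry
	*h = old[:n-1]
	a.index = -1
	return a
}

// dedupQueue merges identical actions while they wait in the queue.
type dedupQueue struct {
	h        actionHeap
	byDigest map[string]*queuedAction
}

func newDedupQueue() *dedupQueue {
	return &dedupQueue{byDigest: make(map[string]*queuedAction)}
}

// Enqueue adds an action. If an identical action is already queued, no copy
// is added; its priority is raised if the new request is more urgent.
// It reports whether a new queue entry was created.
func (q *dedupQueue) Enqueue(digest string, priority int) bool {
	if a, ok := q.byDigest[digest]; ok {
		if priority > a.priority {
			a.priority = priority
			heap.Fix(&q.h, a.index) // restore heap order after the change
		}
		return false
	}
	a := &queuedAction{digest: digest, priority: priority}
	heap.Push(&q.h, a)
	q.byDigest[digest] = a
	return true
}

// Dequeue removes and returns the most urgent queued action.
func (q *dedupQueue) Dequeue() (string, bool) {
	if q.h.Len() == 0 {
		return "", false
	}
	a := heap.Pop(&q.h).(*queuedAction)
	delete(q.byDigest, a.digest)
	return a.digest, true
}

func main() {
	q := newDedupQueue()
	q.Enqueue("compile-a", 1)
	q.Enqueue("compile-b", 5)
	q.Enqueue("compile-a", 10) // duplicate: merged, priority raised to 10
	for d, ok := q.Dequeue(); ok; d, ok = q.Dequeue() {
		fmt.Println(d)
	}
	// Output:
	// compile-a
	// compile-b
}
```

Keeping each entry's heap index lets `heap.Fix` reorder a merged action in O(log n), so raising a duplicate's priority costs no more than a normal enqueue.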
Though I'm not saying it would have been impossible to achieve this with an off-the-shelf task queue, I'm not convinced it would have been easy. Right now, adding a new feature only means caring about its actual semantics, as opposed to figuring out how to map it onto the feature set of whichever queueing system I had chosen.
-
LiteFS: a FUSE-based file system for replicating SQLite
I was going to raise that point exactly.
As someone who spends an awful amount of time using FUSE, my recommendation is to only use it in cases where the software that interacts with the file system isn't easily changeable. For example, for Buildbarn which I maintain (https://github.com/buildbarn/bb-remote-execution), I need to use it. It's infeasible to change arbitrary compilers and tests to all interact with a network distributed build cache. Designing the FUSE file system was a pretty heavy investment though, as you really need to be POSIXly correct to make it all work. The quality of implementations of FUSE also varies between OSes and their versions. macFUSE, for example, is quite different from Linux FUSE.
Given that SQLite already has all of the hooks in place, I would strongly recommend using those. In addition to increasing portability, it also makes it easier to set up/run. As an example, it's pretty hard to mount a FUSE file system inside of a container running on Kubernetes without risking locking up the underlying host. Doing the same thing with the SQLite VFS hooks is likely easy and also doesn't require your container to run with superuser privileges.
-
Disorderfs: FUSE-based filesystem that introduces non-determinism into metadata
Buildbarn, a build cluster implementation for Bazel that I maintain, can also run build actions (compilation steps, unit tests) in a FUSE file system. Though the primary motivator for this is that it reduces the time to construct a build action's file system to nearly instant, it has the advantage that I can also do things similar to disorderfs. Shuffling directory listings is actually something that I also added. Pretty useful!
https://github.com/buildbarn/bb-remote-execution/blob/eb1150...
BeanstalkD
-
Ruby 3.3
There's beanstalkd: it has a few Python libraries, and it works out of the box with ActiveJob via Backburner.
https://beanstalkd.github.io/
-
A Developer's Journal: Simplifying the Twelve-Factor App
Messaging/Queueing Systems (Amazon SQS, RabbitMQ, Beanstalkd)
- Load Balancing
-
SQL Maxis: Why We Ditched RabbitMQ and Replaced It with a Postgres Queue
Not when a queue is involved. IME trying to replicate something like beanstalkd (https://beanstalkd.github.io/) in postgres is asking for trouble for anything but trivial workloads.
If you're measuring throughput in jobs/s, use a real work queue.
-
Christmas giveaway: 10 copies of my book Domain-Driven Design with Golang, also AMA
Before Kafka became a standard, I created a Go library for beanstalkd that acts like an RPC.
-
PHP parallel processing idea
Then there are queue libraries like beanstalkd and RabbitMQ, or built-in features like Laravel's queues. These will probably get you to your goal quicker than going the process-management route.
- How to do distributed cronjobs with worker queues?
-
Write Your Own Task Queue
The only task queue I loved was beanstalkd -- it's beautifully written and highly performant. Starting it takes seconds and it's been running for a decade:
https://beanstalkd.github.io/
- Golang task queue
-
What are some popular background job processing frameworks in the Rust ecosystem?
It's not Rust (it's C), but beanstalkd is a pretty incredible work queue that processes millions of jobs a day (10K+/s at peak) for my company. I know there are a few Rust drivers available.
What are some alternatives?
litefs - FUSE-based file system for replicating SQLite databases across a cluster of machines
RabbitMQ - Open source RabbitMQ: core server and tier 1 (built-in) plugins
verneuil - Verneuil is a VFS extension for SQLite that asynchronously replicates databases to S3-compatible blob stores.
Apache Kafka - Mirror of Apache Kafka
asciiflow - ASCIIFlow
Gearman
miniqueue - A simple, single binary, message queue. Supports HTTP/2 and Redis Protocol.
NATS - High-Performance server for NATS.io, the cloud and edge native messaging system.
workerpool-go - auto-scaling worker pool (work queue) in Go, using generics
celery - Distributed Task Queue (development branch)
nsq - A realtime distributed messaging platform
Redis - Redis is an in-memory database that persists on disk. The data model is key-value, but many different kinds of values are supported: Strings, Lists, Sets, Sorted Sets, Hashes, Streams, HyperLogLogs, Bitmaps.