-
Regarding failure handling, we leave that as an application/framework-layer decision. When the backend holds program state, the common approach is an auto-save loop that asynchronously persists state to an external store. If the backend is used only in a read-only way, the approach is simply to recreate it on failure with the same parameters.
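To make the auto-save idea concrete, here is a minimal sketch in Python. It is not part of Plane: the `AutoSaver` class, the `load_state` helper, and the JSON-file sink are all assumptions for illustration, and a real backend would more likely write to a database or object store.

```python
import asyncio
import json
import os
import tempfile


class AutoSaver:
    """Hypothetical helper: periodically persists a state dict to a JSON file."""

    def __init__(self, state, path, interval=5.0):
        self.state = state          # live, mutable application state
        self.path = path            # external location (a local file in this sketch)
        self.interval = interval    # seconds between save attempts
        self._last_payload = None   # last snapshot written, used to skip no-op saves

    def _write(self, payload):
        # Write to a temp file, then rename, so a crash mid-write
        # never leaves a half-written snapshot behind.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            f.write(payload)
        os.replace(tmp, self.path)

    async def save_once(self):
        """Persist the state if it changed; return True if a write happened."""
        payload = json.dumps(self.state, sort_keys=True)
        if payload == self._last_payload:
            return False
        # Run the blocking file I/O in a worker thread so the event loop stays free.
        await asyncio.to_thread(self._write, payload)
        self._last_payload = payload
        return True

    async def run(self):
        """The auto-save loop: sleep, save, repeat until cancelled."""
        while True:
            await asyncio.sleep(self.interval)
            await self.save_once()


def load_state(path):
    """Restore the last snapshot on startup, or start empty if none exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}
```

A replacement backend would call `load_state` on startup and then start `AutoSaver(...).run()` as a background task. Anything that changed after the last completed save is lost on failure, so the interval is a trade-off between write load and how much work you can afford to lose.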
In general, Plane backends are meant to be used with thick clients, so there’s also the option to treat clients as nodes in a distributed system for the purposes of failure handling. If the server goes down and is replaced, the clients could buffer any messages that may have been lost during the failure and replay them once the replacement comes up. As we see patterns emerge over time, we may create frameworks from them (like aper.dev) to lift that burden off the application layer.
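The buffer-and-replay option could look roughly like the sketch below. Again, this is not a Plane or aper.dev API: `ReplayingClient`, the sequence-number scheme, and the `send` callable are assumptions made up for the example.

```python
from collections import deque


class ReplayingClient:
    """Hypothetical client-side outbox: keeps messages until the server acks them."""

    def __init__(self, send):
        # send(seq, msg) delivers one message; it is assumed to raise
        # ConnectionError while the server is unreachable.
        self._send = send
        self._outbox = deque()   # (seq, msg) pairs not yet acknowledged
        self._next_seq = 0

    def publish(self, msg):
        """Buffer the message, then attempt delivery. Returns its sequence number."""
        seq = self._next_seq
        self._next_seq += 1
        self._outbox.append((seq, msg))
        try:
            self._send(seq, msg)
        except ConnectionError:
            pass  # server is down; the message stays buffered for replay
        return seq

    def ack(self, seq):
        """Server confirmed everything up to and including seq; drop it from the buffer."""
        while self._outbox and self._outbox[0][0] <= seq:
            self._outbox.popleft()

    def pending(self):
        """Number of messages still awaiting acknowledgement."""
        return len(self._outbox)

    def replay(self):
        """Resend every unacknowledged message, in order, to the replacement server."""
        resent = 0
        for seq, msg in list(self._outbox):
            self._send(seq, msg)
            resent += 1
        return resent
```

Because a message may have reached the old server before it died, the replacement can see it twice. The sequence numbers are there so the server side can drop duplicates, or the messages can be designed to be idempotent.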
Time series metrics are exposed through Docker’s API, and collectors for it already exist for various sinks. We will soon be sending some time series metrics over NATS for internal use in scheduling, but the Docker API will be better for external consumption because its collector ecosystem is already robust.
Resource caps can be defined at “spawn time”. Backends are not expected to have similar consumption, but the scheduler is not yet very smart, so our current approach is, admittedly, to overprovision. The scheduler is a big Q4 priority for us.
Draining currently involves terminating the “agent” process on the node, which stops the drone from advertising itself to the controller. Traffic still gets routed to backends running on that drone. We have an open issue[1] to implement a message that triggers draining automatically.
[1] https://github.com/drifting-in-space/plane/issues/129
-
Yes, for CPU-bound processing on the BEAM you'll want to use a NIF (native implemented function), but that leaves you open to taking down the entire VM with bad NIF code (segfaults, infinite loops, etc.). A purportedly safer way to create NIFs is Rustler (https://github.com/rusterlium/rustler), which lets you easily write NIFs in Rust instead of C. I haven't used it, but I've heard good things.