Should You Be Scared of Unix Signals?

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • bc

    An implementation of the POSIX bc calculator with GNU extensions and dc, moved away from GitHub. Finished, but well-maintained.

  • While Julia starts scared and ends up feeling better, I've only gotten more scared of Unix signals over time.

    Context: I've written a robust command-line utility [0] that must handle signals, and unlike most utilities that are I/O-bound, mine is CPU-bound.

    An I/O-bound utility can easily use signalfd() (subject to the gotchas in the "signalfd() is useless" post that Julia links to, which you should also read). signalfd() will work for I/O-bound utilities because it turns those signals into I/O. Perfect.
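
    As a minimal, Linux-only sketch of that idea (the helper names open_signal_fd and read_signal are hypothetical, not from the post): the signal is blocked first, then delivered as ordinary readable data on a file descriptor that could sit in the same poll() or epoll set as any other I/O.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/signalfd.h>
#include <unistd.h>

/* Hypothetical helper: block `signo` and return a descriptor that becomes
   readable whenever that signal is pending. Returns -1 on error. */
int open_signal_fd(int signo)
{
    sigset_t mask;

    assert(signo > 0);
    sigemptyset(&mask);
    sigaddset(&mask, signo);
    /* The signal must be blocked, otherwise its normal disposition
       still runs and nothing shows up on the descriptor. */
    if (sigprocmask(SIG_BLOCK, &mask, NULL) == -1)
        return -1;
    return signalfd(-1, &mask, SFD_CLOEXEC);
}

/* Hypothetical helper: wait for one signal on `fd` and return its number,
   or -1 on error. */
int read_signal(int fd)
{
    struct signalfd_siginfo info;
    ssize_t n;

    do {
        n = read(fd, &info, sizeof info);
    } while (n == -1 && errno == EINTR);
    if (n != (ssize_t)sizeof info)
        return -1;
    return (int)info.ssi_signo;
}
```

    In a real event loop the descriptor would be polled alongside sockets and pipes rather than read in a blocking call. The blocked signal mask is inherited by child processes, which is one of the gotchas mentioned above.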

    However, in a CPU-bound program, signals are used specifically to interrupt execution. This is, to put it mildly, as difficult as writing code for interrupts in the embedded space. Why? Because that's really what you're doing: handling an interrupt that can happen at any time.

    My solution was something I wish on no one: I used setjmp() and longjmp().

    Horrors!

    Yep. And it gets worse: I had to longjmp() out of the signal handler.

    AH!

    And it gets worse: to ensure that there were no memory leaks, I had to keep a stack of jmp_bufs and manually jump to each one, which would be in a function where memory had to be cleaned up.

    Cue screams of bloody murder

    You may insist that longjmp()'ing out of a signal handler is not allowed; it actually is [1], but unlike with most other "async-signal-safe" functions, you need to ensure you don't interrupt a syscall or other code that is not async-signal-safe.

    So it gets worse: I have a signal lock that the signal handler checks. If it's not locked, the signal handler will longjmp() out of the signal handler. If it is locked, the signal handler sets a flag and returns. Then the code that unlocks signals checks for the flag and does a longjmp() if it's set.
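
    A rough sketch of that lock-and-flag scheme, under stated assumptions: this is not the actual bc code, every name here (sig_lock, sig_unlock, run_interruptible, the demo_* functions) is hypothetical, it uses a single jump target instead of a stack of jmp_bufs, and it uses sigsetjmp()/siglongjmp() so the signal mask is restored after jumping out of the handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf interrupt_target;           /* where an interrupt lands    */
static volatile sig_atomic_t sig_locked = 1;  /* nonzero: jumping is unsafe  */
static volatile sig_atomic_t sig_pending = 0; /* signal arrived while locked */
volatile sig_atomic_t deferred_seen = 0;      /* demo only                   */

static void on_sigint(int signo)
{
    (void)signo;
    if (sig_locked) {
        sig_pending = 1;              /* defer: the unlock code will jump */
        return;
    }
    siglongjmp(interrupt_target, 1);  /* unlocked: leave the handler now */
}

void sig_lock(void) { sig_locked = 1; }

void sig_unlock(void)
{
    sig_locked = 0;
    if (sig_pending) {                /* a signal arrived while locked */
        sig_pending = 0;
        siglongjmp(interrupt_target, 1);
    }
}

/* Runs work(); returns 1 if SIGINT interrupted it, 0 if it completed,
   -1 if the handler could not be installed. */
int run_interruptible(void (*work)(void))
{
    struct sigaction sa;

    assert(work != NULL);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    sig_locked = 1;
    sig_pending = 0;
    if (sigaction(SIGINT, &sa, NULL) == -1)
        return -1;
    /* Saving the mask (second argument 1) means the jump also unblocks
       SIGINT, which is blocked while the handler runs. */
    if (sigsetjmp(interrupt_target, 1) != 0) {
        sig_locked = 1;               /* jump buffer is about to go stale */
        return 1;
    }
    sig_unlock();
    work();
    sig_lock();
    return 0;
}

/* Demo workloads; raise() stands in for a user pressing Ctrl-C. */
void demo_unlocked(void) { raise(SIGINT); }
void demo_quiet(void) { }
void demo_locked(void)
{
    sig_lock();                       /* e.g. around a syscall or malloc() */
    raise(SIGINT);                    /* handler only sets the flag */
    deferred_seen = sig_pending;
    sig_unlock();                     /* jumps here, never returns */
}
```

    Calling run_interruptible(demo_locked) returns 1 only after the locked region has finished, which is the point of the lock: code that is not async-signal-safe is never abandoned halfway.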

    He's dead, Jim!

    I have another project that is a framework in C. This framework needs to handle signals for clients. It has to be general, so it has to handle CPU-bound clients. So I had to implement the same thing. I was able to make it easier, but it is also harder because I have that one thing that messes up every Unix API: threads.

    Nuclear mushroom cloud

    So should you be scared? It depends; if you can get away with signalfd() and know its gotchas, maybe not.

    But if you need anything more complex, yes, be very afraid.

    [0]: https://git.gavinhoward.com/gavin/bc

    [1]: https://pubs.opengroup.org/onlinepubs/9699919799.2008edition...

  • tini

    A tiny but valid `init` for containers

  • Ah gotcha. I believe it can be baked into images as well, per the entrypoint example in the readme: https://github.com/krallin/tini

    Not sure how this will fare IRL in k8s as I haven’t much experience there. It’s still silly that this is the default behavior where you need something like Tini, but I digress.

  • ClickHouse

    ClickHouse® is a free analytics DBMS for big data

  • We are using a fuzzing method in ClickHouse that involves sending signals to random threads: https://github.com/ClickHouse/ClickHouse/blob/master/src/Com...

    We immediately found bugs in many dependent libraries of one of three types:

    1. A library does not check for EINTR.

    2. A library that waits for something does check for EINTR but does not update a timeout, therefore stopping sooner than needed.

    3. A library that waits for something does check for EINTR but incorrectly updates a timeout and ends up waiting indefinitely.
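
    For contrast with the three bug types above, a minimal sketch of a wait that retries on EINTR against a fixed deadline, so each retry gets only the time that is left. The names now_ms and poll_retry are hypothetical and this is not ClickHouse code; it assumes a non-negative timeout in milliseconds.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <time.h>

/* Monotonic clock in milliseconds; unaffected by wall-clock changes. */
static long long now_ms(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Like poll(), but restarts after EINTR with the remaining time only.
   Returns what poll() returns; timeout_ms must be >= 0. */
int poll_retry(struct pollfd *fds, nfds_t nfds, int timeout_ms)
{
    long long deadline;

    assert(timeout_ms >= 0);
    deadline = now_ms() + timeout_ms;    /* computed once, never reset */
    for (;;) {
        long long remaining = deadline - now_ms();
        int rc;

        if (remaining < 0)
            remaining = 0;               /* past the deadline: poll once, no wait */
        rc = poll(fds, nfds, (int)remaining);
        if (rc >= 0 || errno != EINTR)
            return rc;                   /* ready, timed out, or a real error */
        /* Interrupted by a signal handler: loop and recompute `remaining`. */
    }
}
```

    Keeping an absolute deadline rather than adjusting a relative timeout by hand avoids both the "never updated" and the "updated incorrectly" variants.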

  • Polyphony

    Fine-grained concurrency for Ruby

  • When using green threads/fibers/coroutines, an interesting technique to make signal handling safer is to run the signal handler asynchronously on a separate fiber/green thread. That way most of the problems of dealing with signals go away, and there's basically no limitation on what you can do inside the signal handler.

    I've successfully used this technique in Polyphony [1], a fiber-based Ruby gem for writing concurrent programs. When a signal occurs, Polyphony creates a special-purpose fiber that runs the signal handling code. The fiber is put at the head of the run queue, and is resumed once the currently executed fiber yields control.

    [1] https://github.com/digital-fabric/polyphony
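
    The same idea can be approximated in C, as a loose analogue rather than Polyphony's actual implementation: the real signal handler only sets a flag, and the user's handling code runs as an ordinary task at the next yield point, ahead of the other queued tasks. Cooperative tasks are modeled here as plain callbacks, and all names (install_signal_task, enqueue, run_all, demo_trace) are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

#define MAX_TASKS 16
typedef void (*task_fn)(void);

static task_fn run_queue[MAX_TASKS];
static size_t queue_len = 0;
static task_fn signal_task = NULL;               /* user-level handler */
static volatile sig_atomic_t signal_pending = 0;
static char trace[8];                            /* demo only */
static size_t trace_len = 0;

static void note(char c)
{
    if (trace_len < sizeof trace - 1)
        trace[trace_len++] = c;
}

/* The only code that runs in async-signal context: set a flag. */
static void raw_handler(int signo) { (void)signo; signal_pending = 1; }

int install_signal_task(int signo, task_fn task)
{
    struct sigaction sa;

    assert(task != NULL);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = raw_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    signal_task = task;
    return sigaction(signo, &sa, NULL);
}

int enqueue(task_fn task)
{
    if (queue_len == MAX_TASKS)
        return -1;
    run_queue[queue_len++] = task;
    return 0;
}

/* Runs the user-level handler in normal context, where any function
   may be called safely. */
static void run_signal_task_if_pending(void)
{
    if (signal_pending && signal_task != NULL) {
        signal_pending = 0;
        signal_task();
    }
}

void run_all(void)
{
    for (size_t i = 0; i < queue_len; i++) {
        run_signal_task_if_pending();   /* yield point between tasks */
        run_queue[i]();
    }
    run_signal_task_if_pending();
    queue_len = 0;
}

/* Demo: task_a triggers the signal, the signal task runs before task_b. */
static void task_a(void) { note('a'); raise(SIGUSR1); }
static void task_b(void) { note('b'); }
static void on_usr1_task(void) { note('S'); }

const char *demo_trace(void)
{
    trace_len = 0;
    install_signal_task(SIGUSR1, on_usr1_task);
    enqueue(task_a);
    enqueue(task_b);
    run_all();
    trace[trace_len] = '\0';
    return trace;
}
```

    The trade-off is latency: the handling code only runs once the current task gives up control, so a task that never yields delays signal handling indefinitely.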

  • fluent-bit

    Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX and Windows

  • > Libc is a lot more tricky about signals, since not all libc functions can be safely called from handlers.

    And this is a huge thing. People do all kinds of operations in signal handlers completely oblivious to the pitfalls. Pitfalls which often do not manifest, making it a great "it works for me" territory.

    I once raised a ticket on fluent-bit [1] about it, but they have abused signal handlers so thoroughly that I do not think they can mitigate the issue without a major rewrite of the signal and crash handling.

    [1] https://github.com/fluent/fluent-bit/issues/4836

  • wasmer

    🚀 The leading Wasm Runtime supporting WASIX, WASI and Emscripten

  • There don't seem to be many write-ups on this concept. The best reference seems to be existing implementations:

    Wasmer's implementation of metering[0] just traps when it runs out of fuel. WasmEdge's implementation of interruptibility[1] checks a flag and stops execution if it's set.

    While neither of these supports resuming execution after the deadline, replacing the halt with a call to a signal dispatcher should work.

    Wasmtime has two different implementations of interrupting execution that both support resuming[2]. The fuel mechanism[3] is deterministic but the epoch mechanism[4] is more performant. If you're free to pick your runtime, I'm sure you could configure Wasmtime into doing what you want.

    [0]: https://github.com/wasmerio/wasmer/blob/master/lib/middlewar...
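
    To make the two mechanisms concrete, here is a toy sketch of an interpreter loop that both counts fuel and checks an interrupt flag, and returns a status instead of halting so the caller can refuel and resume. It is an illustration only: the vm type, vm_run, and the status names are hypothetical and do not correspond to the API of Wasmer, WasmEdge, or Wasmtime.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { VM_DONE, VM_OUT_OF_FUEL, VM_INTERRUPTED } vm_status;

typedef struct {
    uint64_t pc;      /* next step to execute                 */
    uint64_t steps;   /* total number of steps in the program */
    uint64_t acc;     /* toy program state: running sum       */
    uint64_t fuel;    /* steps allowed before pausing         */
    volatile sig_atomic_t interrupt_requested; /* set from outside */
} vm;

/* Executes steps until the program ends, fuel runs out, or an interrupt
   is requested. State stays in `m`, so calling vm_run() again resumes
   exactly where execution paused. */
vm_status vm_run(vm *m)
{
    assert(m != NULL);
    while (m->pc < m->steps) {
        if (m->interrupt_requested)
            return VM_INTERRUPTED;   /* flag check, as in the WasmEdge style */
        if (m->fuel == 0)
            return VM_OUT_OF_FUEL;   /* metering, as in the fuel style */
        m->fuel--;
        m->acc += m->pc;             /* the "instruction": add step index */
        m->pc++;
    }
    return VM_DONE;
}
```

    A caller that gets VM_OUT_OF_FUEL or VM_INTERRUPTED back could run a signal dispatcher, top up fuel or clear the flag, and call vm_run() again.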

  • SSVM

    WasmEdge is a lightweight, high-performance, and extensible WebAssembly runtime for cloud native, edge, and decentralized applications. It powers serverless apps, embedded functions, microservices, smart contracts, and IoT devices.

  • wasmtime

    A fast and secure runtime for WebAssembly

  • [3]: https://github.com/bytecodealliance/wasmtime/pull/2611
