Fork() is evil; vfork() is goodness; afork() would be better; clone() is stupid

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • ninja

    a small build system with a focus on speed

  • In Ninja, which needs to spawn a lot of subprocesses but is otherwise not especially large in memory and doesn't use threads, we moved from fork to posix_spawn (which is the "I want fork+exec immediately, please do the smartest thing you can" wrapper) because it performed better on OS X and Solaris:

    https://github.com/ninja-build/ninja/commit/89587196705f54af...
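
    As a rough illustration of the "fork+exec in one call" idea (not Ninja's actual code, which is C++), here is a minimal sketch using Python's `os.posix_spawn` wrapper. The helper name `spawn_and_wait` and the choice of the current interpreter as the child program are assumptions made for the example.

```python
import os
import sys

def spawn_and_wait(argv):
    """Start argv[0] with posix_spawn and return the child's exit code."""
    # One call replaces the fork() + exec() pair; how the child is actually
    # created is left to the C library.
    pid = os.posix_spawn(argv[0], argv, os.environ)
    _, status = os.waitpid(pid, 0)
    # Normal exit: return the exit code. Killed by a signal: return -signum.
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)

# Hypothetical "build step": run the current interpreter as the subprocess.
exit_code = spawn_and_wait([sys.executable, "-c", "raise SystemExit(3)"])
print(exit_code)  # prints 3
```

    The parent never runs its own code in a copy of itself, which is the property that matters for a tool that launches many short-lived children.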

  • rust

    Empowering everyone to build reliable and efficient software.

  • > clone() is stupid ... the clone(2) design, or its maintainers, encourages a proliferation of flags, which means one must constantly pay attention to the possible need to add new flags at existing call sites.

    IMHO a bigger problem in practice with clone is that (according to glibc maintainers) once your program calls it, you can't call any glibc function anymore. [1] Essentially the raw syscall is a tool for the libc implementation to use. The libc implementation hasn't provided a wrapper for programs to use which maintains the libc's internal invariants about things like (IIUC) thread-local storage for errno.

    [1] https://github.com/rust-lang/rust/issues/89522#issuecomment-...

  • source-singularity

    Discontinued Clone of the MSR Singularity Project

  • Windows doesn't have fork as you know it. It has a POSIX-ish fork-alike for compliance, but under the hood it's CreateThread[0] with some Magic.

    In Windows, you create the thread with CreateThread and are passed back a handle to that thread. You can then query the state of the thread using GetExitCodeThread[1] or, if you need to wait for the thread to finish, call WaitForSingleObject[2] with an INFINITE timeout.

    Aside: WaitForSingleObject is how you track a bunch of stuff: semaphores, mutexes, processes, events, timers, etc.

    The flipside of this is that Windows processes are buckets of handles: a Process object maintains a series of handles (to threads, files, sockets, WMI meters, etc.), one of which happens to be the main thread. Once the main thread exits, the system goes back and cleans up (as it can) the rest of the threads. This is why you can sometimes get zombie'd processes holding onto a stuck thread.

    This is also how it's a very cheap operation to interrogate what's going on in a process ala Process Explorer.

    If I had to describe the difference between Windows and Linux at a process model level, I'd have to back up to the fundamental difference between the Linux and Windows programming models: Linux is a kernel that has to hide its inner workings for its safety and security, passing wrapped versions of structures back and forth through the kernel-userspace boundary; Windows is a kernel that considers each portion of its core separated, isolated through ACLs, and where a handle to something can be passed around without worry. The Windows ABI has been so fundamentally stable over 30 years now because so much of it is built around controlling object handles (which are allowed to change under the hood) rather than manipulation of kernel primitives through syscalls.

    Early WinNT was very restrictive and eased up a bit as development continued so that win9x software would run on it under the VDM. Since then, most Windows software insecurities are the result of people making assumptions about what will or won't happen with a particular object's ACL.

    There's a great overview of Windows programming over at [3]. It covers primarily Win32, but gets into the NT kernel primitives and how they work.

    A lot of work has gone into making Windows an object-oriented kernel; where Linux has been looking at C11 as a "next step" and considering if Rust makes sense as a kernel component, Windows likely has leftovers of Midori and Singularity [4] lingering in it that have gone on to be used for core functionality where it makes sense.

    [0] https://docs.microsoft.com/en-us/windows/win32/api/processth...
    [1] https://docs.microsoft.com/en-us/windows/win32/api/processth...
    [2] https://docs.microsoft.com/en-us/windows/win32/api/synchapi/...
    [3] https://www.tenouk.com/cnwin32tutorials.html
    [4] https://www.microsoft.com/en-us/research/project/singularity...

  • rr

    Record and Replay Framework

  • > I'm curious to hear more. What's its purpose?

    Sure! I'll try to illustrate the general idea, though I'm taking liberties with a few of the details to keep things simple(r).

    Our software (see https://undo.io) does record and replay (including the full set of Time Travel Debug stuff - executing backwards, etc) of Linux processes. Conceptually that's similar to `rr` (see https://rr-project.org/) - the differences probably aren't relevant here.

    We're using `ptrace` as part of monitoring process behaviour (we also have in-process instrumentation). This reflects our origins in building a debugger - but it's also because `ptrace` is just very powerful for monitoring a process / thread. It is a very challenging API to work with, though.

    One feature / quirk of `ptrace` is that you can't really do anything useful with a traced thread that's currently running - including peeking its memory. So if a program we're recording is just getting along with its day we can't just examine it whenever we want.

    First choice is just to avoid messing with the process but sometimes we really do need to interact with it. We could just interrupt a thread, use `ptrace` to examine it, then start it up again. But there's a problem - in the corners of Linux kernel behaviour there's a risk that this will have a program-visible side effect. Specifically, you might cause a syscall restart not to happen.

    So when we're recording a real process we need something that:

    * acts like a thread in the process - so we can peek / poke its memory, etc via ptrace

  • tensorflow

    An Open Source Machine Learning Framework for Everyone

  • We had the case that a library we were using (OpenBLAS) used pthread_atfork. Unfortunately, its atfork handler was buggy in certain situations involving multiple threads and caused a crash. This was annoying because we did not need fork at all, just fork+exec (for various other libraries spawning subprocesses), where those atfork handlers would not be relevant.

    Our solution was to override pthread_atfork so that it ignores any functions registered with it and, in case that is not enough, to also override fork itself so that it makes the syscall directly without calling the atfork handlers.

    https://github.com/tensorflow/tensorflow/issues/13802
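
    The point that at-fork handlers only matter for a plain fork can be sketched with Python's own analogue, `os.register_at_fork`. This is an illustration only: it is a Python-level hook, not pthread_atfork itself, and it is not the override described above. The list name `calls` and the two counters are made up for the example.

```python
import os
import sys

calls = []

# Python-level analogue of pthread_atfork(): runs in the parent before fork().
os.register_at_fork(before=lambda: calls.append("before-fork"))

# 1) Plain fork(): the handler runs even though the child does nothing.
pid = os.fork()
if pid == 0:
    os._exit(0)  # child: leave immediately, skipping interpreter cleanup
os.waitpid(pid, 0)
handlers_after_fork = len(calls)

# 2) posix_spawn(): a new program is started; the Python-level handlers
#    are not invoked, so the count is expected to stay the same.
pid = os.posix_spawn(sys.executable, [sys.executable, "-c", "pass"], os.environ)
os.waitpid(pid, 0)
handlers_after_spawn = len(calls)

print(handlers_after_fork, handlers_after_spawn)
```

    If the second number matches the first, the spawn path never touched the handler, which is why a buggy handler is pure cost for code that only wants fork+exec.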

  • OpenBLAS

    OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.

  • prefork

    a utility to prefork inetd-style wait services.

  • You can vfork()+exec(), why not? Exec too expensive? You can prefork[0].

      [0] https://github.com/elric1/prefork

  • linux

    Linux kernel source tree

  • Practically, this is the struct you have to fill in if you don't use clone or fork.

    https://github.com/torvalds/linux/blob/719fce7539cd3e186598e...

    IMO clone looks a lot better than screwing with that giant struct and all of the kernel bugs that would exist from validating every goofy way those options could be setup wrong by user space.

  • popen-noshell

    A much faster popen() and system() implementation for Linux

  • As u/amaranth pointed out, my gist predates the MSFT paper, which mostly explains why I didn't reference it. Though, to be fair, I saw that paper posted here back in 2019, and I commented on it plenty (13 comments) then. I could have edited my gist to reference it, and, really, probably should have. Sometime this week I will add a reference to it, as well as this and that HN post, since they are clearly germane and useful threads.

    I vehemently disagree with those who say that vfork() is much more difficult to use correctly than fork(). Neither is particularly easy to use though. Both have issues to do with, e.g., signals. posix_spawn() is not exactly trivial to use, but it is easier to use it correctly than fork() or vfork(). And posix_spawn() is extensible -- it is not a dead end.

    My main points are that vfork() has been unjustly vilified, fork() is really not good, vfork() is better than fork(), and we can do better than vfork(). That said, posix_spawn() is the better answer whenever it's applicable.

    Note that the MSFT paper uncritically accepts the idea that vfork() is dangerous. I suspect that is because their focus was on the fork-is-terrible side of things. Their preference seems to be for spawn-type APIs, which is reasonable enough, so why bother with vfork() anyways, right? But here's the thing: Windows WSL can probably get a vfork() added easily enough, and replacing fork() with vfork() will generally be a much simpler change than replacing fork() with posix_spawn(), so I think there is value in vfork() for Microsoft.

    Use cases for vfork() or afork()? Wherever you're using fork() today to then exec, vfork() will make that code more performant and it generally won't take too much effort to replace the call to fork() with vfork(). afork() is for apps that need to spawn lots of processes quickly -- these are rare apps, but uses for them do arise from time to time. But also, afork() should be easier to use safely than vfork(). And, again, for Microsoft there is value in vfork() as a smaller change to Linux apps so they can run well in WSL.

    BTW, see @famzah's popen-noshell issue #11 [0] for a high-perf spawn use case. I linked it from my gist, and, in fact, the discussion there led directly to my writing that gist.

      [0] https://github.com/famzah/popen-noshell/issues/11
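
    To make the "posix_spawn() is extensible" remark a little more concrete, here is a hedged sketch of a popen()-style capture that goes through posix_spawn file actions and involves no shell. It is not how popen-noshell is implemented; the helper name `spawn_capture` is an assumption for the example, and Python's `os.posix_spawn` stands in for the C API.

```python
import os
import sys

def spawn_capture(argv):
    """Run argv via posix_spawn and return (exit_code, captured stdout bytes)."""
    # os.pipe() returns close-on-exec descriptors, so the child only keeps
    # the copy that the dup2 file action installs as its stdout (fd 1).
    read_fd, write_fd = os.pipe()
    try:
        pid = os.posix_spawn(
            argv[0], argv, os.environ,
            file_actions=[(os.POSIX_SPAWN_DUP2, write_fd, 1)],
        )
    except OSError:
        os.close(read_fd)
        raise
    finally:
        os.close(write_fd)  # parent keeps only the read end
    with os.fdopen(read_fd, "rb") as reader:
        output = reader.read()  # returns once the child closes its stdout
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status), output
    return -os.WTERMSIG(status), output

code, out = spawn_capture([sys.executable, "-c", "print('hello')"])
print(code, out)
```

    The redirection is declared up front as data rather than executed as code between fork() and exec(), which is the part of the fork-based pattern that is easy to get wrong.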

