Heap memory corruption CVE for GitHub's Markdown table parsing

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • cmark-gfm

    GitHub's fork of cmark, a CommonMark parsing and rendering library and program in C

    This seems to be the patch: https://github.com/github/cmark-gfm/commit/cf7577d2f74289cb8...

    Integer overflow can happen in Rust, but it's well-defined, not undefined: by default it panics in debug builds and wraps around in release builds.

    Bounds checking is part of indexing, so even if an index computation overflows, the check still happens.

    "Impossible" is a strong word, but it would be significantly less likely in Rust. If you wrote the same thing in unsafe Rust that the C code does, it could happen. But 99.9999% of the time there isn't much reason to, since unsafe is the more difficult and less ergonomic option.
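
    To make the overflow and bounds-checking points concrete, here is a small Rust sketch. The helpers `cell_index` and `cell_at` and their parameters are hypothetical, invented for illustration; they are not taken from cmark-gfm or its patch. Checked arithmetic turns an overflowing index computation into `None`, and even a deliberately wrapped index still goes through the bounds check in `slice::get`.

```rust
/// Hypothetical helper: flat index of a table cell stored row-major.
/// `checked_mul`/`checked_add` return `None` instead of wrapping on overflow.
fn cell_index(row: u16, width: u16, col: u16) -> Option<u16> {
    row.checked_mul(width)?.checked_add(col)
}

/// Hypothetical lookup: `slice::get` performs the bounds check and returns
/// `None` for an out-of-range index instead of reading past the allocation.
fn cell_at(cells: &[u8], row: u16, width: u16, col: u16) -> Option<u8> {
    let idx = cell_index(row, width, col)?;
    cells.get(usize::from(idx)).copied()
}

fn main() {
    let cells = [10u8, 11, 12, 20, 21, 22]; // 2 rows x 3 columns

    assert_eq!(cell_at(&cells, 1, 3, 2), Some(22));

    // 300 * 300 = 90_000 does not fit in u16: reported as None, not wrapped.
    assert_eq!(cell_index(300, 300, 0), None);

    // Explicit wrapping is well-defined: 65_535 + 2 wraps to 1 ...
    let wrapped = u16::MAX.wrapping_add(2);
    assert_eq!(wrapped, 1);
    // ... and an index that is merely wrong is still bounds-checked.
    assert_eq!(cells.get(usize::from(u16::MAX)), None);

    println!("all lookups stayed in bounds");
}
```

    Plain `row * width` would panic on overflow in a debug build and wrap in a release build (unless `overflow-checks` is enabled), and `cells[idx]` with an out-of-range index panics rather than reading out of bounds. Either way the failure is a controlled one, not heap corruption.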

  • rust

    Empowering everyone to build reliable and efficient software.

    Rust is too low-level a language for that. In Rust, operations typically map straightforwardly to system operations, and the layer on top is pretty minimal.

    There are some exceptions, though. Rust puts a mutex around stdio access (to prevent interleaved, broken output when calling println!() from many threads). Rust also has a mutex for accessing the environment [0].
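
    A rough illustration of the stdio point: each `println!` call takes the stdout lock only for the duration of that one call, so a single line is not torn, but consecutive calls from different threads can still interleave. Holding `std::io::stdout().lock()` across several writes keeps a multi-line record together. The `write_report` helper and the thread count are made up for this sketch.

```rust
use std::io::{self, Write};
use std::thread;

/// Hypothetical helper: write a two-line record to any writer.
fn write_report<W: Write>(out: &mut W, worker: usize) -> io::Result<()> {
    writeln!(out, "worker {worker}: start")?;
    writeln!(out, "worker {worker}: done")?;
    Ok(())
}

fn main() {
    let handles: Vec<_> = (0..4)
        .map(|worker| {
            thread::spawn(move || {
                // Take the stdout lock once, so both lines of this record are
                // written back to back with no other thread's lines in between.
                let stdout = io::stdout();
                let mut locked = stdout.lock();
                write_report(&mut locked, worker)
            })
        })
        .collect();

    for handle in handles {
        handle.join().expect("worker panicked").expect("write failed");
    }
}
```

    The order of the four records varies from run to run; only the pairing of each record's two lines is guaranteed by the explicit lock.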

    But in this case... see, this /proc thing is pretty niche. With some bizarre combinations of brokenness (in your application, in the system setup, or both) it can indeed lead to UB, which can be a serious security bug, even remote code execution, but it's very, very rare for a program to need to care about this in practice. Rust is a practical language, and I think the line it drew here is quite sensible.

    Sadly, this means that Rust's safety guarantees aren't absolute, but Rust doesn't yet have a precise mathematical definition of UB anyway, so we can't even in principle start formalizing this enough for it to matter.

    Rust is also in a tough spot here because, as I said, there is hardly a foolproof way to check at runtime whether you're accessing procfs: checking for /proc in the path is just a heuristic that doesn't actually close the UB loophole. You would need to inspect all mounts to check for procfs and bind mounts (and also check for symlinks, hardlinks, etc.).

    Maybe another route is to open the file as normal, then do a stat and check the device and inode: if the device is a procfs device and the inode is a bad one, return an error (if you want to open it anyway, you'd need an unsafe API). Or, if we can't check because of some system setup shenanigans, default to returning an error. This could be a useful crate for a sufficiently paranoid application, but it might not make the cut for the Rust stdlib, even though it ostensibly closes a safety loophole (or it just might; maybe this should be proposed).
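
    A minimal, Unix-only sketch of the stat-after-open idea, under loudly stated assumptions: the helper name `open_rejecting_proc_mem` is hypothetical, and the only "bad inode" it knows about is `/proc/self/mem` (the file Rust's standard-library I/O-safety docs mention as a way around its guarantees). It does not catch `/proc/<pid>/task/<tid>/mem` or any other procfs file, so it illustrates the approach rather than closing the loophole.

```rust
use std::fs::{self, File};
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Hypothetical helper: open `path` for reading, but refuse if the opened file
/// is the same (device, inode) pair as this process's `/proc/self/mem`.
/// `File::metadata` stats the already-open descriptor, so swapping the path
/// after the open does not bypass the check.
fn open_rejecting_proc_mem(path: &Path) -> io::Result<File> {
    let file = File::open(path)?;
    let opened = file.metadata()?;

    // Stat the known-bad file while our descriptor is still open. If that
    // fails (e.g. no procfs mounted) we cannot tell, so fail closed.
    let bad = fs::metadata("/proc/self/mem").map_err(|e| {
        io::Error::new(io::ErrorKind::Other, format!("cannot verify procfs: {e}"))
    })?;

    if opened.dev() == bad.dev() && opened.ino() == bad.ino() {
        return Err(io::Error::new(
            io::ErrorKind::PermissionDenied,
            "refusing to open /proc/self/mem through a safe API",
        ));
    }
    Ok(file)
}

fn main() {
    for path in ["/etc/hostname", "/proc/self/mem"] {
        match open_rejecting_proc_mem(Path::new(path)) {
            Ok(_) => println!("{path}: opened"),
            Err(e) => println!("{path}: {e}"),
        }
    }
}
```

    Comparing the device as well as the inode matters because inode numbers are only unique within one filesystem.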

    [0] Unlike the stdio one, the environment mutex is actually critical for safety in Rust programs. But you can break this safety by calling C code that reads the environment in a non-thread-safe manner without going through the Rust mutex. So accessing the environment from many threads can easily lead to UB, even though the operation is marked as safe in Rust. This can still be sound from Rust's POV because calling C APIs is unsafe, so you "just" need to guarantee that no C code is accessing the environment behind your back. Except that there are some bad APIs like getaddrinfo that may implicitly access the environment, and tons of libraries call them, so in practice many C libraries can't be given a safe Rust interface. See https://doc.rust-lang.org/std/env/fn.set_var.html and https://internals.rust-lang.org/t/synchronized-ffi-access-to... and https://github.com/rust-lang/rust/issues/27970

    Note that in many well-behaved programs, environment variables are read only at program startup (before any threads are created), saved to a config struct, and this struct is passed around as needed. This usage can be safe even without the mutex. So an alternative design would be to prevent calling some APIs once you have created threads. To do that, you could track at the type level whether the program is single-threaded or multi-threaded (perhaps with session types, or the typestate pattern: basically a type-level state machine, where spawning a thread moves you to the multi-threaded state if you're not already there). Also, in the single-threaded state, Arc could be automatically converted to Rc, Mutex to RefCell, etc. It would be interesting to see a language designed around this.

  • wuffs

    Wrangling Untrusted File Formats Safely

    It's much worse than that: even memory-safe languages like (safe) Rust, and the inevitable suggestion of AUTOSAR and so on, aren't the answer. To properly answer your demand for a "practical way to ensure user input does not cause mischief", you want a drastically less capable language, one that cannot even in principle express the programs that should not exist. That's exactly what WUFFS is for.

    https://github.com/google/wuffs

    This sort of bug can't happen in WUFFS because you can't express the idea "corrupt the heap memory" even if you desperately wanted to. The telltale sign of such languages is that they are not general-purpose languages, because general-purpose languages are able to express a wide variety of stupid things you don't want to do.
