CVE-2023-4863: Heap buffer overflow in WebP (Chrome)

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • libwebp

    Mirror only. Please do not send pull requests. See https://chromium.googlesource.com/webm/libwebp/+/HEAD/CONTRIBUTING.md.

  • The original commit in question: https://github.com/webmproject/libwebp/commit/f75dfbf23d1df1...

    The commit that fixes this bug: https://github.com/webmproject/libwebp/commit/902bc919033134...

    The original commit optimizes a Huffman decoder. The decoder uses a well-known optimization: it peeks N bits ahead and determines how many bits actually have to be consumed and which symbol should be decoded, or, if those N bits are a prefix shared by multiple longer codes, which table should be consulted for the remaining bits.

    The old version did use lookup tables for short codes, but longer codes needed a tree traversal. The new version improved this by using an array of lookup tables. Each entry contains (nbits, value), where `nbits` is the number of bits to be consumed and `value` is normally a symbol. But if `nbits` exceeds N, `value` is interpreted as a table index and `nbits` is reinterpreted as the longest code length in that subtree. So each second-level table should have `2^(nbits - N)` entries (the root table is always fixed at 2^N entries).

    The new version calculated the maximum number of entries from the number of symbols (kTableSize). Of course, the Huffman tree comes from an untrusted source, and you can easily imagine a case where `nbits` is very large. VP8 Lossless allows code lengths of up to 15 bits, so the largest possible table has 2^N + 2^15 entries when every single root-table entry is mapped to its own second-level table. Doing this doesn't need many symbols: you only need 16-N symbols per table. So if the Huffman tree is crafted to maximize the number of entries, it will overflow the allocation.

    To be fair, I can see why this happened. The Huffman decoding step is one of the most computationally intensive parts of many compression formats, and any small improvement matters. The Huffman decoder optimization described above is well known, but the long-code case is commonly considered less important to optimize because long codes should rarely appear in general. The original commit message argued otherwise, and the change was merged.
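The two-level lookup scheme described in the comment above can be sketched in Python. This is a minimal illustration under stated assumptions, not libwebp's implementation.

- Codes are canonical and read MSB-first. Real decoders, possibly including libwebp, may order bits differently.
- The root width is assumed to be N = 8.
- All function names are hypothetical.

Root entries follow the comment's (nbits, value) convention. Sub-table entries store only the bits still to consume after the N-bit root lookup, which is a choice made for this sketch.

```python
from collections import Counter

ROOT_BITS = 8  # N, the root-table width: an assumption for this sketch


def canonical_codes(lengths):
    """Assign canonical MSB-first codes; returns {symbol: (code, length)}."""
    bl_count = Counter(l for l in lengths if l > 0)
    next_code, code = {}, 0
    for bits in range(1, max(lengths) + 1):
        code = (code + bl_count[bits - 1]) << 1
        next_code[bits] = code
    codes = {}
    for sym, length in enumerate(lengths):
        if length > 0:
            codes[sym] = (next_code[length], length)
            next_code[length] += 1
    return codes


def build_tables(lengths, n=ROOT_BITS):
    """Build a 2**n-entry root table of (nbits, value) plus second-level tables.

    Root entry with nbits <= n: consume nbits bits; value is the symbol.
    Root entry with nbits > n: nbits is the longest code under this prefix,
    value indexes `subs`, and that sub-table has 2**(nbits - n) entries.
    Sub-table entries hold (bits still to consume, symbol).
    """
    codes = canonical_codes(lengths)
    root = [None] * (1 << n)
    long_codes = {}  # root prefix -> [(symbol, code, length), ...]
    for sym, (code, length) in codes.items():
        if length <= n:
            start = code << (n - length)  # replicate over every n-bit extension
            for i in range(start, start + (1 << (n - length))):
                root[i] = (length, sym)
        else:
            prefix = code >> (length - n)
            long_codes.setdefault(prefix, []).append((sym, code, length))
    subs = []
    for prefix in sorted(long_codes):
        entries = long_codes[prefix]
        width = max(length for _, _, length in entries) - n  # sized from real lengths
        sub = [None] * (1 << width)
        for sym, code, length in entries:
            rest = length - n
            start = (code & ((1 << rest) - 1)) << (width - rest)
            for i in range(start, start + (1 << (width - rest))):
                sub[i] = (rest, sym)
        root[prefix] = (n + width, len(subs))
        subs.append(sub)
    return root, subs


def encode(symbols, codes):
    """Concatenate the MSB-first codes of `symbols` into a '0'/'1' string."""
    return "".join(format(codes[s][0], "0%db" % codes[s][1]) for s in symbols)


def decode(bits, root, subs, n=ROOT_BITS):
    """Peek n bits in the root table, then consult a sub-table if needed."""
    out, pos = [], 0
    while pos < len(bits):
        nbits, value = root[int(bits[pos:pos + n].ljust(n, "0"), 2)]
        if nbits <= n:
            out.append(value)
            pos += nbits
        else:
            width = nbits - n
            pos += n
            rest, sym = subs[value][int(bits[pos:pos + width].ljust(width, "0"), 2)]
            out.append(sym)
            pos += rest
    return out


def comment_upper_bound(n=ROOT_BITS, max_len=15):
    """The comment's count: 2**n root entries, each with a 2**(max_len - n) sub-table."""
    return (1 << n) + (1 << n) * (1 << (max_len - n))


if __name__ == "__main__":
    lengths = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10]  # complete code; three codes exceed N = 8
    root, subs = build_tables(lengths)
    msg = [0, 10, 8, 3, 9, 7]
    assert decode(encode(msg, canonical_codes(lengths)), root, subs) == msg
    print(len(root) + sum(len(s) for s in subs))  # 260: 256 root + one 4-entry sub-table
```

In this sketch, the total table size is known only after scanning the actual code lengths. `comment_upper_bound` merely repeats the comment's worst-case count of 2^N + 2^15 entries and does not construct such a code.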

  • jxl.js

    JPEG XL decoder in JavaScript using WebAssembly (WASM)

  • libavif

    libavif - Library for encoding and decoding .avif files

  • It's 2023, surely this is not yet another bug related to memory unsafety that could be avoided if we'd stop writing critical code that deals with extremely complex untrusted input (media codecs) in memory unsafe languages?

    Yep, of course it is: https://github.com/webmproject/libwebp/commit/902bc919033134...

    I guess libwebp could be excused as it was started when there were no alternatives, but even for new projects today we're still committing the same mistake[1][2][3].

    [1] https://code.videolan.org/videolan/dav1d

    [2] https://github.com/AOMediaCodec/libavif

    [3] https://github.com/AOMediaCodec/libiamf

    Yep. Keep writing these in C; surely nothing will go wrong.

  • libiamf

    Reference Software for IAMF

  • Electron

    Build cross-platform desktop apps with JavaScript, HTML, and CSS

  • It does, see [0]. Fun fact: Signal Desktop, which uses Electron under the hood, runs without the sandbox on Linux [1][2].

    [0] https://github.com/electron/electron/pull/39824

    [1] https://github.com/signalapp/Signal-Desktop/issues/5195

    [2] https://github.com/signalapp/Signal-Desktop/pull/4381

  • Signal-Desktop

    A private messenger for Windows, macOS, and Linux.

  • l4v

    seL4 specification and proofs

  • You can't really retrofit safety onto C. The best that can be achieved is seL4, which, while written in C, has a separate proof of its correctness: https://github.com/seL4/l4v

    The proof is much, much more work than the microkernel itself. A proof for something as large as WebP might take decades.

  • wuffs

    Wrangling Untrusted File Formats Safely

  • I agree that Wuffs [1] would have been a very good alternative, if it could be made more general. AFAIK Wuffs is still very limited; in particular, it never allows dynamic allocation. Many formats, including those supported by Wuffs the library, need dynamic allocation, so Wuffs code has to be glued together with unverified non-Wuffs code [2]. This only works for simpler formats.

    [1] https://github.com/google/wuffs/blob/main/doc/wuffs-the-lang...

    [2] https://github.com/google/wuffs/blob/main/doc/note/memory-sa...

  • kani

    Kani Rust Verifier

  • > those applications need the proof for correctness so that more dangerous code---say, what would need `unsafe` in Rust---can be safely added

    There are actually already tools built for this very purpose in Rust (see Kani [1] for instance).

    Formal verification has a serious scaling problem, so structuring programs such that a few performance-critical areas use unsafe routines seems like the best route. I feel like Rust leans into this paradigm with `unsafe` blocks.

    [1] https://github.com/model-checking/kani

  • BrowserBoxPro

    Discontinued. BrowserBox is Web application virtualization via zero trust remote browser isolation and secure document gateway technology. Embed secure unrestricted webviews on any device in a regular webpage. Multiplayer embeddable browsers, open source! [Moved to: https://github.com/BrowserBox/BrowserBox]

  • Agreed. This is one of the reasons it's better to go with the older and more reliable JPEG for viewport streaming: an exploit chain would need to make it through the screen-capture images passed to the client. Browser zero-days do occur, which is why it's important to have protection. For added protection, consider browser isolation. Check out open-source Zero Trust browser isolation with BrowserBox, which uses JPEG (not WebP): https://github.com/dosyago/BrowserBoxPro

    Technically, we did try WebP because of its significant bandwidth gains. However, the compute overhead of encoding compared with JPEG introduced unacceptable latency into our streaming pipeline, so for now we're sticking with JPEG. Security is an additional mark against the newer standard, as good as it is!

  • image

    Encoding and decoding images in Rust (by image-rs)

  • FTR there is a WebP decoder implementation in safe Rust in the image crate: https://github.com/image-rs/image

    It was quite incomplete for a long time, but work over the last year has implemented many WebP features. Chromium now has a policy of allowing Rust dependencies, so maybe Chromium could start adopting it?

  • ZLib

    A massively spiffy yet delicately unobtrusive compression library.

  • So the real issue here, I believe, is the lack of tree validation before tree construction. I'm surprised that this check was not yet implemented (I actually checked libwebp to make sure I wasn't overlooking one). Given this blind spot, an automated test based on domain knowledge would likely have been useless for catching this bug.

    [1] https://github.com/madler/zlib/blob/master/examples/enough.c
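The "validate the tree before building it" idea from the last comment can be illustrated with a Kraft-inequality check over the code lengths. The check runs before any lookup table is sized or filled. This is a minimal Python sketch under these assumptions:

- Code lengths are capped at the 15-bit maximum discussed earlier.
- The format accepts only complete codes. Real formats may special-case a single-symbol code, which this sketch does not handle.
- `classify_code` and `validated_lengths` are hypothetical names, not libwebp or zlib functions.

```python
MAX_CODE_LENGTH = 15  # assumed cap, matching the 15-bit limit discussed above


def classify_code(lengths, max_len=MAX_CODE_LENGTH):
    """Classify Huffman code lengths before any lookup table is built.

    Kraft's inequality in integer form: a code of length l uses
    2**(max_len - l) of the 2**max_len leaves of a full binary tree.
    """
    if any(l < 0 or l > max_len for l in lengths):
        return "invalid length"
    used = [l for l in lengths if l > 0]
    if not used:
        return "empty"
    kraft = sum(1 << (max_len - l) for l in used)
    if kraft > 1 << max_len:
        return "over-subscribed"  # no prefix code has these lengths
    if kraft < 1 << max_len:
        return "incomplete"  # some bit patterns would decode to nothing
    return "complete"


def validated_lengths(lengths):
    """Hypothetical gate: only pass complete codes on to table construction."""
    status = classify_code(lengths)
    if status != "complete":
        raise ValueError("rejecting Huffman code: " + status)
    return lengths
```

Rejecting malformed length sets up front means the table builder only ever sees well-formed codes. A precomputed maximum table size can only be trusted for such codes.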
