Linux /proc/pid/stat parsing bugs

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • procfs

    Rust library for reading the Linux procfs filesystem (by eminence)

  • I've been working on a library[1] that aims to have fairly complete support for the procfs filesystem, so that you can hide away these annoying parsing quirks. But for some casual usage of /proc/ where you only need one tiny bit of information, it's often better to just roll your own parser instead of bringing in a 3rd party library. It's these small one-off cases that would really benefit from a standardized serialization format like you propose.

    [1] https://github.com/eminence/procfs

  • pixie

    Instant Kubernetes-Native Application Observability

  • procmaps.rs

    A small Rust library for reading process maps from procfs

  • The /proc/<pid>/* hierarchy has always been a bit of a mess to parse.

    /proc/<pid>/maps is similarly frustrating: there's no clear distinction between "special" maps (like the stack) and a file that might just happen to be named `[stack]`. Similarly, the handling for a mapped region on a deleted file is simply to append " (deleted)"[1].

    [1]: https://github.com/woodruffw/procmaps.rs/blob/79bd474104e9b3...
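
To make the ambiguity described above concrete, here is a minimal Python sketch of a maps-line parser. It is not taken from procmaps.rs; the function name `parse_maps_line` and the `looks_special` / `looks_deleted` keys are hypothetical, and both flags are only guesses based on how the path text looks, which is exactly the problem the comment points out.

```python
def parse_maps_line(line: str) -> dict:
    """Parse one line of a maps file into its columns (illustrative sketch)."""
    # Columns: address range, perms, offset, dev, inode, then an optional path.
    # The path may contain spaces, so split at most 5 times.
    parts = line.rstrip("\n").split(None, 5)
    addr, perms, offset, dev, inode = parts[:5]
    path = parts[5] if len(parts) > 5 else ""
    start, end = (int(x, 16) for x in addr.split("-"))
    return {
        "start": start,
        "end": end,
        "perms": perms,
        "offset": int(offset, 16),
        "dev": dev,
        "inode": int(inode),
        "path": path,
        # Heuristics only: the text alone cannot prove either property.
        "looks_special": path.startswith("[") and path.endswith("]"),
        "looks_deleted": path.endswith(" (deleted)"),
    }
```

A caller would still have to decide how much to trust the two flags, since a file whose real name ends in " (deleted)" produces the same text as a mapping of a deleted file.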

  • jc

    CLI tool and python library that converts the output of popular command-line tools, file-types, and common strings to JSON, YAML, or Dictionaries. This allows piping of output to tools like jq and simplifying automation scripts.

  • Neat! Your parser [1] almost has a similar issue because a comm could contain parentheses, e.g., `foo) R 123 456`. But since a comm is limited to 64 bytes, I don't think it is possible to fit a fully matching string inside of the comm before the closing paren after the comm, which would thus make your regexp fail to match.

    [1] https://github.com/kellyjonbrazil/jc/blob/master/jc/parsers/...
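
One common way to sidestep the parenthesis problem discussed above is to split the stat line at the last ")" rather than matching the comm with a regular expression. The following Python sketch illustrates that idea; it is not the jc parser, and the function name `parse_stat_line` and the returned keys are hypothetical. It assumes the fields after the comm are the single-character state followed by numbers, so none of them can contain ")".

```python
def parse_stat_line(line: str) -> dict:
    """Split one stat line into pid, comm, state, ppid, and the raw tail fields."""
    # The pid is all digits, so the first "(" opens the comm.
    open_idx = line.index("(")
    # The comm may itself contain ")", so take the LAST one as its end.
    close_idx = line.rindex(")")
    pid = int(line[:open_idx].strip())
    comm = line[open_idx + 1:close_idx]
    # Everything after the comm is whitespace-separated.
    rest = line[close_idx + 1:].split()
    return {
        "pid": pid,
        "comm": comm,
        "state": rest[0],
        "ppid": int(rest[1]),
        "fields": rest,
    }
```

With this approach a comm such as `foo) R 123 456` is returned intact as the `comm` value, and the real state and parent pid are read from the text after the final closing paren.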

  • psmisc

  • simdjson

    Parsing gigabytes of JSON per second : used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks

  • JSON can be parsed very quickly: https://github.com/simdjson/simdjson

    CBOR could be another option: https://en.wikipedia.org/wiki/CBOR

  • Git

    Git Source Code Mirror - This is a publish-only repository but pull requests can be turned into patches to the mailing list via GitGitGadget (https://gitgitgadget.github.io/). Please follow Documentation/SubmittingPatches procedure for any of your improvements.

  • I noticed this around a year ago when writing a /proc/pid/stat parser for git (for logging the chain of parent processes).

    Here's that commit, it has a comment with an overview of the kernel limits and caveats involved: https://github.com/git/git/commit/2d3491b117c6dd08e431acc390...
