Parsing can become accidentally quadratic because of sscanf

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • rapidyaml

    Rapid YAML - a library to parse and emit YAML, and do it fast.

  • json-c

    https://github.com/json-c/json-c is the official code repository for json-c. Release tarballs are available from the wiki, and API docs are at http://json-c.github.io/json-c/

  • I found this while compiling a list of which C implementation does what (see https://news.ycombinator.com/item?id=26298300).

    There are two basic implementation strategies. The BSD (FreeBSD, OpenBSD, and very likely NetBSD too), Microsoft, GNU, and musl C libraries use one and suffer from this, whereas the OpenWatcom, P.J. Plauger, Tru64 Unix, and my own standard C libraries use the other and do not.

    The 2002 report in the comp.lang.c Usenet newsgroup (listed in that discussion) is the earliest that I've found so far.

    Several people have asserted that C++ has poor library support for strings. Note, however, that the fix here was to switch from the standard C library function sscanf() to a C++ library:

    * https://github.com/fastfloat

    One might argue for using other people's libraries, on the grounds that they will have got such things right. Note, however, that the bug here was in RapidYAML, a third-party library, and it was fixed several years after GTA Online was released. Other third-party parsing libraries have suffered from being layered on top of sscanf() in other ways, for example:

    * https://github.com/json-c/json-c/issues/173

    * https://github.com/kgabis/parson/commit/96150ba1fd7f3398aa6a...

  • degasolv

    Democratize dependency management.

  • I ran into this issue when writing a parser for Debian package information (code here[1]). Matching each line against a regex slowed the program down to the point of being unusable, so I switched to simply treating two consecutive line breaks as the start of a new package record. After that, the program was zippy.

    1: https://github.com/djhaskin987/degasolv/blob/develop/cli-src...

  • glibc

    Unofficial mirror of sourceware glibc repository. Updated daily. (by bminor)

  • I am confused by this comment. The parent comment gave a clear stack trace pointing to the problem: glibc's sscanf implementation calls strlen on the input string no matter what format string you use, because it needs to masquerade as a file, and that requires calling strlen on the whole input (https://github.com/bminor/glibc/blob/21c3f4b5368686ade28d90d...).

    No amount of cleverness in the format string changes that behavior, unless you manually write a '\0' somewhere to terminate the input string early.

  • go

    The Go programming language

  • I found the issue: https://github.com/golang/go/issues/6189
