Building a high performance JSON parser

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • simdjson

    Parsing gigabytes of JSON per second: used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks

  • Everything you said is totally reasonable. I'm a big fan of napkin math and theoretical upper bounds on performance.

    simdjson (https://github.com/simdjson/simdjson) claims to fully parse JSON on the order of 3 GB/sec. Which is faster than OP's Go whitespace parsing! These tests are running on different hardware so it's not apples-to-apples.

    The phrase "cannot go faster than this" is just begging for a "well ackshully". Which I hate to do. But the fact that there is an existence proof of Problem A running faster in C++ SIMD than OP's Problem B in scalar Go is quite interesting and worth calling out imho. But I admit it doesn't change the rest of the post.

  • sonic

    A blazingly fast JSON serializing & deserializing library (by bytedance)

  • Also worth looking at https://github.com/bytedance/sonic

  • ojg

    Optimized JSON for Go

  • You might want to take a look at https://github.com/ohler55/ojg. It takes a different approach with a single pass parser. There are some performance benchmarks included on the README.md landing page.

  • JSMN

    Jsmn is a world fastest JSON parser/tokenizer. This is the official repo replacing the old one at Bitbucket

  • Like how https://github.com/zserge/jsmn works. I thought it would be neat to have such a parser for https://github.com/vshymanskyy/muon

  • muon

    µON - a compact and simple binary object notation (by vshymanskyy)

  • pulldown-cmark

    An efficient, reliable parser for CommonMark, a standard dialect of Markdown

  • I also really like this paradigm. It’s just that in old crusty null-terminated C style this is really awkward because the input data must be copied or modified. But it’s not an issue when using slices (length and pointer). Unfortunately most of the C standard library and many operating system APIs expect null-terminated strings.

    I’ve seen this referred to as a pull parser in a Rust library? (https://github.com/raphlinus/pulldown-cmark)
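
A minimal sketch of the pull-parser idea in Go, where the caller repeatedly asks for the next token and every token is a sub-slice of the input, so nothing is copied or modified. The `Scanner` type and its `Next` method are hypothetical names chosen for illustration, not the API of pulldown-cmark or any other project listed here, and the code does not validate JSON.

```go
package main

import "fmt"

// Scanner is a hypothetical pull-style tokenizer. It holds only a slice
// of the remaining input; every token it returns aliases that input.
type Scanner struct {
	rest []byte
}

func isSpace(c byte) bool {
	return c == ' ' || c == '\t' || c == '\n' || c == '\r'
}

func isStructural(c byte) bool {
	return c == '{' || c == '}' || c == '[' || c == ']' || c == ':' || c == ','
}

// Next returns the next token, or nil once the input is exhausted.
// Simplified sketch: strings and literals are delimited, not validated.
func (s *Scanner) Next() []byte {
	// Skip leading whitespace.
	i := 0
	for i < len(s.rest) && isSpace(s.rest[i]) {
		i++
	}
	s.rest = s.rest[i:]
	if len(s.rest) == 0 {
		return nil
	}
	n := 1 // length of the token being scanned
	switch c := s.rest[0]; {
	case isStructural(c):
		// Single-byte token; n is already 1.
	case c == '"':
		// Scan to the closing quote, stepping over escaped bytes.
		for n < len(s.rest) && s.rest[n] != '"' {
			if s.rest[n] == '\\' {
				n++
			}
			n++
		}
		n++ // include the closing quote
		if n > len(s.rest) {
			n = len(s.rest) // unterminated string: return what is there
		}
	default:
		// Number or bare literal: runs until whitespace or a structural byte.
		for n < len(s.rest) && !isSpace(s.rest[n]) && !isStructural(s.rest[n]) {
			n++
		}
	}
	tok := s.rest[:n]
	s.rest = s.rest[n:]
	return tok
}

func main() {
	s := Scanner{rest: []byte(`{"a": [1, true]}`)}
	for tok := s.Next(); tok != nil; tok = s.Next() {
		fmt.Printf("%s\n", tok)
	}
}
```

Because a Go slice carries its own length, no terminator byte has to be written into the buffer, which is the property the comment contrasts with null-terminated C strings.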

  • go-jsonschema

    A tool to generate Go data types from JSON Schema definitions.

  • For JSON Schema specifically there are some tools like go-jsonschema[1] but I've never used them personally. But you can use something like ffjson[2] in Go to generate a static serialize/deserialize function based on a struct definition.

    [1] https://github.com/omissis/go-jsonschema

  • ffjson

    faster JSON serialization for Go

  • jsoncut

  • benchmarks

    Some benchmarks of different languages

  • jsonrepair

    Repair invalid JSON documents

  • The jsonrepair tool https://github.com/josdejong/jsonrepair might interest you. It's tailored to fix JSON strings.

    I've been looking into something similar for handling partial JSON, where you only have the first n characters of a document. This is common with LLMs that stream their output to reduce latency. If the JSON schema is known ahead of time, one can start processing the first fields before the remaining data has fully loaded. If you have to wait for the whole thing to load there is little point in streaming.

    Was looking for a library that could do this parsing.
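
One way to approximate this in Go, sketched here with the standard library's `encoding/json` token API rather than any project from this list: read the top-level object field by field and hand each completed field to a callback, stopping at the point where the available prefix runs out. The function name `readFields` and its callback signature are assumptions made for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// readFields (hypothetical helper) walks the top-level object in prefix and
// calls emit for every field whose value was completely received. It returns
// the error that stopped decoding, or nil if the whole object was present.
func readFields(prefix string, emit func(key string, value []byte)) error {
	dec := json.NewDecoder(strings.NewReader(prefix))

	// The first token must be the opening brace of an object.
	tok, err := dec.Token()
	if err != nil {
		return err
	}
	if d, ok := tok.(json.Delim); !ok || d != '{' {
		return fmt.Errorf("expected '{', got %v", tok)
	}

	for dec.More() {
		// Object keys arrive as string tokens.
		keyTok, err := dec.Token()
		if err != nil {
			return err
		}
		key, ok := keyTok.(string)
		if !ok {
			return fmt.Errorf("expected object key, got %v", keyTok)
		}
		// Capture the value's raw bytes; this fails if the value is cut off.
		var value json.RawMessage
		if err := dec.Decode(&value); err != nil {
			return err
		}
		emit(key, value)
	}

	_, err = dec.Token() // consume the closing '}'
	return err
}

func main() {
	// Only a prefix of the document has arrived; the last string is cut off.
	partial := `{"id": 7, "name": "Ada", "bio": "Lorem ip`
	err := readFields(partial, func(key string, value []byte) {
		fmt.Printf("%s = %s\n", key, value)
	})
	fmt.Println("stopped:", err)
}
```

One caveat when working from a fixed prefix: a number cut off at the very end of the input (say `12` out of `123`) looks like a complete value, so the last emitted field may need to be treated as provisional until more data arrives.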

  • go

    The Go programming language

  • Obviously you can manually inline functions. That's what happened in the article.

    The comment is about having a directive or annotation to make the compiler inline the function for you, which Go does not have. IMO, the pre-inline code was cleaner. It's a shame that the compiler could not optimize it.

    There was once a proposal for this, but it's really against Go's design as a language.

    https://github.com/golang/go/issues/21536
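
Although Go cannot be told to inline a function, the compiler can be asked to report what it decided. The sketch below uses hypothetical helpers (`isWhitespace`, `countWhitespace`), not code from the article; building it with `go build -gcflags=-m` prints the compiler's optimization decisions, including which functions it considers inlinable and which calls it inlined.

```go
package main

import "fmt"

// isWhitespace is a tiny leaf function, the kind of helper the compiler
// can usually inline on its own. Whether it actually does is up to the
// compiler's cost heuristics, not the programmer.
func isWhitespace(c byte) bool {
	return c == ' ' || c == '\t' || c == '\n' || c == '\r'
}

// countWhitespace calls the helper in a hot loop, so the inlining decision
// for isWhitespace directly affects the cost of each iteration.
func countWhitespace(buf []byte) int {
	n := 0
	for _, c := range buf {
		if isWhitespace(c) {
			n++
		}
	}
	return n
}

func main() {
	fmt.Println(countWhitespace([]byte("a b\tc\n")))
}
```

If the report shows that a hot helper was not inlined, the remaining options are restructuring it to be cheaper or inlining it by hand, which is the trade-off against readability described above.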

  • graphql-go-tools

    GraphQL Router / API Gateway framework written in Golang, focussing on correctness, extensibility, and high-performance. Supports Federation v1 & v2, Subscriptions & more.

  • I've taken a very similar approach and built a GraphQL tokenizer and parser (amongst many other things) that's also zero memory allocations and quite fast. In case you'd like to check out the code: https://github.com/wundergraph/graphql-go-tools

  • jsb

    Fast json <=> binary serializer library for C

  • Writing a json parser is definitely an educational experience. I wrote one this summer for my own purposes that is decently fast: https://github.com/nwpierce/jsb

  • json5-spec

    The JSON5 Data Interchange Format

  • Visual Studio Code

    Visual Studio Code

  • gqlscan

    GraphQL lexical scanner for Go

  • You might also want to check out this abomination of mine: https://github.com/graph-guard/gqlscan

    I gave a talk about this, but unfortunately it wasn't recorded. I tried to squeeze as much out of Go as I could, and I went crazy doing that :D

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
