Building a high performance JSON parser

This page summarizes the projects mentioned and recommended in the original post.

  • simdjson

    Parsing gigabytes of JSON per second : used by Facebook/Meta Velox, the Node.js runtime, WatermelonDB, Apache Doris, Milvus, StarRocks

    Everything you said is totally reasonable. I'm a big fan of napkin math and theoretical upper bounds on performance.

simdjson claims to fully parse JSON on the order of 3 GB/s, which is faster than OP's Go whitespace parsing. These tests ran on different hardware, though, so it's not apples-to-apples.

The phrase "cannot go faster than this" is just begging for a "well ackshully", which I hate to do. But the fact that there is an existence proof of Problem A running faster in C++ SIMD than OP's Problem B scalar Go is quite interesting and worth calling out, imho. But I admit it doesn't change the rest of the post.
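The napkin-math ceiling the comment refers to is easy to probe yourself: a loop that does nothing but classify whitespace gives an upper bound for any scalar Go parser on your own hardware. A minimal sketch (numbers will vary by machine; this is an illustration, not OP's benchmark):

```go
package main

import (
	"bytes"
	"fmt"
	"time"
)

// countWhitespace classifies every byte, which is all the "parser"
// in this upper-bound experiment does.
func countWhitespace(data []byte) int {
	n := 0
	for _, c := range data {
		if c == ' ' || c == '\t' || c == '\n' || c == '\r' {
			n++
		}
	}
	return n
}

func main() {
	// 64 MiB of synthetic input; throughput depends entirely on hardware.
	data := bytes.Repeat([]byte(`{"k": 1}  `), 64<<20/10)

	start := time.Now()
	n := countWhitespace(data)
	elapsed := time.Since(start)

	fmt.Printf("%d/%d whitespace bytes, %.2f GB/s\n",
		n, len(data), float64(len(data))/elapsed.Seconds()/1e9)
}
```

Whatever number this prints is a ceiling for a scalar byte-at-a-time scanner; simdjson's point is that SIMD classification blows past it.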

  • sonic

    A blazingly fast JSON serializing & deserializing library (by bytedance)

    Also worth looking at


  • ojg

    Optimized JSON for Go

You might want to take a look. It takes a different approach with a single-pass parser, and some performance benchmarks are included on the landing page.

  • JSMN

Jsmn is the world's fastest JSON parser/tokenizer. This is the official repo, replacing the old one at Bitbucket.

I like how it works. I thought it would be neat to have such a parser.
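jsmn's core trick is that tokens are just byte offsets into the original buffer, so tokenizing allocates nothing per token. A minimal Go sketch of the same idea (hypothetical types, not jsmn's actual C API, and with escape handling omitted):

```go
package main

import "fmt"

// Token points into the input; no substring is ever copied.
type Token struct {
	Start, End int // half-open byte range [Start, End) in the input
}

// scanString tokenizes one JSON string starting at the opening quote,
// returning a token referencing the input in place plus the position
// just past the closing quote. Simplified: no escape handling.
func scanString(in []byte, pos int) (Token, int) {
	i := pos + 1 // skip the opening quote
	for i < len(in) && in[i] != '"' {
		i++
	}
	return Token{Start: pos + 1, End: i}, i + 1
}

func main() {
	in := []byte(`{"name": "jsmn"}`)
	tok, _ := scanString(in, 1) // the first key starts at offset 1
	fmt.Printf("%q\n", in[tok.Start:tok.End]) // zero-copy slice: "name"
}
```

Because a token is two integers, a whole token array fits in one allocation, which is a big part of why this style is fast.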

  • muon

    µON - a compact and simple binary object notation (by vshymanskyy)

I like how it works. I thought it would be neat to have such a parser.

  • pulldown-cmark

    An efficient, reliable parser for CommonMark, a standard dialect of Markdown

I also really like this paradigm. It's just that in old, crusty null-terminated C style it's really awkward, because the input data must be copied or modified. It's not an issue when using slices (a pointer and a length), but unfortunately most of the C standard library and many operating system APIs expect null-terminated strings.

I’ve seen this referred to as a pull parser in a Rust library.
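The slice-based style the comment praises is natural in Go: the "cursor" is just a subslice of the input, so advancing consumes data without copying or mutating it. A tiny sketch with a hypothetical helper (not pulldown-cmark's API):

```go
package main

import "fmt"

// nextField returns the text up to the next comma plus the remaining
// input, both as subslices of the original buffer — zero copies,
// and the underlying bytes are never modified.
func nextField(in []byte) (field, rest []byte) {
	for i, c := range in {
		if c == ',' {
			return in[:i], in[i+1:]
		}
	}
	return in, nil
}

func main() {
	in := []byte("alpha,beta,gamma")
	for len(in) > 0 {
		var field []byte
		field, in = nextField(in)
		fmt.Printf("%s\n", field)
	}
}
```

In null-terminated C the equivalent (`strtok`) has to write a `'\0'` into the buffer, which is exactly the copy-or-modify awkwardness described above.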

  • go-jsonschema

    A tool to generate Go data types from JSON Schema definitions.

For JSON Schema specifically there are some tools like go-jsonschema[1], but I've never used them personally. You can also use something like ffjson[2] in Go to generate static serialize/deserialize functions based on a struct definition.
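The code-generation approach trades reflection for static, struct-specific functions. A hand-written sketch of roughly what such generated code looks like (illustrative only, not ffjson's actual output):

```go
package main

import (
	"fmt"
	"strconv"
)

type User struct {
	Name string
	Age  int
}

// MarshalJSON appends fields directly to a byte buffer, the way a
// generator would emit it, instead of walking the struct with
// reflection at runtime.
func (u *User) MarshalJSON() ([]byte, error) {
	buf := make([]byte, 0, 64)
	buf = append(buf, `{"name":`...)
	buf = strconv.AppendQuote(buf, u.Name)
	buf = append(buf, `,"age":`...)
	buf = strconv.AppendInt(buf, int64(u.Age), 10)
	buf = append(buf, '}')
	return buf, nil
}

func main() {
	b, _ := (&User{Name: "ada", Age: 36}).MarshalJSON()
	fmt.Println(string(b)) // {"name":"ada","age":36}
}
```

Because the field names and types are baked in at generation time, there is no per-call reflection and far less allocation than `encoding/json`'s generic path.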



  • ffjson

    faster JSON serialization for Go

  • jsoncut

  • benchmarks

    Some benchmarks of different languages

  • jsonrepair

    Repair invalid JSON documents

    The jsonrepair tool might interest you. It's tailored to fix JSON strings.

I've been looking into something similar for handling partial JSON, where you only have the first n characters of a document. This is common with LLMs that stream their output to reduce latency. If you know the JSON schema ahead of time, you can start processing the first fields before the remaining data has fully loaded; if you have to wait for the whole thing to load, there is little point in streaming.

    I was looking for a library that could do this kind of parsing.

  • go

    The Go programming language

    Obviously you can manually inline functions. That's what happened in the article.

The comment is about having a directive or annotation to make the compiler inline the function for you, which Go does not have. IMO, the pre-inlined code was cleaner. It's a shame that the compiler could not optimize it.

    There was once a proposal for this, but it's really against Go's design as a language.
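For context, Go's only inlining directive is the negative one, `//go:noinline`; there is no way to force inlining, so "inlining by hand" means pasting the body at the call site, as the article did. A toy illustration of the two shapes:

```go
package main

import "fmt"

// The only inlining control Go offers is forbidding it:
//go:noinline
func isSpace(c byte) bool {
	return c == ' ' || c == '\t' || c == '\n' || c == '\r'
}

// countSpacesCalling uses the small leaf function; without the
// noinline directive the compiler may or may not inline it.
func countSpacesCalling(data []byte) int {
	n := 0
	for _, c := range data {
		if isSpace(c) {
			n++
		}
	}
	return n
}

// countSpacesInlined is the same loop with the predicate pasted in
// by hand — guaranteed-inline, but less clean.
func countSpacesInlined(data []byte) int {
	n := 0
	for _, c := range data {
		if c == ' ' || c == '\t' || c == '\n' || c == '\r' {
			n++
		}
	}
	return n
}

func main() {
	data := []byte("a b\tc\nd")
	fmt.Println(countSpacesCalling(data), countSpacesInlined(data))
}
```

You can check what the compiler actually inlined with `go build -gcflags=-m`, which is the closest Go gets to inlining control.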

  • graphql-go-tools

GraphQL Router / API Gateway framework written in Golang, focusing on correctness, extensibility, and high performance. Supports Federation v1 & v2, Subscriptions & more.

    I've taken a very similar approach and built a GraphQL tokenizer and parser (amongst many other things) that's also zero memory allocations and quite fast. In case you'd like to check out the code:

  • jsb

    Fast json <=> binary serializer library for C

    Writing a json parser is definitely an educational experience. I wrote one this summer for my own purposes that is decently fast:

  • json5-spec

    The JSON5 Data Interchange Format

  • Visual Studio Code

    Visual Studio Code

  • gqlscan

    GraphQL lexical scanner for Go

    You might also want to check out this abomination of mine:

I gave a talk about this, but unfortunately it wasn't recorded. I tried to squeeze as much out of Go as I could, and I went a bit crazy doing it :D


NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts