Our great sponsors
-
WorkOS
The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.
grep is the beginning, not the end. it’s a great performance baseline to meet, and then beat[1]. computers are insanely fast!
the startups using grep on aws are undercutting those doing slower things on aws. this must be why aws architects never talk about grep.
1. https://github.com/nathants/bsv
I don’t really understand what this is trying to prove:
- you don’t seem to specify the size of the input. This is the most important omission
- you are constructing an optimised representation (in this case, strict with fields in the right places) instead of a generic ‘dumb’ representation that is more like a tree of python dicts
- rust is not a ‘moderately fast language’ imo (though this is not a very important point. It’s more about how optimised the parser is, and I suspect that serde_json is written in an optimised way, but I didn’t look very hard).
I found[1], which gives serde_json to a dom 300-400MB/s on a somewhat old laptop cpu. A simpler implementation runs at 100-200, a very optimised implementation gets 400-800. But I don’t think this does that much to confirm what I said in the comment you replied to. The numbers for simd json are a bit lower than I expected (maybe due to the ‘dom’ part). I think my 50MB/a number was probably a bit off but maybe the python implementation converts json to some C object and then converts that C object to python objects. That might half your throughput (my guess is that this is what the ‘strict parse’ case for rustc_serialise is roughly doing).
[1] https://github.com/serde-rs/json-benchmark