-
Stream VByte [1,2,3] is not far from that idea: it stores the packed lengths in a separate stream from the data bytes. (There’s also a more conventional interleaved design in varint-G8IU, but Stepanov decided to fuck everyone over and patented that.)
[1] https://lemire.me/blog/2017/09/27/stream-vbyte-breaking-new-...
[2] https://arxiv.org/abs/1709.08990
[3] https://github.com/lemire/streamvbyte
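For anyone who hasn't looked at the format, here is a rough scalar sketch of the split-stream idea in Python: each integer gets a 2-bit length code (byte count minus one) packed four to a control byte, and the payload bytes go into a separate data stream. The function names `svb_encode` / `svb_decode` are made up for this sketch, and the bit order inside the control byte and the little-endian payload are assumptions here, not something to take as the library's exact wire format. The real implementation decodes with SIMD shuffles driven by the control byte; this only shows the layout.

```python
def svb_encode(values):
    """Encode unsigned 32-bit ints into separate (control, data) streams."""
    control = bytearray((len(values) + 3) // 4)  # one control byte per 4 ints
    data = bytearray()
    for i, v in enumerate(values):
        if not 0 <= v < 1 << 32:
            raise ValueError("value out of unsigned 32-bit range")
        n = max(1, (v.bit_length() + 7) // 8)    # bytes needed: 1..4
        # Assumption: the first int of each group uses the lowest 2 bits.
        control[i // 4] |= (n - 1) << (2 * (i % 4))
        data += v.to_bytes(n, "little")
    return bytes(control), bytes(data)


def svb_decode(control, data, count):
    """Decode `count` ints from the control and data streams."""
    out = []
    pos = 0
    for i in range(count):
        n = ((control[i // 4] >> (2 * (i % 4))) & 3) + 1
        out.append(int.from_bytes(data[pos:pos + n], "little"))
        pos += n
    return out


values = [1, 300, 70000, 2**32 - 1, 0]
control, data = svb_encode(values)
decoded = svb_decode(control, data, len(values))
```

Because all the lengths for a group sit in one control byte, a decoder knows where every integer in the group starts before it touches the data stream.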
-
Unary length prefix is a solid technique. I would be careful with the performance claims, though. What the machine can do is sometimes surprising, and code that looks like it needs to loop and branch can turn out faster than you expected. The protobuf project, unsurprisingly for a project of its age, has several different varint parsing strategies. One is described at https://github.com/protocolbuffers/protobuf/blob/main/src/go... and another at https://github.com/protocolbuffers/protobuf/blob/main/src/go...
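To make "unary length prefix" concrete, a minimal Python sketch of one possible layout, assuming unsigned 64-bit values: the first byte starts with n-1 one bits and a zero, the remaining 7n bits hold the value big-endian, and a first byte of 0xFF means eight raw bytes follow. The names `encode_prefix_varint` / `decode_prefix_varint` and this exact bit layout are assumptions for illustration, not the scheme from the linked discussion or from protobuf. It says nothing about speed; that still needs measuring.

```python
def encode_prefix_varint(v):
    """Encode an unsigned 64-bit int with a unary length prefix."""
    if not 0 <= v < 1 << 64:
        raise ValueError("value out of unsigned 64-bit range")
    bits = v.bit_length()
    for n in range(1, 9):
        if bits <= 7 * n:
            # n-1 one bits, a zero bit, then 7n payload bits.
            ones = ((1 << (n - 1)) - 1) << (7 * n + 1)
            return (ones | v).to_bytes(n, "big")
    # More than 56 bits: 0xFF marker followed by 8 raw bytes.
    return b"\xff" + v.to_bytes(8, "big")


def decode_prefix_varint(buf):
    """Return (value, bytes_consumed) for a varint at the start of buf."""
    first = buf[0]
    # Count the leading one bits of the first byte.
    leading_ones = 8 - (first ^ 0xFF).bit_length()
    if leading_ones == 8:
        return int.from_bytes(buf[1:9], "big"), 9
    n = leading_ones + 1
    word = int.from_bytes(buf[:n], "big")
    return word & ((1 << (7 * n)) - 1), n


samples = [0, 127, 128, 16383, 16384, 2**56 - 1, 2**56, 2**64 - 1]
round_trip = [decode_prefix_varint(encode_prefix_varint(v))[0] for v in samples]
```

The total length is known after looking at the first byte alone, so the decoder does not have to test a continuation bit on every byte.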