The Ryū algorithm, which handles the converse (doubles to strings), is also much faster than using Java's number formatting classes.
https://github.com/ulfjack/ryu/blob/master/src/main/java/inf...
The successes of fuzzing projects like oss-fuzz have demonstrated significant shortcomings of hand-curating test cases in the manner you describe. Testing every 64-bit float value is unrealistic, but testing a huge number of randomly selected values and cross-comparing the results with other libraries is a very good idea for code like this.
https://github.com/google/oss-fuzz
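To make the random cross-comparison idea concrete, here is a minimal sketch in Python. It is not taken from oss-fuzz or from any of the libraries discussed; the helper names (`random_finite_double`, `bits_of`, `cross_check`) are made up for illustration. It draws random 64-bit patterns, formats each resulting double two different ways (shortest `repr` and a fixed 17 significant digits), and checks that both strings parse back to the exact same bits.

```python
import math
import random
import struct


def bits_of(value):
    """Return the raw 64-bit pattern of a double as an int."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


def random_finite_double(rng):
    """Pick a random 64-bit pattern and reinterpret it as a double.

    NaN and infinities are skipped because they have no decimal digits
    to compare.
    """
    while True:
        pattern = rng.getrandbits(64)
        (value,) = struct.unpack("<d", struct.pack("<Q", pattern))
        if math.isfinite(value):
            return value


def cross_check(count, seed=0):
    """Return every sampled value on which the two formatters disagree."""
    rng = random.Random(seed)
    failures = []
    for _ in range(count):
        x = random_finite_double(rng)
        shortest = repr(x)           # shortest string that round-trips
        verbose = format(x, ".17g")  # 17 significant digits
        # Compare bit patterns so that -0.0 and 0.0 are told apart.
        if (bits_of(float(shortest)) != bits_of(x)
                or bits_of(float(verbose)) != bits_of(x)):
            failures.append(x)
    return failures
```

In a real setup the second formatter would be an independent library rather than another code path in the same runtime, and the sample count would be far larger than anything you would curate by hand.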
Slightly more complex. The distribution is based on how often the type is likely to occur in your average document.
The type field is almost always 1 byte long, except for typed arrays (which would in aggregate be long enough to offset the cost) and two uncommon types/variations. These have an effective 2-byte type field (0x94 selects a secondary type plane, and the next byte selects the specific type from there).
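A rough sketch of how a decoder might read such a type field, assuming only what is stated above (0x94 selects the secondary plane and the next byte picks the type). The function name and the returned tuple layout are hypothetical, and typed arrays are ignored here because their layout is not described in this comment.

```python
SECONDARY_PLANE_SELECTOR = 0x94  # value taken from the comment above


def read_type_field(data, offset=0):
    """Return (plane, type_code, bytes_consumed) for the field at offset.

    Hypothetical helper: plane 1 is the normal 1-byte type field,
    plane 2 is the secondary plane reached through 0x94.
    """
    first = data[offset]
    if first != SECONDARY_PLANE_SELECTOR:
        return (1, first, 1)
    if offset + 1 >= len(data):
        raise ValueError("truncated secondary-plane type field")
    return (2, data[offset + 1], 2)
```

The common case costs a single byte and a single comparison; only the rare types pay for the second byte.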
The most common integer values (-100 to 100) are encoded directly into the types 0x9c to 0x64 (wraparound), such that interpreting them directly as signed 8-bit integers yields their actual value (type 0x00 = integer 0, type 0x64 = integer 100, type 0x9c = -100, etc).
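As an illustration of the wraparound, here is a small Python sketch (the function names are invented for this example, not part of any reference implementation). Encoding is just the two's-complement low byte of the value, and decoding reinterprets the type byte as a signed 8-bit integer and checks that it falls in the -100 to 100 range.

```python
SMALL_INT_MIN = -100
SMALL_INT_MAX = 100


def encode_small_int(value):
    """Return the single type byte that stands for a small integer."""
    if not SMALL_INT_MIN <= value <= SMALL_INT_MAX:
        raise ValueError("value needs a separate integer type")
    return value & 0xFF  # two's-complement low byte, e.g. -100 -> 0x9c


def decode_small_int(type_code):
    """Return the integer for a small-int type byte, or None otherwise."""
    signed = type_code - 256 if type_code >= 0x80 else type_code
    if SMALL_INT_MIN <= signed <= SMALL_INT_MAX:
        return signed
    return None  # 0x65..0x9b are used for other types
```

For example, `encode_small_int(-1)` gives `0xff`, and `decode_small_int(0x9c)` gives `-100`.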
Strings are also optimized such that types 0x80 - 0x8f are used for the most common string lengths (0 to 15) so as not to require a separate length field.
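A sketch of the short-string case in Python, under two assumptions that the comment does not spell out: that the length counts UTF-8 bytes, and that the string bytes follow the type byte directly. The function names are hypothetical.

```python
SHORT_STRING_BASE = 0x80      # type for a zero-length string
SHORT_STRING_MAX_LEN = 15     # 0x8f is the last short-string type


def encode_short_string(text):
    """Encode a string of up to 15 UTF-8 bytes with no length field."""
    payload = text.encode("utf-8")
    if len(payload) > SHORT_STRING_MAX_LEN:
        raise ValueError("string needs a type with a length field")
    return bytes([SHORT_STRING_BASE + len(payload)]) + payload


def decode_short_string(data):
    """Decode a short string whose length is folded into the type byte."""
    type_code = data[0]
    if not SHORT_STRING_BASE <= type_code <= SHORT_STRING_BASE + SHORT_STRING_MAX_LEN:
        raise ValueError("not a short-string type")
    length = type_code - SHORT_STRING_BASE
    payload = data[1:1 + length]
    if len(payload) != length:
        raise ValueError("truncated string payload")
    return payload.decode("utf-8")
```

With this layout, a five-byte string such as "hello" costs six bytes in total: the type byte 0x85 followed by the text.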
The rest have a 1-byte type field. There are also 3 reserved type codes left in the first plane in case something big comes up in the future. You can see the chart here: https://github.com/kstenerud/concise-encoding/blob/master/cb...