FastDoubleParser: Java port of Daniel Lemire's fast_double_parser

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • FastDoubleParser

    A Java port of Daniel Lemire's fast_float project

  • ryu

    Converts floating point numbers to decimal strings (by ulfjack)

  • The Ryū algorithm, which handles the converse (doubles to strings), is also much faster than Java's number formatting classes.

    https://github.com/ulfjack/ryu/blob/master/src/main/java/inf...

  • oss-fuzz

    OSS-Fuzz - continuous fuzzing for open source software.

  • The successes of fuzzing projects like oss-fuzz have demonstrated significant shortcomings of hand-curating test cases in the manner you describe. Testing every 64-bit float value is unrealistic, but testing a huge number of randomly selected values by cross-comparison with other libraries is a very good idea for code like this.

    https://github.com/google/oss-fuzz
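
The cross-comparison idea in the comment above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class and method names (`CrossCheckSketch`, `countMismatches`) are hypothetical, the JDK's own `Double.parseDouble` serves as the reference, and the "candidate" in `main` is a stand-in for whatever parser is actually under test. It is not how FastDoubleParser or oss-fuzz test anything.

```java
import java.util.SplittableRandom;
import java.util.function.ToDoubleFunction;

class CrossCheckSketch {

    /**
     * Feeds the text form of randomly chosen finite doubles to two parsers
     * and returns how many inputs they disagree on (compared bit for bit).
     */
    static long countMismatches(ToDoubleFunction<String> candidate,
                                ToDoubleFunction<String> reference,
                                long samples, long seed) {
        SplittableRandom rnd = new SplittableRandom(seed);
        long mismatches = 0;
        for (long i = 0; i < samples; i++) {
            // A random 64-bit pattern covers subnormals, huge and tiny exponents alike.
            double d = Double.longBitsToDouble(rnd.nextLong());
            if (!Double.isFinite(d)) {
                continue; // skip NaN and infinities in this sketch
            }
            String text = Double.toString(d);
            long a = Double.doubleToLongBits(candidate.applyAsDouble(text));
            long b = Double.doubleToLongBits(reference.applyAsDouble(text));
            if (a != b) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Stand-in candidate: replace the lambda with the parser under test.
        long bad = countMismatches(s -> Double.valueOf(s), Double::parseDouble, 1_000_000, 42L);
        System.out.println("mismatches: " + bad);
    }
}
```

A fixed seed keeps failures reproducible. A fuller harness would also generate malformed and unusually formatted input strings, which this sketch does not attempt.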

  • concise-encoding

    The secure data format for a modern world

  • Slightly more complex. The distribution is based on how often the type is likely to occur in your average document.

    The type field is almost always 1 byte long, except for typed arrays (which would in aggregate be long enough to offset the cost) and two uncommon types/variations. These have an effective 2-byte type field (0x94 selects a secondary type plane, and the next byte selects the specific type from there).

    The most common integer values (-100 to 100) are encoded directly into the types 0x9c to 0x64 (wraparound), such that interpreting them directly as signed 8-bit integers yields their actual value (type 0x00 = integer 0, type 0x64 = integer 100, type 0x9c = integer -100, etc.).

    Strings are also optimized such that types 0x80 - 0x8f are used for the most common string lengths (0 to 15) so as not to require a separate length field.

    The rest have a 1-byte type field. There are also 3 reserved type codes left in the first plane in case something big comes up in the future. You can see the chart here: https://github.com/kstenerud/concise-encoding/blob/master/cb...
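
The type-byte layout described in this comment can be sketched as a small Java classifier. It relies only on the ranges stated above (small integers as signed bytes, 0x80 to 0x8f for short strings, 0x94 for the secondary plane) and lumps everything else together. The names `TypeByteSketch` and `describe` are hypothetical, and this is not taken from the concise-encoding specification or any of its implementations.

```java
class TypeByteSketch {

    /** Describes a type byte using only the ranges mentioned in the comment. */
    static String describe(int typeByte) {
        int unsigned = typeByte & 0xff;
        byte signed = (byte) unsigned;

        // 0x9c..0xff and 0x00..0x64 read as a signed byte give -100..100 directly.
        if (signed >= -100 && signed <= 100) {
            return "integer " + signed;
        }
        // 0x80..0x8f carry the string length (0..15) in the low four bits.
        if (unsigned >= 0x80 && unsigned <= 0x8f) {
            return "string of length " + (unsigned - 0x80);
        }
        // 0x94 means the next byte selects a type from the secondary plane.
        if (unsigned == 0x94) {
            return "secondary type plane";
        }
        return "other type";
    }

    public static void main(String[] args) {
        int[] samples = {0x00, 0x64, 0x9c, 0x80, 0x8f, 0x94, 0x65};
        for (int t : samples) {
            System.out.println(String.format("0x%02x", t) + " -> " + describe(t));
        }
    }
}
```

Because the small-integer range maps onto the signed interpretation of the byte, decoding those values needs no arithmetic beyond a cast, which is the point of the wraparound layout.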

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
