Show HN: High-speed UTF-8 validation in Rust

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • encoding_rs

    A Gecko-oriented implementation of the Encoding Standard in Rust

    That's not the only use of SIMD in the crate (e.g. see https://github.com/hsivonen/encoding_rs/blob/e98a2096ab09c92...), but I haven't looked into exactly where/how it's used further.

  • simdutf8

    SIMD-accelerated UTF-8 validation for Rust.

    Check the benchmarks section (https://github.com/rusticstuff/simdutf8#Benchmarks), second table. simdutf8 is up to 28% faster on my Comet Lake CPU. However, with pure ASCII input, clang does something magical with simdjson, and it beats my implementation by a lot. GCC-compiled simdjson is slower all around except for a few outliers with short byte sequences.

    The algorithm is the one from simdjson; the main difference is that it adds an extra step at the beginning to align reads to the SIMD block size.
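
    The alignment step described above can be illustrated with a minimal sketch. This is not simdutf8's actual code: it only checks for pure ASCII rather than validating full UTF-8, it uses `u64` words as a stand-in for SIMD blocks, and the function name `is_ascii_aligned` is made up for this example. It shows the general idea of handling an unaligned prefix byte by byte so that the bulk of the input can be read in aligned blocks.

```rust
/// Returns true if every byte in `bytes` is ASCII (below 0x80).
/// Illustrative sketch: `u64` words stand in for SIMD blocks.
fn is_ascii_aligned(bytes: &[u8]) -> bool {
    // A byte is non-ASCII exactly when its top bit is set.
    const HIGH_BITS: u64 = 0x8080_8080_8080_8080;

    // Split the input into an unaligned prefix, a body of aligned
    // u64 words, and a leftover suffix.
    // SAFETY: every bit pattern is a valid u64, and `align_to`
    // guarantees that the middle slice is correctly aligned.
    let (head, body, tail) = unsafe { bytes.align_to::<u64>() };

    // Prefix and suffix are checked one byte at a time;
    // the aligned body is checked eight bytes at a time.
    head.iter().all(|b| *b < 0x80)
        && body.iter().all(|w| w & HIGH_BITS == 0)
        && tail.iter().all(|b| *b < 0x80)
}
```

    Calling `is_ascii_aligned(b"plain text")` returns `true`, while any input containing a byte of 0x80 or above returns `false`. A real SIMD validator would replace the word-wise loop with vector loads and the full UTF-8 state checks.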

  • simdjson

    Parsing gigabytes of JSON per second: used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks

    If someone wants to make a code comparison, most of the UTF-8 validation in simdjson seems to be nicely summed up in this pull request that sped it up: https://github.com/simdjson/simdjson/pull/993/files

  • sqloxide

    Python bindings for sqlparser-rs

    Yes, on the Python side of things there are tools like PyO3 that make integrating Rust into Python code really painless.

    I have a SQL parsing library (shameless plug) that is 50x faster than any other Python implementation; it is a super simple wrapper around a Rust crate.

    https://github.com/wseaton/sqloxide

  • cxx

    Safe interop between Rust and C++

    Yes, Rust can natively expose C FFI functions (for example).

    The https://cxx.rs/ project is also a major crate for C++ interoperability.
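
    As a rough illustration of exposing a C FFI function from Rust, here is a minimal sketch using only the standard library. The function name `is_valid_utf8` and its return convention (1 for valid, 0 for invalid) are assumptions made for this example and do not come from any of the crates listed here.

```rust
use std::os::raw::c_int;
use std::slice;

/// Hypothetical exported function: returns 1 if the buffer holds valid
/// UTF-8 and 0 otherwise. A C caller would declare it as
/// `int is_valid_utf8(const uint8_t *ptr, size_t len);`.
///
/// Note: in the 2024 edition the attribute is written `#[unsafe(no_mangle)]`.
///
/// # Safety
/// `ptr` must point to `len` readable bytes, or be null with `len == 0`.
#[no_mangle]
pub unsafe extern "C" fn is_valid_utf8(ptr: *const u8, len: usize) -> c_int {
    // Treat a null pointer as an empty buffer; reject null with a length.
    if ptr.is_null() {
        return if len == 0 { 1 } else { 0 };
    }
    // SAFETY: the caller promises `ptr` is valid for `len` bytes.
    let bytes = unsafe { slice::from_raw_parts(ptr, len) };
    // Scalar standard-library validation; a SIMD crate could be swapped in here.
    std::str::from_utf8(bytes).is_ok() as c_int
}
```

    Building this as a `cdylib` or `staticlib` would make the symbol callable from C. For richer C++ types, the cxx crate mentioned above generates the bridging code instead of relying on raw pointers.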

  • fontdue

    The fastest font renderer in the world, written in pure Rust.

    I work on a SIMD-optimized font library [0] and have run into the same situation of hand-writing SIMD intrinsics. Some things are just kind of hard to get optimized correctly, and there is enough difference between the platforms that it matters when fiddling with bits. I also kind of have fun writing SIMD code like this.

    [0]: https://github.com/mooman219/fontdue/blob/master/src/platfor...
