Removing characters from strings faster with AVX-512

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

Our great sponsors
  • WorkOS - The modern identity platform for B2B SaaS
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • SaaSHub - Software Alternatives and Reviews
  • despacer

    C library to remove white space from strings as fast as possible

  • Cool performance enhancement, with an accompanying implementation in a real-world library (https://github.com/lemire/despacer).

    Still, what does it signal that vector extensions are required to get better string performance on x86? Wouldn't it be better if Intel invested their AVX transistor budget into simply making existing REPB prefixes a lot faster?

  • raphlinus

  • The short answer is no, but the long answer is that this is a very complex tradeoff space. Going forward, we may see more of these types of tasks moving to GPU, but for the moment it is generally not a good choice.

    The GPU is incredible at raw throughput, and this particular problem can actually implemented fairly straightforwardly (it's a stream compaction, which in turn can be expressed in terms of prefix sum). However, where the GPU absolutely falls down is when you want to interleave CPU and GPU computations. To give round numbers, the roundtrip latency is on the order of 100µs, and even aside from that, the memcpy back and forth between host and device memory might actually be slower than just solving the problem on the CPU. So you only win when the strings are very large, again using round numbers about a megabyte.

    Things change if you are able to pipeline a lot of useful computation on the GPU. This is an area of active research (including my own). Aaron Hsu has been doing groundbreaking work implementing an entire compiler on the GPU, and there's more recent work[1], implemented in Futhark, that suggests that that this approach is promising.

    I have a paper in the pipeline that includes an extraordinarily high performance (~12G elements/s) GPU implementation of the parentheses matching problem, which is the heart of parsing. If anyone would like to review a draft and provide comments, please add a comment to the GitHub issue[2] I'm using to track this. It's due very soon and I'm on a tight timeline to get all the measurements done, so actionable suggestions on how to improve the text would be most welcome.

    [1]: https://theses.liacs.nl/pdf/2020-2021-VoetterRobin.pdf

    [2]: https://github.com/raphlinus/raphlinus.github.io/issues/66#i...

  • WorkOS

    The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.

    WorkOS logo
NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts