Tolower() with AVX-512

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  1. U8String

    [work-in-progress] Highly functional and performant UTF-8 string primitive for C#

    Mask add looks neat! I wish there was a way to directly manipulate AVX512's mask registers in .NET intrinsics but for now we have to live with "recognized idioms".

    Some months ago I wrote a similar ASCII in UTF-8 upcase/downcase implementation: https://github.com/U8String/U8String/blob/main/Sources/U8Str...

    (The unrolled conversion for lengths below the vector width is required because short strings dominate most codebases, so handling them fast is important. The switch compiles to a jump table followed by a branchless fall-through to the return.)

    For now it goes only as wide as 256b, since that already saturates e.g. Zen 3 or 4, which have only 256x4 SIMD units (even though Zen 4 can do fancy 512b shuffles natively and has a very good 512b implementation). The core compiles to:

                  cmp      rdx, 32
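
    As a rough illustration of the two ideas in this comment (a wide main loop plus an unrolled, fall-through switch for the short remainder), here is a minimal C sketch. It is not U8String's code: it assumes a plain 64-bit register in place of 256b vectors, and the names lower8, lower1, and ascii_tolower are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Lowercase 8 bytes at once. Bytes >= 0x80 (non-ASCII) pass through unchanged. */
static uint64_t lower8(uint64_t w) {
    const uint64_t ones = 0x0101010101010101ULL;
    uint64_t low7  = w & (0x7F * ones);              /* drop bit 7 so adds cannot carry across bytes */
    uint64_t ge_A  = low7 + (0x80 - 'A') * ones;     /* bit 7 set where byte >= 'A' */
    uint64_t gt_Z  = low7 + (0x80 - 'Z' - 1) * ones; /* bit 7 set where byte >  'Z' */
    uint64_t ascii = ~w & (0x80 * ones);             /* bit 7 set where byte is ASCII */
    uint64_t upper = ascii & (ge_A ^ gt_Z);          /* bit 7 set where byte is 'A'..'Z' */
    return w | (upper >> 2);                         /* 0x80 >> 2 == 0x20, the case bit */
}

/* Scalar version for a single byte. */
static unsigned char lower1(unsigned char c) {
    return (unsigned char)(c - 'A') < 26 ? (unsigned char)(c | 0x20) : c;
}

/* Hypothetical entry point: lowercases ASCII letters in place. */
void ascii_tolower(unsigned char *s, size_t n) {
    while (n >= 8) {                 /* wide main loop, 8 bytes per step */
        uint64_t w;
        memcpy(&w, s, 8);            /* memcpy keeps unaligned access well-defined */
        w = lower8(w);
        memcpy(s, &w, 8);
        s += 8;
        n -= 8;
    }
    switch (n) {                     /* unrolled remainder, typically a jump table */
    case 7: s[6] = lower1(s[6]); /* fall through */
    case 6: s[5] = lower1(s[5]); /* fall through */
    case 5: s[4] = lower1(s[4]); /* fall through */
    case 4: s[3] = lower1(s[3]); /* fall through */
    case 3: s[2] = lower1(s[2]); /* fall through */
    case 2: s[1] = lower1(s[1]); /* fall through */
    case 1: s[0] = lower1(s[0]); /* fall through */
    default: break;
    }
}
```

    Whether the switch really becomes a jump table depends on the compiler and its optimization settings, so that part is worth checking in the generated assembly.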

  2. SaaSHub

    SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives

  3. scummvm

    ScummVM main repository

    "SIMD Within A Register"

    I think the implication is that you can pack multiple items into an ordinary register and effectively get SIMD even if you aren't using explicit SIMD instructions. E.g. if you pack a 31-bit and a 32-bit number into a 64-bit register (you need 1 spare bit to absorb the carry), you can do 2 adds with a single 64-bit add.

    Games have used these tricks for graphics to pack RGB(A) values into 32 bit integers. E.g. this code from scummvm interpolates 2 16-bit RGB pixels (6 total components) packed into a 32-bit value. https://github.com/scummvm/scummvm/blob/master/graphics/scal...
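
    Both tricks can be sketched in a few lines of C. This is an illustration under assumptions, not the scummvm code behind the truncated link: the lane layout (32-bit value in bits 0..31, spare bit 32, 31-bit value in bits 33..63), the RGB565 pixel format, and all function names are choices made for this example.

```c
#include <stdint.h>

/* Pack a 31-bit and a 32-bit value, leaving bit 32 free as a carry catcher. */
static uint64_t pack2(uint32_t hi31, uint32_t lo32) {
    return ((uint64_t)(hi31 & 0x7FFFFFFFu) << 33) | lo32;
}

/* Two independent wrapping adds with one 64-bit add. */
static uint64_t packed_add(uint64_t a, uint64_t b) {
    /* A carry out of the low lane lands in the spare bit; clear it afterwards
       so it never reaches the high lane. */
    return (a + b) & ~(1ULL << 32);
}

static uint32_t lane_lo(uint64_t p) { return (uint32_t)p; }
static uint32_t lane_hi(uint64_t p) { return (uint32_t)(p >> 33); }

/* Spread an RGB565 pixel so that each channel has free bits above it:
   green moves to bits 21..26, red stays at 11..15, blue stays at 0..4. */
static uint32_t spread565(uint16_t p) {
    return (p | ((uint32_t)p << 16)) & 0x07E0F81Fu;
}

/* Average two RGB565 pixels, all three channels in one add and one shift. */
static uint16_t blend565(uint16_t a, uint16_t b) {
    uint32_t s = (spread565(a) + spread565(b)) >> 1;
    s &= 0x07E0F81Fu;                 /* drop bits shifted in from the neighbouring channel */
    return (uint16_t)(s | (s >> 16)); /* fold green back into bits 5..10 */
}
```

    For example, packed_add(pack2(5, 0xFFFFFFFF), pack2(7, 2)) yields lanes 12 and 1: the low lane wraps around and its carry is discarded instead of leaking into the high lane.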

  4. uwu

    fastest text uwuifier in the west

  5. charcoal

    Faster utf8.Valid using multi-byte processing without SIMD. (by sugawarayuuta)

    Unfortunately, those SIMD optimizations are only useful for strings that are aligned to an 8-byte address.

    If your SIMD algorithm is applied on a non-aligned string, it is often slower than the original algorithm.

    And splitting the algorithm into 3 parts (handling the beginning up to an aligned address, then the aligned part, and then the less-than-8-bytes tail) takes even more instructions.

    Here is a similar case of a false claim of a faster utf8.Valid in Go, with benchmarks: https://github.com/sugawarayuuta/charcoal/pull/1
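
    The 3-part split described in this comment can be sketched in C as follows. This is only an illustration of the head/body/tail structure, using a simple "is every byte ASCII?" check rather than full UTF-8 validation; it is not charcoal's algorithm, and the name is_ascii_3part is hypothetical. It makes no claim about which variant is faster, which is exactly what the linked benchmarks argue about.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns true if all n bytes at s are below 0x80. */
bool is_ascii_3part(const unsigned char *s, size_t n) {
    const unsigned char *end = s + n;

    /* 1. Head: byte by byte until s reaches an 8-byte-aligned address. */
    while (s < end && ((uintptr_t)s & 7) != 0) {
        if (*s++ & 0x80) return false;
    }

    /* 2. Body: 8 bytes per step from an aligned address. memcpy is used for
          the load so the code stays well-defined C. */
    while ((size_t)(end - s) >= 8) {
        uint64_t w;
        memcpy(&w, s, 8);
        if (w & 0x8080808080808080ULL) return false;
        s += 8;
    }

    /* 3. Tail: the remaining 0..7 bytes. */
    while (s < end) {
        if (*s++ & 0x80) return false;
    }
    return true;
}
```

    The extra loops and branches for the head and tail are the overhead the comment refers to; for short inputs they can be most of the work.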

  6. gxhash

    Unsafely fast hashing algorithm 📈

    There's a debate on how unsafe/unsound this technique actually is. https://github.com/ogxd/gxhash/issues/82

    I definitely see the conundrum since the dangerous code is such a huge performance gain.

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
