Our great sponsors
-
WorkOS
The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.
Branchless or not in this case, it still touches memory in not so good pattern. I found that a significant speedup of a classic BS could be achieved by switching to linear SIMD search when the remaining range has a width of 3-4 SIMD lines (or maybe even a little more). The bounds of that range are likely already touched and in cache, then prefetching helps. It gives 30-50% gain on 1K items array of integers, 10-25% on 1M items, depending on data distribution. Here is an example in C#: https://github.com/Spreads/Spreads/blob/main/src/Spreads.Cor...
Shameless plug of my attempt at this: https://github.com/ehrmann/branchless-binary-search