optimization-manual vs sb_lower_bound

optimization-manual

Contains the source code examples described in the "Intel® 64 and IA-32 Architectures Optimization Reference Manual" (by intel)

sb_lower_bound

Fastest Branchless Binary Search (by mh-dm)

lower-bound

mhdm.dev

Project	optimization-manual	sb_lower_bound
Mentions	3	8
Stars	738	14
Growth	1.9%	-
Activity	3.8	3.9
Latest Commit	2 months ago	10 months ago
Language	Assembly	C++
License	BSD Zero Clause License	-

Mentions - the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

optimization-manual

Posts with mentions or reviews of optimization-manual. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-07-02.

Fastest Branchless Binary Search
2 projects | /r/cpp | 2 Jul 2023

There are two ways I vectorized linear and binary search (in practice you often want a combination; always benchmark on your real-world datasets!):
- Do N binary searches simultaneously, where each lane is essentially doing one bsearch. Obviously, this only works if you are doing multiple searches.
- Use the VPCONFLICT instruction for the linear search parts. There's even code from the Intel SDM doing it: https://github.com/intel/optimization-manual/blob/main/chap18/ex20/avx512_vector_dp.asm
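The first idea in the comment above (several binary searches advancing in lockstep, one per lane) can be illustrated with a scalar sketch. This is not the commenter's code and uses no AVX-512 intrinsics; the name `batched_lower_bound` and the `kLanes` parameter are hypothetical. It relies on the fact that, in a branchless search, the shrinking length is the same for every lane, so only the per-lane offset differs. Whether a compiler turns the inner loop into vector gathers is not guaranteed.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from either project): runs kLanes lower-bound
// searches over the same sorted array in lockstep, one search per "lane".
template <std::size_t kLanes>
std::array<std::size_t, kLanes> batched_lower_bound(
    const std::vector<int>& sorted, const std::array<int, kLanes>& keys) {
  std::array<std::size_t, kLanes> first{};  // per-lane start offset, all zero
  std::size_t length = sorted.size();       // shared by every lane
  while (length > 0) {
    const std::size_t half = length / 2;
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
      // Multiplying by the comparison result avoids a per-lane branch.
      first[lane] +=
          (sorted[first[lane] + half] < keys[lane]) * (length - half);
    }
    length = half;
  }
  return first;  // first[lane] is the lower-bound index for keys[lane]
}
```

For the sorted input `{1, 3, 3, 5, 8, 13}` and keys `{0, 3, 4, 14}`, the returned offsets are 0, 1, 3 and 6.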
Zen4's AVX512 Teardown
4 projects | news.ycombinator.com | 26 Sep 2022

The Intel optimization manual has a fun example where they use vpconflict for vectorizing sparse dot products: https://github.com/intel/optimization-manual/blob/main/chap1...
I benchmarked it on Intel, and it was indeed quite fast/a good improvement over the scalar version. Will be interesting to try that on AMD.
Intel Optimization Reference Manual Code Samples
1 project | /r/asm | 9 Jun 2021

sb_lower_bound

Posts with mentions or reviews of sb_lower_bound. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-08-11.

Fastest Branchless Binary Search
1 project | /r/hackernews | 13 Aug 2023

1 project | /r/hypeurls | 12 Aug 2023

14 projects | news.ycombinator.com | 11 Aug 2023

Then you'll want to look at https://mhdm.dev/posts/sb_lower_bound/#prefetching
100 MB is large enough that the branchy version turns out to have a slight advantage, owing more to quirks of x86 (speculative execution) than to being inherently better.
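The prefetching idea referenced in the comment above can be sketched as follows. This is a minimal illustration of the general technique, not the code from the linked post: the function name `prefetching_lower_bound` is hypothetical, and `__builtin_prefetch` is a GCC/Clang builtin, so it is guarded and simply skipped on other compilers. The sketch hints both candidate locations for the next probe before the current comparison resolves, which is roughly what a speculating branchy search gets for free.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: branchless lower bound that prefetches both
// possible next probe locations. Returns the lower-bound index.
inline std::size_t prefetching_lower_bound(const std::vector<int>& sorted,
                                           int key) {
  const int* first = sorted.data();
  std::size_t length = sorted.size();
  while (length > 0) {
    const std::size_t half = length / 2;
#if defined(__GNUC__) || defined(__clang__)
    // Next probe if `first` stays put, and next probe if it advances.
    // Prefetch is only a hint and does not fault on unused addresses.
    __builtin_prefetch(first + half / 2);
    __builtin_prefetch(first + (length - half) + half / 2);
#endif
    // Advance past the lower part only when first[half] < key.
    first += (first[half] < key) * (length - half);
    length = half;
  }
  return static_cast<std::size_t>(first - sorted.data());
}
```

Whether the extra prefetches help depends on array size and hardware, so this is something to benchmark rather than assume.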

1 project | news.ycombinator.com | 8 Aug 2023

1 project | /r/programming | 6 Jul 2023

2 projects | /r/cpp | 2 Jul 2023

"very similar topic" is an understatement. Funnily enough, the "implementation to perform the best on Apple M1 after all micro-optimizations are applied" in the Conclusion is equivalent to sb_lower_bound in terms of how many actual comparisons are made. Out of curiosity I've benchmarked the two, and orlp lower_bound seems to perform slightly worse: ~39ns average (using gcc) vs ~33ns average for sb_lower_bound (using clang -cmov). I'm comparing best runs for both; the usual "tested on my machine" disclaimer applies.
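For readers unfamiliar with the technique being compared, here is a minimal sketch of a branchless lower bound. It is an illustration under assumptions, not the source of sb_lower_bound or of the orlp implementation; the name `branchless_lower_bound` is hypothetical. The loop performs exactly one comparison per iteration and its trip count depends only on the input length, which is the sense in which two such implementations can make the same number of comparisons. Whether the conditional update compiles to a conditional move rather than a branch depends on the compiler and flags.

```cpp
#include <functional>

// Hypothetical name; a sketch of the technique, not either project's code.
// Requires random-access iterators over a range sorted with respect to comp.
template <class RandomIt, class T, class Compare = std::less<>>
RandomIt branchless_lower_bound(RandomIt first, RandomIt last, const T& value,
                                Compare comp = {}) {
  auto length = last - first;
  while (length > 0) {
    const auto half = length / 2;
    // One comparison per iteration. The result only selects how far to
    // advance `first`, which a compiler may lower to a conditional move.
    first += comp(first[half], value) ? length - half : 0;
    length = half;
  }
  return first;  // first element not ordered before value, or last
}
```

It can be called like `std::lower_bound`, for example `branchless_lower_bound(v.begin(), v.end(), 42)` on a sorted `std::vector<int> v`.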

What are some alternatives?

When comparing optimization-manual and sb_lower_bound you can also consider the following projects:

AvxMath

ThinkingInSimd - An essay comparing performance implications of ignoring AVX acceleration

tigerbeetle - The distributed financial transactions database designed for mission critical safety and performance.

zig - General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.

amh-code - Complete implementations from "Algorithms for Modern Hardware"

Nim - Nim is a statically typed compiled systems programming language. It combines successful concepts from mature languages like Python, Ada and Modula. Its design focuses on efficiency, expressiveness, and elegance (in that order of priority).

rust - Empowering everyone to build reliable and efficient software.

branchless-binary-search - Binary search implementation that avoids branch instructions