optimization-manual VS sb_lower_bound

Compare optimization-manual and sb_lower_bound and see how they differ.

optimization-manual

Contains the source code examples described in the "Intel® 64 and IA-32 Architectures Optimization Reference Manual" (by intel)

sb_lower_bound

Fastest Branchless Binary Search (by mh-dm)
               optimization-manual        sb_lower_bound
Mentions       3                          8
Stars          738                        14
Growth         1.9%                       -
Activity       3.8                        3.9
Last commit    2 months ago               10 months ago
Language       Assembly                   C++
License        BSD Zero Clause License    -

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

optimization-manual

Posts with mentions or reviews of optimization-manual. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-07-02.
  • Fastest Branchless Binary Search
    2 projects | /r/cpp | 2 Jul 2023
    There are two ways I vectorized linear and binary search (in practice you often want a combination; always benchmark on your real-world datasets!):
      - Do N binary searches simultaneously, so that each lane is essentially doing one bsearch. Obviously, this only works if you are doing multiple searches.
      - Use the VPCONFLICT instruction for the linear search parts. There's even code from the Intel SDM doing it: https://github.com/intel/optimization-manual/blob/main/chap18/ex20/avx512_vector_dp.asm
  • Zen4's AVX512 Teardown
    4 projects | news.ycombinator.com | 26 Sep 2022
    The Intel optimization manual has a fun example where they use vpconflict for vectorizing sparse dot products: https://github.com/intel/optimization-manual/blob/main/chap1...

    I benchmarked it on Intel, and it was indeed quite fast/a good improvement over the scalar version. Will be interesting to try that on AMD.

  • Intel Optimization Reference Manual Code Samples
    1 project | /r/asm | 9 Jun 2021
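
The first approach mentioned above (N binary searches at once, one per lane) can be illustrated with a scalar sketch. This is not the commenter's code and not code from the Intel repository; `batch_lower_bound` is a hypothetical name, and the inner loop only stands in for what a SIMD implementation would do across lanes. The point it shows is that every query goes through the same sequence of step sizes, so the per-lane work has no data-dependent loop control.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (illustration only): one lower-bound search per query,
// all advanced in lockstep. Returns, for each query, the index of the first
// element in `sorted` that is not less than the query.
std::vector<std::size_t> batch_lower_bound(const std::vector<int>& sorted,
                                           const std::vector<int>& queries) {
    std::vector<std::size_t> pos(queries.size(), 0);
    std::size_t length = sorted.size();
    while (length > 0) {
        const std::size_t half = length / 2;
        // Each "lane" probes its own position but uses the same half/length,
        // which is what makes the loop a candidate for vectorization.
        for (std::size_t lane = 0; lane < queries.size(); ++lane) {
            if (sorted[pos[lane] + half] < queries[lane]) {
                pos[lane] += length - half;  // skip the lower part, probe included
            }
        }
        length = half;
    }
    return pos;
}
```

As the quoted comment notes, this only pays off when several searches are available at the same time, and any real gain should be confirmed by benchmarking on the actual data.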

sb_lower_bound

Posts with mentions or reviews of sb_lower_bound. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-08-11.
  • Fastest Branchless Binary Search
    1 project | /r/hackernews | 13 Aug 2023
    1 project | /r/hypeurls | 12 Aug 2023
    14 projects | news.ycombinator.com | 11 Aug 2023
    Then you'll want to look at https://mhdm.dev/posts/sb_lower_bound/#prefetching

    100mb is large enough that the branchy version turns out to have a slight advantage, more due to quirks of x86 (speculative execution) rather than being better.

    1 project | news.ycombinator.com | 8 Aug 2023
    1 project | /r/programming | 6 Jul 2023
    2 projects | /r/cpp | 2 Jul 2023
    "very similar topic" is an understatement. Funnily enough, the "implementation to perform the best on Apple M1 after all micro-optimizations are applied" in the Conclusion is equivalent to sb_lower_bound in terms of how many actual comparisons are made. Out of curiosity I've benchmarked the two, and orlp lower_bound seems to perform slightly worse: ~39ns average (using gcc) vs ~33ns average for sb_lower_bound (using clang -cmov). I'm comparing best runs for both, with the usual disclaimer that this was tested on my machine.
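
For readers unfamiliar with the technique these posts discuss, here is a minimal sketch of a branchless-style lower bound. It is not presented as the sb_lower_bound source; `branchless_lower_bound` is a hypothetical name and the code only illustrates the general idea: the number of loop iterations depends solely on the range length, and each comparison only selects how far `first` advances. Whether a compiler actually emits a conditional move instead of a branch depends on the compiler and its options.

```cpp
#include <functional>

// Illustrative sketch, not the sb_lower_bound implementation.
// Returns an iterator to the first element in [first, last) for which
// comp(element, value) is false, assuming the range is partitioned by comp.
template <class RandomIt, class T, class Compare = std::less<>>
RandomIt branchless_lower_bound(RandomIt first, RandomIt last,
                                const T& value, Compare comp = {}) {
    auto length = last - first;
    while (length > 0) {
        auto half = length / 2;
        // The comparison result selects an offset instead of steering
        // control flow; the loop itself never depends on the data.
        first += comp(first[half], value) ? length - half : 0;
        length = half;
    }
    return first;
}
```

Timing differences such as the ones quoted above are sensitive to compiler, flags, and hardware, so any comparison against `std::lower_bound` should be measured on the target machine.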

What are some alternatives?

When comparing optimization-manual and sb_lower_bound you can also consider the following projects:

AvxMath

ThinkingInSimd - An essay comparing performance implications of ignoring AVX acceleration

tigerbeetle - The distributed financial transactions database designed for mission critical safety and performance.

zig - General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.

amh-code - Complete implementations from "Algorithms for Modern Hardware"

Nim - Nim is a statically typed compiled systems programming language. It combines successful concepts from mature languages like Python, Ada and Modula. Its design focuses on efficiency, expressiveness, and elegance (in that order of priority).

rust - Empowering everyone to build reliable and efficient software.

branchless-binary-search - Binary search implementation that avoids branch instructions