Changing std::sort at Google’s Scale and Beyond

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • xeus-cling

    Jupyter kernel for the C++ programming language

  • awesome-algorithms

    A curated list of awesome places to learn and/or practice algorithms.

  • https://github.com/tayllan/awesome-algorithms#github-librari...

    awesome-theoretical-computer-science > Machine Learning Theory, Physics; Grover's; and surely something is faster than Timsort:

  • awesome-theoretical-computer-science

    The interdisciplinary field of Mathematics and Computer Science, distinguished by its emphasis on mathematical technique and rigour.

  • fluxsort

    A fast branchless stable quicksort / mergesort hybrid that is highly adaptive.

  • Any chance you could comment on fluxsort[0], another fast quicksort? It's stable and uses a buffer about the size of the original array, which sounds like it puts it in a similar category as glidesort. Benchmarks against pdqsort at the end of that README; I can verify that it's faster on random data by 30% or so, and the stable partitioning should mean it's at least as adaptive (but the current implementation uses an initial analysis pass followed by adaptive mergesort rather than optimistic insertion sort to deal with nearly-sorted data, which IMO is fragile). There's an in-place effort called crumsort[1] along similar lines, but it's not stable.

    I've been doing a lot of work on sorting[2], in particular working to hybridize various approaches better. Very much looking forward to seeing how glidesort works.

    [0] https://github.com/scandum/fluxsort

    [1] https://github.com/scandum/crumsort

    [2] https://mlochbaum.github.io/BQN/implementation/primitive/sor...
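The comment above turns on the difference between a stable sort (fluxsort) and an unstable one (crumsort). As a minimal sketch of what stability guarantees, the example below uses the standard library's std::stable_sort, not fluxsort itself; the Record type and the sort_by_key_stable helper are hypothetical names chosen for illustration.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Illustrative record type: (key, label). Only the key takes part in ordering.
using Record = std::pair<int, std::string>;

// Sorts by key only. std::stable_sort guarantees that records with equal keys
// keep their original relative order; std::sort makes no such promise.
std::vector<Record> sort_by_key_stable(std::vector<Record> records) {
    std::stable_sort(records.begin(), records.end(),
                     [](const Record& a, const Record& b) {
                         return a.first < b.first;
                     });
    return records;
}
```

Given the input (2,"a"), (1,"b"), (2,"c"), (1,"d"), the stable version always places "b" before "d" and "a" before "c". An unstable sort may return either order within each group of equal keys.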

  • crumsort

    A branchless unstable quicksort / mergesort that is highly adaptive.

  • SHOGUN

    Shōgun

  • The function is trying to get the median, which is not defined for an empty set. With this particular implementation, there is an assert for that:

    https://github.com/shogun-toolbox/shogun/blob/9b8d85/src/sho...

    Unrelatedly, but from the same section:

    > Fixes are trivial, access the nth element only after the call being made. Be careful.

    Wouldn't the proper fix be to do the nth_element call for the larger element first (for those cases that don't do that already) and then adjust the end to be begin + larger_n for the second nth_element call? Otherwise the second call will check [begin + larger_n, end) again for no reason at all.
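The ordering suggested in the comment above can be sketched as follows. This is an illustration under assumptions, not the shogun code: the two_order_statistics helper and its parameters are hypothetical, and it assumes a non-empty input with smaller_n <= larger_n < size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not from shogun): returns the values that would sit at
// sorted positions smaller_n and larger_n. Takes the vector by value so the
// caller's data is left untouched.
std::pair<int, int> two_order_statistics(std::vector<int> v,
                                         std::size_t smaller_n,
                                         std::size_t larger_n) {
    assert(!v.empty());
    assert(smaller_n <= larger_n && larger_n < v.size());

    // First call: place the larger order statistic. Afterwards every element
    // in [begin, begin + larger_n) is no greater than v[larger_n].
    std::nth_element(v.begin(), v.begin() + larger_n, v.end());

    // Second call: only that prefix can hold the smaller order statistic, so
    // the range ends at begin + larger_n instead of end.
    if (smaller_n < larger_n) {
        std::nth_element(v.begin(), v.begin() + smaller_n,
                         v.begin() + larger_n);
    }

    // Both positions are read only after the calls that fix them.
    return {v[smaller_n], v[larger_n]};
}
```

Because the second call never touches [begin + larger_n, end), the element placed by the first call stays where it is, and the tail of the range is not scanned a second time.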

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.