The One Billion Row Challenge

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  1. 1brc

    1️⃣🐝🏎️ The One Billion Row Challenge -- A fun exploration of how quickly 1B rows from a text file can be aggregated with Java

    As far as I can see, the currently best-performing solution [0] does not account for hash collisions and therefore probably generates wrong results if the dataset contains enough different cities. Or am I missing something?

    [0] https://github.com/gunnarmorling/1brc/blob/main/src/main/jav...
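To illustrate the concern: a lookup that trusts the hash alone merges any two station names that hash alike, while one that also compares the stored key keeps them apart. The C sketch below is not taken from the linked solution. `weak_hash`, `find_or_insert`, `SLOTS`, and `MAX_KEY` are hypothetical names, and the hash is deliberately poor so that "ab" and "ba" collide.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SLOTS 16     /* hypothetical table size, must be a power of two */
#define MAX_KEY 100  /* hypothetical upper bound on key length in bytes */

struct entry {
    char key[MAX_KEY + 1];
    size_t len;      /* 0 marks an empty slot */
    long count;
};

/* Deliberately weak hash (sum of bytes): "ab" and "ba" hash alike. */
static uint32_t weak_hash(const char *key, size_t len) {
    uint32_t h = 0;
    for (size_t i = 0; i < len; i++)
        h += (unsigned char)key[i];
    return h;
}

/* Returns the entry for key, inserting it if absent.
   Returns NULL if the key is empty, too long, or the table is full. */
static struct entry *find_or_insert(struct entry *table,
                                    const char *key, size_t len) {
    if (len == 0 || len > MAX_KEY)
        return NULL;
    uint32_t idx = weak_hash(key, len) & (SLOTS - 1);
    for (int probes = 0; probes < SLOTS; probes++) {
        struct entry *e = &table[idx];
        if (e->len == 0) {               /* empty slot: claim it */
            memcpy(e->key, key, len);
            e->key[len] = '\0';
            e->len = len;
            return e;
        }
        /* Occupied slot: accept it only if the stored key really matches. */
        if (e->len == len && memcmp(e->key, key, len) == 0)
            return e;
        idx = (idx + 1) & (SLOTS - 1);   /* collision: try the next slot */
    }
    return NULL;
}
```

Dropping the `memcmp` check would make the second key reuse the first key's entry, which is the kind of silent wrong result the comment worries about.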

  3. 1brc

    C11 implementation of the 1 Billion Rows Challenge. 1️⃣🐝🏎️ Runs in ~1.6 seconds on my not-so-fast laptop CPU w/ 16GB RAM. (by dannyvankooten)

    Very fun challenge that nerd-sniped me right away. Had to do a C version in standard C99 with POSIX threads. It [1] clocks in at just under 4 seconds on my AMD Ryzen 4800U laptop CPU.

    Should run about 10-20% faster than that on the mentioned Hetzner hardware.

    - Since the measurements carry only one decimal digit, it uses integer math right from the get-go.

    - FNV-1a hash with linear probing and a load factor well under 0.5.

    - Data file is mmap’d into memory.

    - Data is processed in 8 fully independent chunks (no concurrent data structures); the per-chunk aggregates are merged once all threads have finished.

    1: https://github.com/dannyvankooten/1brc
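Three of the ideas in the list above can be sketched in a few lines of C. This is a minimal illustration, not code from the linked repository: `parse_tenths`, `fnv1a`, and `chunk_start` are hypothetical names, the input is assumed to be well formed with exactly one digit after the decimal point, and the mmap and threading parts are left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Parse text such as "-12.3" into integer tenths (-123), so no floating
   point is needed. Returns a pointer to the first byte after the number.
   Assumes well-formed input with exactly one fractional digit. */
static const char *parse_tenths(const char *s, int *out) {
    int sign = 1, value = 0;
    if (*s == '-') { sign = -1; s++; }
    while (*s >= '0' && *s <= '9')
        value = value * 10 + (*s++ - '0');
    if (*s == '.') {
        s++;
        value = value * 10 + (*s++ - '0');
    }
    *out = sign * value;
    return s;
}

/* 32-bit FNV-1a: XOR each byte into the hash, then multiply by the prime. */
static uint32_t fnv1a(const char *key, size_t len) {
    uint32_t h = 2166136261u;            /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)key[i];
        h *= 16777619u;                  /* FNV prime */
    }
    return h;
}

/* Start offset of chunk i out of n in a buffer of len bytes, moved forward
   to just after the next '\n' so that no line is split across two chunks.
   Chunk i then covers [chunk_start(i), chunk_start(i + 1)). */
static size_t chunk_start(const char *data, size_t len,
                          unsigned i, unsigned n) {
    if (n == 0 || i == 0)
        return 0;
    if (i >= n)
        return len;
    size_t pos = len / n * i;
    if (pos == 0)
        return 0;
    while (pos < len && data[pos - 1] != '\n')
        pos++;
    return pos;
}
```

Each worker thread would then parse only its own range into a private table, and the tables would be merged at the end, which is what makes concurrent data structures unnecessary.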

  4. 1brc

    Yeah, so I had a discussion on Twitter about this. It turns out 12GB is small enough to fit into memory, and the author evaluates submissions by running each solution 5 times in a row, so using direct IO actually hurts: having the kernel cache is a way to ensure the file is in memory for the 4 runs after the first. I have a direct IO solution with SIMD string search and double parsing, just in C++ (using libraries). It runs in 6 seconds on my 24-core Linux box (NVMe).

    Code: https://github.com/rockwotj/1brc

    Discussion on Filesystem cache: https://x.com/rockwotj/status/1742168024776430041?s=20
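For contrast with direct IO, here is a minimal sketch of a read path that goes through the kernel page cache, so that repeated runs can be served from memory. It is not taken from either linked repository: `count_lines_mmap` is a hypothetical helper that maps a file read-only with POSIX `mmap` and counts newline bytes.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Counts '\n' bytes in the file at path using a read-only mapping.
   Returns -1 on error. The mapping is backed by the page cache, so a
   second call on the same file can avoid touching the disk. */
static long count_lines_mmap(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {               /* mmap rejects a zero length */
        close(fd);
        return 0;
    }

    size_t size = (size_t)st.st_size;
    char *data = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                           /* the mapping stays valid */
    if (data == MAP_FAILED)
        return -1;

    long lines = 0;
    for (size_t i = 0; i < size; i++)
        if (data[i] == '\n')
            lines++;

    munmap(data, size);
    return lines;
}
```

A direct IO variant would instead open the file with `O_DIRECT` and read into suitably aligned buffers, bypassing the cache on every run.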

  5. 1brc

    1BRC in .NET among fastest on Linux (by buybackoff)

  6. JDK

    JDK main-line development https://openjdk.org/projects/jdk


