The One Billion Row Challenge

1brc

28 5,077 9.9 Java

1️⃣🐝🏎️ The One Billion Row Challenge -- A fun exploration of how quickly 1B rows from a text file can be aggregated with Java

As far as I can see, the currently best-performing solution [0] does not account for hash collisions and would therefore probably produce wrong results if the dataset contained enough distinct cities. Or am I missing something?
[0] https://github.com/gunnarmorling/1brc/blob/main/src/main/jav...
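To illustrate the concern raised in this comment (not the linked solution itself, which is not reproduced here), the following is a minimal C sketch of an open-addressing table. The names `weak_hash`, `find_slot`, `SLOTS` and `MAX_KEY` are hypothetical, and the hash is deliberately weak so that two different keys collide. With `compare_keys` set to 0, an equal hash is taken to mean an equal key and the two keys are merged; with it set to 1, the stored key bytes are compared as well and the keys stay separate.

```c
#include <stdint.h>
#include <string.h>

#define SLOTS 16    /* hypothetical table size (power of two) */
#define MAX_KEY 32  /* hypothetical key limit for this sketch */

struct slot {
    int used;
    uint32_t hash;
    char key[MAX_KEY];
    long count;
};

/* Deliberately weak hash (sum of bytes) so that "ab" and "ba" collide. */
static uint32_t weak_hash(const char *s) {
    uint32_t h = 0;
    while (*s) h += (unsigned char)*s++;
    return h;
}

/* Find or create the slot for key using linear probing.
 * compare_keys == 0: an equal hash is treated as an equal key (unsafe).
 * compare_keys != 0: the stored key bytes must match as well.
 * Assumes a zero-initialised table that never fills up and keys
 * shorter than MAX_KEY. */
static struct slot *find_slot(struct slot *t, const char *key, int compare_keys) {
    uint32_t h = weak_hash(key);
    uint32_t i = h & (SLOTS - 1);
    while (t[i].used) {
        int same = (t[i].hash == h) &&
                   (!compare_keys || strcmp(t[i].key, key) == 0);
        if (same) return &t[i];
        i = (i + 1) & (SLOTS - 1);
    }
    t[i].used = 1;
    t[i].hash = h;
    strncpy(t[i].key, key, MAX_KEY - 1);
    return &t[i];
}
```

Counting "ab" and "ba" once each with `compare_keys` set to 0 leaves a single entry with a count of 2, which is the kind of silent merge the comment worries about. A strong hash makes this much less likely but does not rule it out unless keys are compared.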

1brc

5 60 7.5 C

C99 implementation of the 1 Billion Rows Challenge. 1️⃣🐝🏎️ Runs in ~1.6 seconds on my not-so-fast laptop CPU w/ 16GB RAM. (by dannyvankooten)

Very fun challenge that nerd-sniped me right away. I had to do a C version in standard C99 with POSIX threads. It [1] clocks in at just under 4 seconds on my AMD Ryzen 4800U laptop CPU.
It should run about 10-20% faster than that on the mentioned Hetzner hardware.
- Since measurements have only one decimal digit, it uses integer math right from the get-go.
- FNV-1a hash with linear probing and a load factor well under 0.5.
- The data file is mmap’d into memory.
- Data is processed in 8 completely separate chunks (no concurrent data structures), and the per-chunk aggregates are merged once all threads have finished.
[1] https://github.com/dannyvankooten/1brc
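The techniques listed in this comment can be sketched roughly as follows. This is not the code from the linked repository; it is a small C99 illustration under stated assumptions: the 32-bit FNV-1a variant, a hypothetical table capacity `CAP`, an assumed name limit `MAX_NAME`, well-formed `name;value\n` lines with exactly one fractional digit, and helper names (`parse_tenths`, `slot_for`, `record`, `process_chunk`, `align_to_line`) invented for this example. The mmap call and the threads are left out; `process_chunk` works on any in-memory byte range.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CAP 1024      /* hypothetical capacity (power of two); keep it under half full */
#define MAX_NAME 100  /* assumed upper bound on station name length in bytes */

struct station {
    char name[MAX_NAME];
    size_t len;
    int min, max;     /* tenths of a degree */
    long long sum;    /* tenths of a degree */
    long count;       /* 0 marks an empty slot */
};

/* 32-bit FNV-1a over len bytes (the width is an assumption of this sketch). */
static uint32_t fnv1a(const char *data, size_t len) {
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)data[i];
        h *= 16777619u;
    }
    return h;
}

/* Parse text such as "-12.3" into tenths of a degree (-123) without floats.
 * Assumes exactly one fractional digit; *end is set just past the number. */
static int parse_tenths(const char *s, const char **end) {
    int neg = (*s == '-');
    if (neg) s++;
    int v = 0;
    while (*s >= '0' && *s <= '9') v = v * 10 + (*s++ - '0');
    if (*s == '.') { s++; v = v * 10 + (*s++ - '0'); }
    if (end) *end = s;
    return neg ? -v : v;
}

/* Linear probing: walk from the hashed index until the name or an empty
 * slot is found. Assumes a zeroed table that never fills up. */
static struct station *slot_for(struct station *t, const char *name, size_t len) {
    uint32_t i = fnv1a(name, len) & (CAP - 1);
    while (t[i].count != 0 &&
           (t[i].len != len || memcmp(t[i].name, name, len) != 0))
        i = (i + 1) & (CAP - 1);
    return &t[i];
}

/* Fold one measurement into the table. Assumes len <= MAX_NAME. */
static void record(struct station *t, const char *name, size_t len, int tenths) {
    struct station *st = slot_for(t, name, len);
    if (st->count == 0) {
        memcpy(st->name, name, len);
        st->len = len;
        st->min = st->max = tenths;
    }
    if (tenths < st->min) st->min = tenths;
    if (tenths > st->max) st->max = tenths;
    st->sum += tenths;
    st->count++;
}

/* Aggregate every "name;value\n" line in [p, end). Assumes each line,
 * including the last one, ends with '\n'. */
static void process_chunk(struct station *t, const char *p, const char *end) {
    while (p < end) {
        const char *semi = memchr(p, ';', (size_t)(end - p));
        if (!semi) break;
        const char *after;
        int tenths = parse_tenths(semi + 1, &after);
        record(t, p, (size_t)(semi - p), tenths);
        p = after + 1;  /* skip the '\n' */
    }
}

/* Move a tentative chunk boundary forward to just past the next '\n', so
 * that no line is split between two chunks. */
static const char *align_to_line(const char *p, const char *end) {
    const char *nl = memchr(p, '\n', (size_t)(end - p));
    return nl ? nl + 1 : end;
}
```

In a threaded version along the lines the comment describes, each thread would presumably own one table and one aligned byte range, and the per-thread tables would be merged at the end by adding `sum` and `count` and taking the smaller `min` and larger `max`. A mean in degrees is then `sum / (10.0 * count)`.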

1brc

1 4 4.1 C++

Yeah, so I had a discussion on Twitter about this. It turns out 12GB is small enough to fit into memory, and the author evaluates submissions by running each solution 5 times in a row, so using direct IO actually hurts: having the kernel cache is a way to ensure the file is in memory for the 4 runs after the first. I have a direct IO solution with SIMD string search and double parsing, just in C++ (using libraries). It runs in 6 seconds on my 24-core Linux box (NVMe).
Code: https://github.com/rockwotj/1brc
Discussion on Filesystem cache: https://x.com/rockwotj/status/1742168024776430041?s=20

1brc

3 416 9.0 C#

1BRC in .NET among fastest on Linux (by buybackoff)

JDK

191 18,393 10.0 Java

JDK main-line development https://openjdk.org/projects/jdk

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.