-
1brc
1️⃣🐝🏎️ The One Billion Row Challenge -- A fun exploration of how quickly 1B rows from a text file can be aggregated with Java
As far as I see the currently best performing solution [0] does not account for hash collisions and therefore probably generates wrong results if enough different cities are in the dataset. Or am I missing something?
[0] https://github.com/gunnarmorling/1brc/blob/main/src/main/jav...
-
InfluxDB
InfluxDB high-performance time series database. Collect, organize, and act on massive volumes of high-resolution data to power real-time intelligent systems.
-
1brc
C11 implementation of the 1 Billion Rows Challenge. 1️⃣🐝🏎️ Runs in ~1.6 seconds on my not-so-fast laptop CPU w/ 16GB RAM. (by dannyvankooten)
Very fun challenge that nerd sniped me right away. Had to do a C version in standard C99 with POSIX threads. It[1] clocks in at just under 4 seconds on my AMD Ryzen 4800U Laptop CPU.
Should run about 10-20% faster than that on the mentioned Hetzner hardware.
- Since we only do one decimal of floating point precision it uses integer math right from the get-go.
- FNV1-a hash with linear probing and a load factor well under 0.5.
- Data file is mmap’d into memory.
- Data is processed in 8 totally separate chunks (no concurrent data structures) and then those aggregations are in turn aggregated when all threads have finished.
1: https://github.com/dannyvankooten/1brc
-
Yeah so I had a discussion on Twitter about this, turns out 12GB is small enough to fit into memory, and the author runs submissions by running a solution 5 times in a row, so using direct IO actually hurts because haing the kernel cache is a way to enforce the file is in memory for the 4 runs after. I have a direct IO solution with SIMD string search and double parsing, just in C++ (using libraries). It runs in 6 seconds on my 24 core linux box (NVMe).
Code: https://github.com/rockwotj/1brc
Discussion on Filesystem cache: https://x.com/rockwotj/status/1742168024776430041?s=20
-
-
-
CodeRabbit
CodeRabbit: AI Code Reviews for Developers. Revolutionize your code reviews with AI. CodeRabbit offers PR summaries, code walkthroughs, 1-click suggestions, and AST-based analysis. Boost productivity and code quality across all major languages with each PR.