-
1brc
1️⃣🐝🏎️ The One Billion Row Challenge -- A fun exploration of how quickly 1B rows from a text file can be aggregated with Java
-
nodejs
1️⃣🐝🏎️ The One Billion Row Challenge with Node.js -- A fun exploration of how quickly 1B rows from a text file can be aggregated with different languages.
-
1BillionRowChallenge
I saw this [Blog Post](https://www.morling.dev/blog/one-billion-row-challenge/) on a Billion Row Challenge for Java, so naturally I tried implementing a solution in Python & Rust, mainly using Polars.
-
bitcoin_ancestries
This codebase will produce some stats on the ancestry of each transaction of the Bitcoin network.
-
1brc
C99 implementation of the 1 Billion Rows Challenge. 1️⃣🐝🏎️ Runs in ~1.6 seconds on my not-so-fast laptop CPU w/ 16GB RAM. (by dannyvankooten)
There are a few rust solutions in the "Show and Tell" linked above, for example this fairly readable one at 15.5s: https://github.com/gunnarmorling/1brc/discussions/57
A comment above referencing Python "polars" actually has Rust polars, std, and SIMD solutions as well (SIMD was fastest, but less readable for a hobbyist like me).
I did it with custom parsing[0], treating the numbers as 16-bit integers; the representation in the file is not a constant number of bytes, which complicates the table approach. If you end up computing a hash, I think it might be slower than just doing the equivalent parsing I do, and a four-byte constant lookup table would be very large and mostly empty. Maybe a trie would be good.
0: https://github.com/k0nserv/brc/blob/main/src/main.rs#L279
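The linked solution is in Rust, but the fixed-point idea is language-agnostic: since 1BRC temperatures have exactly one decimal digit, you can parse them into tenths of a degree as a 16-bit integer and skip float parsing entirely. A rough Go sketch of that idea (the function name `parseTemp` is mine, not from the linked code):

```go
package main

import "fmt"

// parseTemp parses a 1BRC-format temperature like "-12.3" or "4.5"
// into tenths of a degree as an int16, avoiding float parsing.
// Assumes the challenge's format: optional '-', one or two integer
// digits, a '.', and exactly one fractional digit.
func parseTemp(b []byte) int16 {
	neg := false
	if b[0] == '-' {
		neg = true
		b = b[1:]
	}
	var v int16
	for _, c := range b {
		if c == '.' {
			continue // skip the decimal point; digits already carry the scale
		}
		v = v*10 + int16(c-'0')
	}
	if neg {
		return -v
	}
	return v
}

func main() {
	fmt.Println(parseTemp([]byte("-12.3"))) // -123
	fmt.Println(parseTemp([]byte("4.5")))   // 45
}
```

Sums and min/max then stay in integer arithmetic; a single divide by 10.0 at output time recovers the decimal value.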
I was curious how long it would take with Polars (for scale), apparently 33s: https://github.com/Butch78/1BillionRowChallenge/tree/main
The more accurate statement would be: Go is incapable of optimizations performed by Java, and Java in turn is incapable of optimizations performed by the C# and C++ implementations.
See https://hotforknowledge.com/2024/01/13/1brc-in-dotnet-among-...
Well, I guess it's more that the standard library doesn't have a cross-platform way to access them, not that memory-mapped files themselves can't be done on (say) Windows. It looks like there's a fairly popular 3rd party package that supports at least Linux, macOS, and Windows: https://github.com/edsrzf/mmap-go
https://github.com/attractivechaos/plb2/blob/master/README.m...
Synthetic benchmarks aside, I think as far as average code goes (the Spring Boots of the world), Go beats Java almost every time, often in fewer lines than the usual pom.xml.
I thought this was an illustrative example of how to process big datasets. We could easily have a statistic per e.g. bitcoin address in a different problem, see https://github.com/afiodorov/bitcoin_ancestries .
I struggle a lot with this toy problem. Without constraints it's too trivial to pay attention to; yet no one seems to agree on potential real-world constraints.
C dominates every other language again... https://github.com/dannyvankooten/1brc#submitting