-
1brc
1️⃣🐝🏎️ The One Billion Row Challenge -- A fun exploration of how quickly 1B rows from a text file can be aggregated with Java
We have already created the file measurements.txt with 1 million lines using the semi-official tool create_measurements.py:
-
    # Install build tools and libraries needed
    sudo apt-get -y install build-essential autoconf libtool bison re2c pkg-config git libxml2-dev libssl-dev

    # Clone and build a stripped-down version of PHP with ZTS support
    git clone https://github.com/php/php-src.git --branch=PHP-8.4.3 --depth=1
    cd php-src/
    ./buildconf
    ./configure --prefix=/opt/php8.4-zts --with-config-file-path=/opt/php8.4-zts/etc/php --disable-all --disable-ipv6 --disable-cgi --disable-phpdbg --enable-zts --enable-xml --with-libxml --with-pear --with-openssl
    make -j32
    ./sapi/cli/php -v
    sudo make install

    # Install `parallel` module from PECL
    sudo /opt/php8.4-zts/bin/pecl channel-update pecl.php.net
    sudo /opt/php8.4-zts/bin/pecl install parallel
    sudo mkdir -p /opt/php8.4-zts/etc/php/conf.d
    echo 'extension=parallel.so' | sudo tee -a /opt/php8.4-zts/etc/php/php.ini
    echo 'memory_limit=-1' | sudo tee -a /opt/php8.4-zts/etc/php/php.ini

    # Verify module installation
    /opt/php8.4-zts/bin/php -i | grep parallel
-
For the second point (rewriting the PHP script for multithreading), you can take inspiration from the PHP documentation and from the 1BRC solutions available online. The one I borrowed from most heavily is this one. Most of the overhead of managing the threads comes from a preliminary pass over measurements.txt, which splits the file into as many chunks as there are cores on the machine running the script. Each thread then processes one of these chunks, and the partial results are combined into the final result.
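The chunking idea described above can be sketched in plain PHP. This is a minimal illustration, not the code from the repository: the function names (`splitIntoChunks`, `processChunk`, `mergeStats`, `aggregate`) are hypothetical, the input is assumed to use the usual 1BRC `station;temperature` line format, and the chunks are processed sequentially here so that the sketch runs without the `parallel` extension. In a multithreaded version, each `processChunk` call would presumably be handed to its own thread instead.

```php
<?php
declare(strict_types=1);

/**
 * Split a file into at most $parts byte ranges, each ending on a line boundary.
 * Returns a list of [start, end] byte offsets (end is exclusive).
 */
function splitIntoChunks(string $path, int $parts): array
{
    $size = filesize($path);
    // Round up so that we never produce more than $parts chunks.
    $chunkSize = max(1, intdiv($size + $parts - 1, $parts));
    $fh = fopen($path, 'rb');
    $chunks = [];
    $start = 0;
    while ($start < $size) {
        $end = min($size, $start + $chunkSize);
        if ($end < $size) {
            fseek($fh, $end);
            fgets($fh);          // skip ahead to the end of the current line
            $end = ftell($fh);
        }
        $chunks[] = [$start, $end];
        $start = $end;
    }
    fclose($fh);
    return $chunks;
}

/** Aggregate min/max/sum/count per station for one byte range of the file. */
function processChunk(string $path, int $start, int $end): array
{
    $fh = fopen($path, 'rb');
    fseek($fh, $start);
    $stats = [];
    while (ftell($fh) < $end && ($line = fgets($fh)) !== false) {
        $sep = strpos($line, ';');
        if ($sep === false) {
            continue;            // ignore malformed or empty lines
        }
        $station = substr($line, 0, $sep);
        $temp = (float) substr($line, $sep + 1);
        $s = $stats[$station] ?? ['min' => $temp, 'max' => $temp, 'sum' => 0.0, 'count' => 0];
        $s['min'] = min($s['min'], $temp);
        $s['max'] = max($s['max'], $temp);
        $s['sum'] += $temp;
        $s['count']++;
        $stats[$station] = $s;
    }
    fclose($fh);
    return $stats;
}

/** Combine the per-chunk results into one result, sorted by station name. */
function mergeStats(array $partials): array
{
    $total = [];
    foreach ($partials as $partial) {
        foreach ($partial as $station => $s) {
            if (!isset($total[$station])) {
                $total[$station] = $s;
                continue;
            }
            $total[$station]['min'] = min($total[$station]['min'], $s['min']);
            $total[$station]['max'] = max($total[$station]['max'], $s['max']);
            $total[$station]['sum'] += $s['sum'];
            $total[$station]['count'] += $s['count'];
        }
    }
    ksort($total);
    return $total;
}

/** Sequential driver: one chunk per (hypothetical) worker, then merge. */
function aggregate(string $path, int $workers): array
{
    $partials = [];
    foreach (splitIntoChunks($path, $workers) as [$start, $end]) {
        // A threaded version would dispatch this call to a worker thread.
        $partials[] = processChunk($path, $start, $end);
    }
    return mergeStats($partials);
}
```

Calling `aggregate('measurements.txt', 8)` would return one entry per station with its `min`, `max`, `sum`, and `count`; the mean is `sum / count`. Aligning every chunk boundary to the end of a line is what lets the chunks be processed independently, since no measurement is ever split between two workers.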
-
The full code is available on GitHub: https://github.com/gfabrizi/1brc-php