I/O is no longer the bottleneck

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • share-file-systems

    Use a Windows/OSX-like GUI in the browser to share files across operating systems privately. No cloud, no server, no third party.

  • I encountered this myself yesterday when attempting to performance test WebSockets in JavaScript: https://github.com/prettydiff/share-file-systems/blob/master...

    The parsing challenge is complex enough that it will always be faster to extract data from the network than to process it. As a result, excess data must either be stored until it can be evaluated or be dropped, so the primary processing limitation is memory access, not the CPU executing instructions. JavaScript is a garbage-collected language, so you are at the mercy of the runtime; it doesn't really matter how you write the code, because if messages arrive frequently enough and are large enough, memory will always be the bottleneck, not the network or the application code.

    In terms of numbers this is provable. When testing WebSocket performance on my old desktop with DDR3 memory, I was sending messages (without a queue or any kind of safety consideration) at about 180,000 messages per second. On my laptop with DDR4 memory, the same test showed about 420,000 messages per second, even though the CPU in the old desktop is faster and more powerful than the one in the laptop.
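
    A rough way to reproduce this kind of measurement in Python (a sketch only, not the original JavaScript WebSocket harness; the payload size and message count are assumptions) is to time how many framed messages can be built and buffered per second:

      # Sketch: count how many messages can be built and buffered per second.
      # Payload size and message count are illustrative assumptions; the point
      # is that each iteration allocates a fresh buffer, so memory allocation
      # (and, in a GC'd language, collection) dominates long before the CPU does.
      import time
      from collections import deque

      PAYLOAD = b"x" * 512      # assumed message size
      N = 1_000_000             # assumed message count

      queue = deque()
      start = time.perf_counter()
      for i in range(N):
          frame = i.to_bytes(8, "little") + PAYLOAD
          queue.append(frame)
      elapsed = time.perf_counter() - start
      print(f"{N / elapsed:,.0f} messages/second buffered")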

  • napkin-math

    Techniques and numbers for estimating a system's performance from first principles

  • Yes, sequential I/O bandwidth is closing the gap to memory. [1] The I/O pattern to watch out for, and the biggest reason why e.g. databases do careful caching to memory, is that _random_ I/O is still dreadfully slow. I/O bandwidth is brilliant, but latency is still disappointing compared to memory.

    [1]: https://github.com/sirupsen/napkin-math
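
    A quick way to see that gap yourself (a Unix-only Python sketch; the file path, block size, and read count are placeholders, and the file is assumed to be large and not already in the page cache) is to time sequential reads against reads at random offsets on the same file:

      # Sketch: compare sequential vs. random reads on the same file.
      # TEST_FILE, BLOCK, and N_READS are assumptions for illustration; on a
      # cold cache the random pass is typically far slower per byte read.
      import os, random, time

      TEST_FILE = "testfile.bin"   # assumed: a large existing file
      BLOCK = 4096
      N_READS = 50_000

      fd = os.open(TEST_FILE, os.O_RDONLY)
      size = os.fstat(fd).st_size

      def timed(offsets):
          start = time.perf_counter()
          for off in offsets:
              os.pread(fd, BLOCK, off)
          return time.perf_counter() - start

      seq = timed(range(0, N_READS * BLOCK, BLOCK))
      rand = timed(random.randrange(0, size - BLOCK) for _ in range(N_READS))
      print(f"sequential: {seq:.2f}s  random: {rand:.2f}s")
      os.close(fd)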

  • simdjson

    Parsing gigabytes of JSON per second: used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks

  • NVMe storage really is very fast for sequential reads, but I'd respectfully suggest that for simple tasks a Dell laptop with a 1.6 GB/s read speed should be bottlenecked by I/O if the compute is optimised. For example, simdjson can parse JSON at over 7 GB/s. https://simdjson.org/
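
    One way to check which side is your bottleneck (a sketch using the stdlib json module rather than simdjson, so the parse side will look far slower than 7 GB/s; the file name is a placeholder) is to time the raw read and the parse separately:

      # Sketch: time disk read vs. JSON parse on the same file to see which
      # one dominates. "data.json" is a placeholder input.
      import json, time

      PATH = "data.json"  # assumed input file

      start = time.perf_counter()
      raw = open(PATH, "rb").read()
      read_s = time.perf_counter() - start

      start = time.perf_counter()
      json.loads(raw)
      parse_s = time.perf_counter() - start

      gib = len(raw) / 2**30
      print(f"read:  {gib / read_s:.2f} GiB/s")
      print(f"parse: {gib / parse_s:.2f} GiB/s")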

  • fast-sqlite3-inserts

    A bunch of test scripts to generate a SQLite DB with 1B rows in the fastest possible way

  • I am working on a project [0] to generate 1 billion rows in SQLite in under a minute; it currently inserts 100M rows in 33 seconds. First, I generate the rows and insert them into an in-memory database, then flush everything to disk at the end. The flush to disk takes only 2 seconds, so 99% of the time is spent generating and adding rows to the in-memory B-tree. (A minimal sketch of this pattern follows the links below.)

    For Python optimisation, have you tried PyPy? I ran the same code (zero changes) under PyPy and got a 3.5x speedup.

    I published my findings here [1].

    [0] - https://github.com/avinassh/fast-sqlite3-inserts

    [1] - https://avi.im/blag/2021/fast-sqlite-inserts/
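
    A minimal sketch of that in-memory-then-flush pattern (illustrative only, not the project's actual generator; the schema, row count, and random columns are assumptions) looks like this in Python:

      # Sketch of the insert-into-:memory:-then-flush pattern described above.
      # Schema and row count are illustrative; fast-sqlite3-inserts does more.
      import sqlite3, random, time

      mem = sqlite3.connect(":memory:")
      mem.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, age INT, active INT)")

      start = time.perf_counter()
      rows = ((i, random.randint(18, 99), random.randint(0, 1)) for i in range(1_000_000))
      mem.executemany("INSERT INTO user VALUES (?, ?, ?)", rows)
      mem.commit()
      print(f"in-memory inserts: {time.perf_counter() - start:.1f}s")

      start = time.perf_counter()
      disk = sqlite3.connect("users.db")
      mem.backup(disk)   # flush the whole in-memory DB to disk in one pass
      disk.close()
      print(f"flush to disk: {time.perf_counter() - start:.1f}s")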

  • adix

    An Adaptive Index Library for Nim

  • Note: Just concatenating the bibles keeps your hash map artificially small... which matters because, as you correctly note, the big deal is whether you can fit the histogram in L2 cache. This really matters if you go parallel, where N CPUs' L2 caches can speed things up a lot -- *until* your histograms blow out CPU-private L2 cache sizes. https://github.com/c-blake/adix/blob/master/tests/wf.nim (or a port to your favorite language) might make it easy to play with these ideas.
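
    To play with the per-core-histogram idea outside Nim, here is a rough Python sketch (the input file and worker count are assumptions, and Python will not expose L2-cache effects as directly as wf.nim): each worker builds its own private histogram and the results are merged once at the end.

      # Sketch: one word-frequency histogram per worker, merged at the end,
      # mirroring the "private histogram per CPU" idea. "bible.txt" and the
      # worker count are placeholder assumptions.
      from collections import Counter
      from multiprocessing import Pool

      PATH = "bible.txt"   # assumed input
      WORKERS = 4

      def count_chunk(lines):
          c = Counter()
          for line in lines:
              c.update(line.split())
          return c

      if __name__ == "__main__":
          lines = open(PATH, encoding="utf-8").read().splitlines()
          step = len(lines) // WORKERS + 1
          chunks = [lines[i:i + step] for i in range(0, len(lines), step)]
          total = Counter()
          with Pool(WORKERS) as pool:
              for partial in pool.map(count_chunk, chunks):
                  total.update(partial)   # merge per-worker histograms once
          print(total.most_common(10))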

  • countwords

    Playing with counting word frequencies (and performance) in various languages. (by kimono-koans)

  • This is truly 1978 all over again: no flame graphs, no hardware counters, no bottleneck analysis. Using these 'optimizations' for job interviews is questionable at best.

    [1] https://benhoyt.com/writings/count-words/

  • RAMCloud

    **No Longer Maintained** Official RAMCloud repo

  • On a related note, John Ousterhout (in the RAMCloud project) was basically betting that the latency of accessing RAM on another computer over a fast local network would eventually become competitive with local RAM access.

    https://ramcloud.atlassian.net/wiki/spaces/RAM/overview

  • Killed by Google

    Part guillotine, part graveyard for Google's doomed apps, services, and hardware.

  • Yeah, when I was there I saw plenty of <1% optimizations saving REDACTED gobs of money, and people were rewarded for it. I don't think it's applicable to most teams though.

    Imagine a foo/bar/widget app that only serves 20B people (an obvious exaggeration to illustrate the point) and is only needed up to a few hundred times per day. You can handle that sort of traffic on a laptop on my home router and still have enough chutzpah left to stream Netflix. I mean, you are Google, and you need to do something better than that [0], but the hardware for your project is going to be negligible compared to other concerns unless you're doing FHE or video transcoding or something extraordinarily expensive.

    Walk that backward: how many teams have 20B users or are doing extraordinarily expensive things? I don't have any clue, but when you look at public examples of cheap things that never got much traction and probably had a suite of engineers [1], I'd imagine it's not everyone in any case. You're probably mostly looking at people with enough seniority to be able to choose to work on core code affecting most services.

    [0] https://www.youtube.com/watch?v=3t6L-FlfeaI

    [1] https://killedbygoogle.com/

  • huniq

    Filter out duplicates on the command line. Replacement for `sort | uniq` optimized for speed (10x faster) when sorting is not needed.

  • `sort | uniq` is really slow for this, as it has to sort the entire input first. I use `huniq`, which is way faster when sorting isn't needed. I'm sure there are many similar options.

    https://github.com/koraa/huniq
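
    The underlying idea (a Python sketch of hash-based dedup, not huniq itself) is just to keep a set of already-seen lines instead of sorting the whole input:

      # Sketch: stream stdin and print each line the first time it is seen.
      # A hash set replaces the full sort that `sort | uniq` needs; memory
      # grows with the number of distinct lines rather than total input size.
      import sys

      seen = set()
      for line in sys.stdin:
          if line not in seen:
              seen.add(line)
              sys.stdout.write(line)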

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

