Ask HN: How can I learn about performance optimization?

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • amh-code

    Complete implementations from "Algorithms for Modern Hardware"

  • As you can tell from the diversity of responses here, it really depends on what you're doing. In my work I use C++, and "optimization" typically involves making a heavy computation run faster (measured in wall-clock time) or making a particular subsystem use less memory.

    The number one most important thing you can do is dive in and start profiling real-world code. Find a part of your software that is too slow or uses too many resources, and use whatever the standard profiler is for your development environment to figure out why. Performance optimization is a very empirical discipline. Yes, there are general principles, but if you don't measure your baseline or your changes, you won't know how good your optimization was. In my experience, the first attempt at a fix is often flat-out wrong! Doing this first will also help motivate your reading.

    Once you know how to measure the performance of your software, I recommend learning the basics of modern computer architecture. At a minimum, learn about CPU caches, how they work, and how to design your code to use them effectively. I find Algorithms for Modern Hardware to be a good resource for this [1], but there are many others. Relatedly, you should have a rough idea of how long it takes for your computer to do various basic things (fetch something from memory, fetch something from cache, etc.). There's a table at [2] that gives a good idea. Don't worry too much about the absolute values--the order of magnitude is what's important.

    You should also study fundamental data structures, but understand that for low-level programming 95% of the time the correct answer will be to shove everything into a simple flat array (e.g. std::vector in C++), maybe with some sort of index on top. Fancy data structures are more important in higher-level languages that are structurally unable to make effective use of modern hardware.

    [1] https://en.algorithmica.org/hpc/

  • perf-book

    The book "Performance Analysis and Tuning on Modern CPU" (by dendibakh)

  • Denis Bakhvalov has some great resources for this:

    1. His free course: https://products.easyperf.net/perf-ninja

    2. His free book: https://book.easyperf.net/perf_book (the 2nd edition is being worked on right now and there's a draft on github: https://github.com/dendibakh/perf-book)

  • wisdom

    Building better developers by specifying criteria of success (by prettydiff)

  • Measure everything and be extremely critical. Be ready to challenge common and popularly held assumptions.

    Here is something I wrote about extreme performance in JavaScript that is dismissed by most programmers because most people who program JavaScript professionally cannot really program.

    https://github.com/prettydiff/wisdom/blob/master/performance...

  • 1brc

    1️⃣🐝🏎️ The One Billion Row Challenge -- A fun exploration of how quickly 1B rows from a text file can be aggregated with Java

  • If you are in “javaland”, look at the One Billion Row Challenge; you will learn a lot: https://github.com/gunnarmorling/1brc

  • compiler-explorer

    Run compilers interactively from your web browser and interact with the assembly

  • [P&H RISC] https://www.google.com/books/edition/_/e8DvDwAAQBAJ

    Compiler Explorer by Matt Godbolt [Godbolt] can help you better understand what code a compiler generates under different circumstances.

    [Godbolt] https://godbolt.org

    The official CPU architecture manuals from CPU vendors are surprisingly readable and information-rich. I only read the fragments that I need or that I am interested in and move on. Here is Intel’s [Intel]. I use the Combined Volume Set, which is a huge PDF comprising all ten volumes. It is easier to search when it’s all in one file. I can open several copies on different pages to make navigation easier.

    Intel also has a whole optimization reference manual [Intel] (scroll down, it’s all on the same page). The manual helps you understand what exactly the CPU is doing.

    [Intel] https://www.intel.com/content/www/us/en/developer/articles/t...

    Personally, I believe in automated benchmarks that measure end-to-end what is actually important and notify you when a change impacts performance for the worse.

