Otter, Fastest Go in-memory cache based on S3-FIFO algorithm

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • golang-fifo

    Modern efficient cache design with simple FIFO queue only in Golang

  • Hello, Thank you for replying here :)

    Many of your answers are reasonable and good.

    And I just want to add more comments for others.

    1. SIEVE is not scan-resistant, so I think it should only be applied to web cache workloads (which typically follow a power-law distribution).

    2. SIEVE is somewhat scalable for read-intensive applications (e.g. blogs, shops, etc.), because it doesn't need to hold a lock on a cache hit.

    3. The purpose of golang-fifo is to provide simple and efficient cache implementation (e.g. hashicorp-lru, groupcache)

    4. "when increasing contention otter sacrifices 1-2 percent"

    -> I think that statement is incorrect. The hit rate varies with the total number of objects and the size of the cache, so it should be compared in relative terms. For example, otter's efficiency decreased by 5% compared to single-threaded operation when lock contention increased. (Lower efficiency raises mean network latency, because a miss may require a heavy operation such as recalculation or database access.)

    5. Ghost queue: honestly, at the time I wrote the code I hadn't looked deeply into the bucket table implementation, so it may not work the same as an actual bucket hash table (see here: https://github.com/scalalang2/golang-fifo/issues/16)
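
Points 1 and 2 above can be illustrated with a minimal SIEVE sketch. This is not the golang-fifo code: the names `Sieve`, `NewSieve`, `Get`, and `Set` are hypothetical, and a capacity of at least 1 is assumed. A hit only sets a "visited" bit under a shared read lock and never reorders the list, which is the property that makes reads cheap. Eviction walks a "hand" from the oldest entry toward the newest, clearing visited bits until it finds an unvisited entry.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// node is one cache entry in a doubly linked FIFO list.
type node struct {
	key        string
	value      int
	visited    atomic.Bool
	prev, next *node // prev points toward the head (newer), next toward the tail (older)
}

// Sieve is a hypothetical, simplified SIEVE cache. Capacity must be >= 1.
type Sieve struct {
	mu         sync.RWMutex
	capacity   int
	items      map[string]*node
	head, tail *node
	hand       *node // next eviction candidate; nil means "start at the tail"
}

func NewSieve(capacity int) *Sieve {
	return &Sieve{capacity: capacity, items: make(map[string]*node, capacity)}
}

// Get takes only a read lock: a hit flips a bit and does not move the node.
func (c *Sieve) Get(key string) (int, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	n, ok := c.items[key]
	if !ok {
		return 0, false
	}
	n.visited.Store(true)
	return n.value, true
}

// Set inserts new keys at the head, evicting one entry first if the cache is full.
func (c *Sieve) Set(key string, value int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if n, ok := c.items[key]; ok {
		n.value = value
		n.visited.Store(true)
		return
	}
	if len(c.items) >= c.capacity {
		c.evict()
	}
	n := &node{key: key, value: value, next: c.head}
	if c.head != nil {
		c.head.prev = n
	}
	c.head = n
	if c.tail == nil {
		c.tail = n
	}
	c.items[key] = n
}

// evict moves the hand from old to new, clearing visited bits,
// and removes the first unvisited node it finds.
func (c *Sieve) evict() {
	n := c.hand
	if n == nil {
		n = c.tail
	}
	for n.visited.Load() {
		n.visited.Store(false)
		if n = n.prev; n == nil {
			n = c.tail // wrap around to the oldest entry
		}
	}
	c.hand = n.prev
	if n.prev != nil {
		n.prev.next = n.next
	} else {
		c.head = n.next
	}
	if n.next != nil {
		n.next.prev = n.prev
	} else {
		c.tail = n.prev
	}
	delete(c.items, n.key)
}

func main() {
	c := NewSieve(3)
	c.Set("a", 1)
	c.Set("b", 2)
	c.Set("c", 3)
	c.Get("a")    // marks "a" as visited; the list is not reordered
	c.Set("d", 4) // the hand skips "a" (clearing its bit) and evicts "b"
	_, okA := c.Get("a")
	_, okB := c.Get("b")
	fmt.Println(okA, okB) // true false
}
```

A long scan of one-off keys would still push every unvisited entry out, which is the scan-resistance limitation mentioned in point 1.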

  • maphash

  • I was curious what function was used for hashing (how do you write a generic hash function in Go?) and it's pretty disgusting.

    https://github.com/dolthub/maphash/blob/main/runtime.go
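
For comparison, here is a narrower sketch that stays within the standard library's `hash/maphash` package. It is not how dolthub/maphash works: the `Hasher` type and its methods are hypothetical names, and only a few key types are handled, with everything else rejected. Newer Go releases (1.24 and later) also provide `maphash.Comparable` for arbitrary comparable values, which may remove the need for a type switch like this one.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/maphash"
)

// Hasher is a hypothetical generic hasher built on hash/maphash.
// It supports only string, int, and uint64 keys in this sketch.
type Hasher[K comparable] struct {
	seed maphash.Seed
}

// NewHasher picks a random per-instance seed, so hash values are
// stable within a process but differ between Hasher instances.
func NewHasher[K comparable]() Hasher[K] {
	return Hasher[K]{seed: maphash.MakeSeed()}
}

// Hash returns a 64-bit hash of key.
func (h Hasher[K]) Hash(key K) uint64 {
	switch k := any(key).(type) {
	case string:
		return maphash.String(h.seed, k)
	case int:
		return h.hashUint64(uint64(k))
	case uint64:
		return h.hashUint64(k)
	default:
		// Covering every comparable type is the hard part that pushes
		// libraries toward runtime internals; this sketch just refuses.
		panic(fmt.Sprintf("unsupported key type %T", key))
	}
}

// hashUint64 hashes the fixed-width little-endian encoding of v.
func (h Hasher[K]) hashUint64(v uint64) uint64 {
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], v)
	return maphash.Bytes(h.seed, buf[:])
}

func main() {
	hs := NewHasher[string]()
	hi := NewHasher[int]()
	fmt.Println(hs.Hash("otter") == hs.Hash("otter")) // equal keys hash equally
	fmt.Println(hi.Hash(42) == hi.Hash(42))
}
```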

  • theine-go

    high performance in-memory cache

  • In fact, lock-free queues have several problems at once, which prompted me to give up on them almost immediately.

    1. Yes, S3-FIFO can be implemented using lock-free queues, but the problem is that each write to a filled cache using this design will cause a large number of additional atomic operations not friendly to the processor's cache, while bp-wrapper on the contrary amortizes this load. And reading with frequency update on hot entries can have a bad effect on performance. In many ways this is exactly what the last posts in my discussion with Ben are about (not really about this, but the current problem with otter read speed is caused by a similar problem). https://github.com/Yiling-J/theine-go/issues/29#issuecomment...

    2. But the main problem for me is not even that. Lock-free queues work fine as long as you only need to support Get and Set operations, but as soon as you want to add features to your cache, the complexity of the implementation starts to increase, and some features are very hard to add to such a structure. Also, improving the eviction policy is under a big question mark, because not only do you have to think about how to improve the eviction policy, but also how to avoid locks while doing so or how not to slow down the implementation with your improvements. BP-Wrapper has no such problems at all, allows you to use any eviction policy and focus on improving different parts of your cache independently of each other.
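
The amortization idea behind BP-Wrapper can be sketched roughly as follows. This is a simplified illustration, not otter's or theine's implementation: `Policy`, `ReadBuffer`, `Record`, and `Flush` are hypothetical names, and a buffered channel stands in for the striped or per-thread buffers a real cache would use. Reads append to a bounded buffer, and the eviction policy's lock is taken only once per full buffer instead of once per read.

```go
package main

import (
	"fmt"
	"sync"
)

// Policy stands in for any eviction policy that must be updated under a lock.
type Policy struct {
	mu    sync.Mutex
	freq  map[string]int // per-key access counts (placeholder policy state)
	locks int            // how many times the policy lock was taken
}

// ReadBuffer batches access events before they reach the policy.
type ReadBuffer struct {
	events chan string
	policy *Policy
}

func NewReadBuffer(size int, p *Policy) *ReadBuffer {
	return &ReadBuffer{events: make(chan string, size), policy: p}
}

// Record notes an access. While the buffer has room, no policy lock is needed.
// When it is full, one goroutine drains the whole batch under a single lock;
// if another goroutine is already draining, the event is simply dropped.
func (b *ReadBuffer) Record(key string) {
	select {
	case b.events <- key:
	default:
		if b.policy.mu.TryLock() {
			b.policy.locks++
			b.drainLocked()
			b.policy.freq[key]++
			b.policy.mu.Unlock()
		}
	}
}

// Flush applies any buffered events, for example before an eviction decision.
func (b *ReadBuffer) Flush() {
	b.policy.mu.Lock()
	b.policy.locks++
	b.drainLocked()
	b.policy.mu.Unlock()
}

// drainLocked applies all buffered events; the caller must hold policy.mu.
func (b *ReadBuffer) drainLocked() {
	for {
		select {
		case k := <-b.events:
			b.policy.freq[k]++
		default:
			return
		}
	}
}

func main() {
	p := &Policy{freq: make(map[string]int)}
	buf := NewReadBuffer(4, p)
	for i := 0; i < 10; i++ {
		buf.Record("hot-key")
	}
	buf.Flush()
	// 10 accesses were applied with only 3 lock acquisitions.
	fmt.Println(p.freq["hot-key"], p.locks) // 10 3
}
```

Because the policy only ever sees batches applied under its own lock, it can be swapped or extended without touching the read path, which is the independence the comment above describes.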

  • Caffeine

    A high performance caching library for Java

  • /u/someplaceguy,

    Those LIRS traces, along with many others, are available at this page [1]. I did a cursory review with their traces, using both Caffeine's and the author's simulators to avoid bias or a mistaken implementation. In their target workloads Caffeine was on par or better [2]. I have not seen anything novel in this or their previous works and find their claims to be easily disproven, so I have not implemented this policy in Caffeine's simulator yet.

    [1]: https://github.com/ben-manes/caffeine/wiki/Simulator

    [2]: https://github.com/1a1a11a/libCacheSim/discussions/20

  • sosp23-s3fifo

    The repo for SOSP23 paper: FIFO queues are all you need for cache evictions

  • We observed that quick demotion[2] is important to achieve a low miss ratio in modern cache workloads, and existing algorithms such as TinyLFU and LIRS have lower miss ratios because of the small 1% window they use. This motivated us to design S3-FIFO, which uses simple FIFO queues to achieve low miss ratios. It is true that compared to state-of-the-art, S3-FIFO does not use any fancy techniques, but this does not mean it has bad performance.

    In our large-scale evaluations, we found that the fancy techniques in LIRS, ARC, and TinyLFU can sometimes increase the miss ratio. But simple FIFO queues are more robust. However, *it is not true that S3-FIFO is better on every trace*.

    * Note that some of the S3-FIFO results in Otter's repo are not updated and have an implementation bug, and we are working with the owner to update them.

    [1] https://github.com/Thesys-lab/sosp23-s3fifo?tab=readme-ov-fi...

  • otter

    A high performance lockless cache for Go. Many times faster than Ristretto and friends. (by maypok86)

  • xsync

    Concurrent data structures for Go

  • The issue is that the Go stdlib does not have a parallel hash map.

    There is https://github.com/puzpuzpuz/xsync#map, a different cache-line hash map implementation.

  • libCacheSim

    a high performance library for building cache simulators

  • go-cache-benchmark

    Cache benchmark for web cache workloads in golang. (by scalalang2)

  • I added otter to my cache benchmark. It shows lower efficiency than mine, and I'm not sure why this happens.

    See results here: https://github.com/scalalang2/go-cache-benchmark


  • ristretto

    A high performance memory-bound Go cache

  • 1. Unfortunately, ristretto has been showing hit ratio around 0 on almost all traces for a very long time now and the authors don't respond to this in any way. Vitess for example has already changed it to another cache. Here are two issues about it: https://github.com/dgraph-io/ristretto/issues/346 and https://github.com/dgraph-io/ristretto/issues/336. That is, ristretto shows such results even on its own benchmarks. You can see it just by running hit ratio benchmarks on a very simple zipf distribution from the ristretto repository: https://github.com/dgraph-io/ristretto/blob/main/stress_test.... On this test I got the following:
