5000x Faster CRDTs: An Adventure in Optimization

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

Our great sponsors
  • SonarLint - Deliver Cleaner and Safer Code - Right in Your IDE of Choice!
  • OPS - Build and Run Open Source Unikernels
  • Scout APM - Less time debugging, more time building
  • GitHub repo crdt-benchmarks

    A collection of CRDT benchmarks

    Cool! It'd be interesting to see those CRDT implementations added to Kevin Jahns' CRDT Benchmarks page[1]. The LogootSplit paper looks interesting. It looks like xray is abandoned, and I'm not sure about teletype. Though teletype's CRDT looks to be entirely implemented in javascript[2]? If the authors are around I'd love to see some benchmarks so we can compare approaches and learn what actually works well.

    And I'm not surprised these techniques have been invented before. Realising a tree is an appropriate data structure here is a pretty obvious step if you have a mind for data structures.

    To name it, I often find myself feeling defensive when people read my work and respond with a bunch of links to academic papers. Its probably totally unfair and a complete projection from my side, but I hear a voice in my head reword your comment to instead say something awful like: "Cool, but everything you did was done before. Even if they didn't make any of their work practical, usable or good they still published first and you obviously didn't do a good enough literature review if you didn't know that." And I feel an unfair defensiveness arise in me as a result that wants to find excuses to dismiss the work, even if the work might be otherwise interesting.

    Its hard to compare their benchmark results because they used synthetic randomized editing traces, which always have different performance profiles than real edits for this stuff. Their own university gathered some great real world data in an earlier study. It would have been much more instructive if that data set was used here. At a glance their RAM usage looks to be about 2 orders of magnitude worse than diamond-types or yjs. And their CPU usage... ?? I can't tell because they have no tables of results. Just some hard to read charts with log scales, so you can't even really eyeball the figures. So its really hard to tell if their work ends up performance-competitive without spending a couple days getting their enterprise style java code running with a better data set. Do you think thats worth doing?

    [1] https://github.com/dmonad/crdt-benchmarks

    [2] https://github.com/atom/teletype-crdt

  • GitHub repo citrea-model

    A CRDT-based collaborative editor engine of letters.yandex.ru (2012, historical)

  • SonarLint

    Deliver Cleaner and Safer Code - Right in Your IDE of Choice!. SonarLint is a free and open source IDE extension that identifies and catches bugs and vulnerabilities as you code, directly in the IDE. Install from your favorite IDE marketplace today.

  • GitHub repo diamond-types

    The world's fastest CRDT. WIP.

  • GitHub repo teletype-crdt

    String-wise sequence CRDT powering peer-to-peer collaborative editing in Teletype for Atom.

    Cool! It'd be interesting to see those CRDT implementations added to Kevin Jahns' CRDT Benchmarks page[1]. The LogootSplit paper looks interesting. It looks like xray is abandoned, and I'm not sure about teletype. Though teletype's CRDT looks to be entirely implemented in javascript[2]? If the authors are around I'd love to see some benchmarks so we can compare approaches and learn what actually works well.

    And I'm not surprised these techniques have been invented before. Realising a tree is an appropriate data structure here is a pretty obvious step if you have a mind for data structures.

    To name it, I often find myself feeling defensive when people read my work and respond with a bunch of links to academic papers. Its probably totally unfair and a complete projection from my side, but I hear a voice in my head reword your comment to instead say something awful like: "Cool, but everything you did was done before. Even if they didn't make any of their work practical, usable or good they still published first and you obviously didn't do a good enough literature review if you didn't know that." And I feel an unfair defensiveness arise in me as a result that wants to find excuses to dismiss the work, even if the work might be otherwise interesting.

    Its hard to compare their benchmark results because they used synthetic randomized editing traces, which always have different performance profiles than real edits for this stuff. Their own university gathered some great real world data in an earlier study. It would have been much more instructive if that data set was used here. At a glance their RAM usage looks to be about 2 orders of magnitude worse than diamond-types or yjs. And their CPU usage... ?? I can't tell because they have no tables of results. Just some hard to read charts with log scales, so you can't even really eyeball the figures. So its really hard to tell if their work ends up performance-competitive without spending a couple days getting their enterprise style java code running with a better data set. Do you think thats worth doing?

    [1] https://github.com/dmonad/crdt-benchmarks

    [2] https://github.com/atom/teletype-crdt

  • GitHub repo automerge

    A JSON-like data structure (a CRDT) that can be modified concurrently by different users, and merged again automatically.

    Wait it doesn't look like you used the performance branch of automerge (which is now merged into master). It is significantly faster.

    https://github.com/automerge/automerge/pull/253

  • GitHub repo dotted-logootsplit

    A delta-state block-wise sequence CRDT

    Yes, xray was abandoned and teletype is written in JS.

    I understand your point and as a researcher and engineer I know your feeling. I took some cautions by using "Some optimizations". I value engineering as much as research and I'm bothered when I heard any side telling the other side that their work is worthless. Your work and the work of Kevin Jahns are very valuable and could improve the way that researchers and engineers do benchmarks.

    This is still hard for me to determine when position-based list CRDT (Logoot, LogootSPlit, ...) are better than tombstone-based list CRDT (RGA, RgaSplit, Yata, ...). It could be worth to assess that.

    3 year ago I started an update of LogootSplit. The new CRDT is named Dotted LogootSplit [1] and enables delta-synchronizations. The work is not finished: I had other priorities such as writing my thesis... I have to perform some benchmark. However I'm more interested in the hypothetical advantages of Dotted LogootSplit regarding synchronization over unreliable networks. From an engineering point-of-view, I'm using a partially-persistent-capable AVL tree [2]. Eventually I would like to switch to a partially-persistent-capable b-tree. Unfortunately writing a paper is very time consuming, and time is missing.

    I still stick with JS/TS because in my viewpoint Wasm is not mature yet. Ideally, I would like to use a language that compiles both to JS and Wasm. Several years ago I welcomed Rust with a lot of enthusiasm. Now I'm doubtful about Rust due to the inherent complexity of the language.

    [1] https://github.com/coast-team/dotted-logootsplit/tree/dev

  • GitHub repo cow-list

    Copy-On-Write iterable list

  • OPS

    OPS - Build and Run Open Source Unikernels. Quickly and easily build and deploy open source unikernels in tens of seconds. Deploy in any language to any cloud.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts