5000x Faster CRDTs: An Adventure in Optimization

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

Our great sponsors
  • SurveyJS - Open-Source JSON Form Builder to Create Dynamic Forms Right in Your App
  • WorkOS - The modern identity platform for B2B SaaS
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • crdt-benchmarks

    A collection of CRDT benchmarks

    Cool! It'd be interesting to see those CRDT implementations added to Kevin Jahns' CRDT Benchmarks page[1]. The LogootSplit paper looks interesting. It looks like xray is abandoned, and I'm not sure about teletype. Though teletype's CRDT looks to be entirely implemented in javascript[2]? If the authors are around I'd love to see some benchmarks so we can compare approaches and learn what actually works well.

    And I'm not surprised these techniques have been invented before. Realising a tree is an appropriate data structure here is a pretty obvious step if you have a mind for data structures.

    To name it, I often find myself feeling defensive when people read my work and respond with a bunch of links to academic papers. Its probably totally unfair and a complete projection from my side, but I hear a voice in my head reword your comment to instead say something awful like: "Cool, but everything you did was done before. Even if they didn't make any of their work practical, usable or good they still published first and you obviously didn't do a good enough literature review if you didn't know that." And I feel an unfair defensiveness arise in me as a result that wants to find excuses to dismiss the work, even if the work might be otherwise interesting.

    Its hard to compare their benchmark results because they used synthetic randomized editing traces, which always have different performance profiles than real edits for this stuff. Their own university gathered some great real world data in an earlier study. It would have been much more instructive if that data set was used here. At a glance their RAM usage looks to be about 2 orders of magnitude worse than diamond-types or yjs. And their CPU usage... ?? I can't tell because they have no tables of results. Just some hard to read charts with log scales, so you can't even really eyeball the figures. So its really hard to tell if their work ends up performance-competitive without spending a couple days getting their enterprise style java code running with a better data set. Do you think thats worth doing?

    [1] https://github.com/dmonad/crdt-benchmarks

    [2] https://github.com/atom/teletype-crdt

  • citrea-model

    A CRDT-based collaborative editor engine of letters.yandex.ru (2012, historical)

  • SurveyJS

    Open-Source JSON Form Builder to Create Dynamic Forms Right in Your App. With SurveyJS form UI libraries, you can build and style forms in a fully-integrated drag & drop form builder, render them in your JS app, and store form submission data in any backend, inc. PHP, ASP.NET Core, and Node.js.

  • diamond-types

    The world's fastest CRDT. WIP.

  • teletype-crdt

    Discontinued String-wise sequence CRDT powering peer-to-peer collaborative editing in Teletype for Atom.

    Cool! It'd be interesting to see those CRDT implementations added to Kevin Jahns' CRDT Benchmarks page[1]. The LogootSplit paper looks interesting. It looks like xray is abandoned, and I'm not sure about teletype. Though teletype's CRDT looks to be entirely implemented in javascript[2]? If the authors are around I'd love to see some benchmarks so we can compare approaches and learn what actually works well.

    And I'm not surprised these techniques have been invented before. Realising a tree is an appropriate data structure here is a pretty obvious step if you have a mind for data structures.

    To name it, I often find myself feeling defensive when people read my work and respond with a bunch of links to academic papers. Its probably totally unfair and a complete projection from my side, but I hear a voice in my head reword your comment to instead say something awful like: "Cool, but everything you did was done before. Even if they didn't make any of their work practical, usable or good they still published first and you obviously didn't do a good enough literature review if you didn't know that." And I feel an unfair defensiveness arise in me as a result that wants to find excuses to dismiss the work, even if the work might be otherwise interesting.

    Its hard to compare their benchmark results because they used synthetic randomized editing traces, which always have different performance profiles than real edits for this stuff. Their own university gathered some great real world data in an earlier study. It would have been much more instructive if that data set was used here. At a glance their RAM usage looks to be about 2 orders of magnitude worse than diamond-types or yjs. And their CPU usage... ?? I can't tell because they have no tables of results. Just some hard to read charts with log scales, so you can't even really eyeball the figures. So its really hard to tell if their work ends up performance-competitive without spending a couple days getting their enterprise style java code running with a better data set. Do you think thats worth doing?

    [1] https://github.com/dmonad/crdt-benchmarks

    [2] https://github.com/atom/teletype-crdt

  • automerge

    A JSON-like data structure (a CRDT) that can be modified concurrently by different users, and merged again automatically.

    Wait it doesn't look like you used the performance branch of automerge (which is now merged into master). It is significantly faster.

    https://github.com/automerge/automerge/pull/253

  • dotted-logootsplit

    A delta-state block-wise sequence CRDT

    Yes, xray was abandoned and teletype is written in JS.

    I understand your point and as a researcher and engineer I know your feeling. I took some cautions by using "Some optimizations". I value engineering as much as research and I'm bothered when I heard any side telling the other side that their work is worthless. Your work and the work of Kevin Jahns are very valuable and could improve the way that researchers and engineers do benchmarks.

    This is still hard for me to determine when position-based list CRDT (Logoot, LogootSPlit, ...) are better than tombstone-based list CRDT (RGA, RgaSplit, Yata, ...). It could be worth to assess that.

    3 year ago I started an update of LogootSplit. The new CRDT is named Dotted LogootSplit [1] and enables delta-synchronizations. The work is not finished: I had other priorities such as writing my thesis... I have to perform some benchmark. However I'm more interested in the hypothetical advantages of Dotted LogootSplit regarding synchronization over unreliable networks. From an engineering point-of-view, I'm using a partially-persistent-capable AVL tree [2]. Eventually I would like to switch to a partially-persistent-capable b-tree. Unfortunately writing a paper is very time consuming, and time is missing.

    I still stick with JS/TS because in my viewpoint Wasm is not mature yet. Ideally, I would like to use a language that compiles both to JS and Wasm. Several years ago I welcomed Rust with a lot of enthusiasm. Now I'm doubtful about Rust due to the inherent complexity of the language.

    [1] https://github.com/coast-team/dotted-logootsplit/tree/dev

  • cow-list

    Copy-On-Write iterable list

  • WorkOS

    The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts