FastPFor
yjs
FastPFor | yjs | |
---|---|---|
4 | 53 | |
839 | 15,350 | |
- | 3.9% | |
8.5 | 8.6 | |
about 1 month ago | 6 days ago | |
C++ | JavaScript | |
Apache License 2.0 | MIT |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
FastPFor
-
Making CRDTs 98% More Efficient
Well if your integers are sequential you can encode huge numbers of them using diff + RLE in just a few bytes, likely far fewer than 1/2 a byte on average, for the right dataset (in theory you can store 1,2,3,4,5...10_000 in 2 bytes).
But for other integer datasets there's FastPFOR
https://github.com/lemire/FastPFor
The linked papers there will talk about techniques that can be used to store multiple 32bit integers into a single byte, etc. Integer compression is pretty powerful if your data isn't random. The thing with UUIDs is that your data is pretty random - even a UUIDv7 contains a significant amount of random data.
-
Time-Series Compression Algorithms
One notable omission from this piece is a technique to compress integer time series with both positive and negative values.
If you naively apply bit-packing using the Simple8b algorithm, you'll find that negative integers are not compressed. This is due to how signed integers are represented in modern computers: negative integers will have their most significant bit set [1].
Zigzag encoding is a neat transform that circumvents this issue. It works by mapping signed integers to unsigned integers so that numbers with a small absolute value can be encoded using a small number of bits. Put another way, it encodes negative numbers using the least significant bit for sign. [2]
If you're looking for a quick way to experiment with various time series compression algorithm I highly recommend Daniel Lemire's FastPFor repository [3] (as linked in the article). I've used the Python bindings [4] to quickly evaluate various compression algorithms with great success.
Finally I'd like to humbly mention my own tiny contribution [5], an adaptation of Lemire's C++ Simple8b implementation (including basic methods for delta & zigzag encoding/decoding).
I used C++ templates to make the encoding and decoding routines generic over integer bit-width, which expands support up to 64 bit integers, and offers efficient usage with smaller integers (eg 16 bit). I made a couple other minor tweaks including support for arrays up to 2^64 in length, and tweaking the API/method signatures so they can be used in a more functional style. This implementation is slightly simpler to invoke via FFI, and I intend to add examples showing how to compile for usage via JS (WebAssembly), Python, and C#. I threw my code up quickly in order to share with you all, hopefully someone finds it useful. I intend to expand on usage examples/test cases/etc, and am looking forward to any comments or contributions.
[1] https://en.wikipedia.org/wiki/Signed_number_representation
[2] https://en.wikipedia.org/wiki/Variable-length_quantity#Zigza...
[3] https://github.com/lemire/FastPFor
[4] https://github.com/searchivarius/PyFastPFor
[5] https://github.com/naturalplasmoid/simple8b-timeseries-compr...
- The big-load anti-pattern
- FastPFOR: Fast Integer Compression in C++
yjs
- Show HN: Collaborate on your YC Application with CRDT-powered forms
-
Making CRDTs 98% More Efficient
One idea is just to use fewer random bits in peerIDs. Yjs (https://docs.yjs.dev/) gets away with just 32 random bits. If you compromise and use 64 random bits, then even a very popular doc with 1 million lifetime peerIDs will have a < 10^-7 lifetime probability of collision.
-
An Interactive Intro to CRDTs
I've seen it come up often in collaborative text editors.
Also see: https://github.com/yjs/yjs
-
JSON Schema Store
You are absolutely right that XML is better for document structures.
My current theory is that Yjs [0] is the new JSON+XML. It gives you both JSON and XML types in one nested structure, all with conflict free merging via incremental updates.
Also, you note the issue with XML and overlapping inline markup. Yjs has an answer for that with its text type, you can apply attributes (for styling or anything else) via arbatary ranges. They can overlap.
Obviously I'm being a little hypabolic suggesting it will replace JSON, the beauty of JSON is is simplicity, but for many systems building on Yjs or similar CRDT based serialisation systems is the future.
https://github.com/yjs/yjs/
-
Launch HN: Tiptap (YC S23) – Toolkit for developing collaborative editors
Note: https://github.com/yjs/yjs for collaborative "document edition, and user cursors"; has WebRTC, web socket, matrix.org backend
-
Wormholers, what can CCP and wormholers do to improve J-Space?
CCP needs to revamp proto anyway, due to recent exploits... practically, nothing really prevents 'em from using some sort of CRDT's to make the state of the sig view eventually consistent (yjs lib, if we're speaking frontendian).
-
How to use Yjs with Ruby on Rails?
Yjs framework: Because it is a CRDT implementation which provides collaborative editing and offline-first capability.
-
🐑🐑🐑 EweserDB, the user-owned database 🐑🐑🐑
No problem. The database CRUD features are just helpers as an abstraction on top of yjs: https://docs.yjs.dev/. Eweser adds schemas in the form of typescript types to make using it simpler, more structured, and interoperability easier.
- Ask HN: What is new in Algorithms / Data Structures these days?
- How does Google docs send the changes done by other users in real-time?
What are some alternatives?
Code-used-on-Daniel-Lemire-s-blog - This is a repository for the code posted on my blog
automerge - A JSON-like data structure (a CRDT) that can be modified concurrently by different users, and merged again automatically.
simple8b-timeseries-compression
liveblocks - Liveblocks is a platform to ship collaborative features like comments, notifications, text editors in minutes instead of months.
simple8b-timeseries-compr
automerge-rs - Rust implementation of automerge [Moved to: https://github.com/automerge/automerge]
interpolative_coding - A flexible and efficient C++ implementation of the Binary Interpolative Coding algorithm.
crdt-woot - Implementation of collaborative editing algorithm CRDT WOOT.
milkdown - 🍼 Plugin driven WYSIWYG markdown editor framework.
MobX - Simple, scalable state management.
pacman-backup - :floppy_disk: Pacman Backup tool for off-the-grid updates via portable USB sticks or (mesh) LAN networks.
logseq - A local-first, non-linear, outliner notebook for organizing and sharing your personal knowledge base. Use it to organize your todo list, to write your journals, or to record your unique life.