Minisketch: an optimized library for BCH-based set reconciliation (by sipa)

minisketch reviews and mentions

Posts with mentions or reviews of minisketch. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-11-20.
  • Peer-to-Peer Encrypted Messaging
    11 projects | 20 Nov 2022
    Since the protocol appears to use ad hoc synchronization, the authors might be interested in minisketch, which is a library that implements a data structure (pinsketch) that allows two parties to synchronize their sets of m b-bit elements which differ by c entries using only b*c bits. A naive protocol would use m*b bits instead, which is potentially much larger.

    I'd guess that under normal usage the message densities probably don't justify such efficient means (we developed this library for use in bitcoin, targeting rates on the order of a dozen new messages per second and where every participant has many peers with potentially differing sets), but it's still probably worth being aware of. The pinsketch is always at least as efficient as a naive approach, but may not be worth the complexity.

    The somewhat better known IBLT data structure has constant overheads that make it less efficient than even naive synchronization until the set differences are fairly large (particularly when the element hashes are small); so some applications that evaluated and eschewed IBLT might find pinsketch applicable.
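
To put rough numbers on the b*c versus m*b comparison in the comment above, here is a minimal sketch. The function names and the workload figures are illustrative assumptions, not part of minisketch or of the quoted post.

```python
def naive_bits(m: int, b: int) -> int:
    """Bits needed to send all m elements of b bits each."""
    return m * b


def sketch_bits(c: int, b: int) -> int:
    """Bits in a sketch that can recover up to c differing b-bit elements."""
    return c * b


# Hypothetical workload: 10,000 elements of 32 bits, sets differing by 20 entries.
m, b, c = 10_000, 32, 20
print(naive_bits(m, b))                       # 320000
print(sketch_bits(c, b))                      # 640
print(naive_bits(m, b) // sketch_bits(c, b))  # 500
```

The sketch size depends only on the number of differences and the element width, not on the size of the sets.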

  • Ask HN: What are some 'cool' but obscure data structures you know about?
    54 projects | 21 Jul 2022
    Here is one not on the list so far:

    Set Sketches. They allow you to compute the difference between two sets (for example, to see if data has been replicated between two nodes) WITHOUT transmitting all the keys in one set to another.

    Say you have two sets of the numbers [1, ..., 1 million], all 32-bit integers, and you know one set is missing 2 random numbers. Set sketches allow you to send a "set checksum" that is only 64 BITS, which allows the other party to compute those missing numbers. A naive algorithm would require about 4 MB of data (1 million 4-byte integers) to be transferred to calculate the same thing.

    *(in particular, pinsketch)
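
The "two missing numbers from a two-word checksum" idea can be tried at toy scale. The following is a sketch under stated assumptions, not the minisketch implementation: it uses 8-bit elements in GF(2^8) with the reduction polynomial 0x11B, a fixed capacity of two, and brute-force root finding, whereas a real library handles wider fields and arbitrary capacity. All function names are made up for this example. The sketch is the pair of odd power sums s1 = Σx and s3 = Σx³; elements present in both sets cancel when the two sketches are XORed.

```python
MOD = 0x11B  # x^8 + x^4 + x^3 + x + 1, irreducible over GF(2)


def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements (carry-less multiply, reduced by MOD)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= MOD
        b >>= 1
    return r


def gf_pow(a: int, e: int) -> int:
    """Raise a to the power e by square-and-multiply."""
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r


def sketch(elements) -> tuple:
    """Capacity-2 sketch of nonzero 8-bit elements: (sum of x, sum of x^3)."""
    s1 = s3 = 0
    for x in elements:
        s1 ^= x
        s3 ^= gf_pow(x, 3)
    return s1, s3


def combine(sa: tuple, sb: tuple) -> tuple:
    """XOR two sketches; the result is the sketch of the symmetric difference."""
    return sa[0] ^ sb[0], sa[1] ^ sb[1]


def decode(sk: tuple) -> set:
    """Recover a symmetric difference of at most two elements."""
    s1, s3 = sk
    if s1 == 0 and s3 == 0:
        return set()
    if s1 == 0:
        raise ValueError("more than two differences")
    if gf_pow(s1, 3) == s3:
        return {s1}  # exactly one differing element
    # Two elements a, b satisfy a + b = s1 and a*b = s3/s1 + s1^2,
    # so they are the roots of x^2 + s1*x + a*b.
    prod = gf_mul(s3, gf_pow(s1, 254)) ^ gf_mul(s1, s1)
    roots = {x for x in range(1, 256)
             if (gf_mul(x, x) ^ gf_mul(s1, x) ^ prod) == 0}
    if len(roots) != 2:
        raise ValueError("more than two differences")
    return roots


full = set(range(1, 201))
partial = full - {37, 142}
diff = decode(combine(sketch(full), sketch(partial)))
print(sorted(diff))  # [37, 142]
```

Each side sends only two field elements (16 bits here, 64 bits for 32-bit elements), regardless of how many elements the sets have in common.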

    54 projects | 21 Jul 2022
    How about a pinsketch:

    A pinsketch is a set that takes a specified amount of memory and into which you can insert and remove set members, or even add whole sets, in time O(memory size). You can insert an unbounded number of entries, and at any time that it has no more entries than its size you can decode the list of members.

    For an example usage, say I have a list of ten million IP addresses of people who have DoS-attacked my systems recently. I want to send my list to you over an expensive iridium connection, so I don't want to just send the 40 MB list. Fortunately you've been making your own observations (and maybe have stale data from me), and we don't expect our lists to differ by more than 1000 entries. So I make and maintain a pinsketch with size 1000, which takes 4000 bytes (1000 * 4 bytes, because IP addresses are 32 bits). Then to send you an update I just send it over. You maintain your own pinsketch of addresses, you subtract it from the one I sent, and then you decode it. If the number of entries that differ between us is at most 1000, you're guaranteed to learn the difference (otherwise the decode will fail, or give a false decode with odds ~= 1/2^(1000)).

    Bonus: We don't need to know in advance how different our sets are. I can send the sketch in units as small as one word at a time (32 bits in this case) and stop sending once you've got enough to decode.
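
The properties described above (insert and remove are one operation, whole sketches can be combined, and a sketch can be sent a word at a time) follow from the sketch being a list of odd power sums over a binary field. The toy below illustrates this under assumptions: 8-bit elements in GF(2^8) rather than 32-bit ones, and a hypothetical `ToySketch` class that is not the minisketch API. It demonstrates the algebra only and omits decoding.

```python
MOD = 0x11B  # x^8 + x^4 + x^3 + x + 1, irreducible over GF(2)


def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements (carry-less multiply, reduced by MOD)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= MOD
        b >>= 1
    return r


class ToySketch:
    """Holds the odd power sums s1, s3, ..., s_(2c-1) for capacity c."""

    def __init__(self, capacity: int):
        self.syndromes = [0] * capacity

    def toggle(self, x: int) -> None:
        """Insert x if absent, remove it if present: O(capacity) work."""
        if not 0 < x < 256:
            raise ValueError("elements must be nonzero 8-bit values")
        square = gf_mul(x, x)
        power = x  # x^(2i+1) for i = 0, 1, 2, ...
        for i in range(len(self.syndromes)):
            self.syndromes[i] ^= power
            power = gf_mul(power, square)

    def merge(self, other: "ToySketch") -> None:
        """XOR in another sketch of equal capacity (adds or subtracts a set)."""
        self.syndromes = [a ^ b for a, b in zip(self.syndromes, other.syndromes)]

    def serialize(self) -> bytes:
        return bytes(self.syndromes)


def sketch_of(elements, capacity: int) -> ToySketch:
    s = ToySketch(capacity)
    for x in elements:
        s.toggle(x)
    return s


mine = sketch_of({1, 2, 3, 4}, 4)
yours = sketch_of({3, 4, 5}, 4)
mine.merge(yours)
# Shared elements cancel: the result is the sketch of {1, 2, 5}.
print(mine.serialize() == sketch_of({1, 2, 5}, 4).serialize())  # True
# A lower-capacity sketch is a prefix of a higher-capacity one,
# which is what allows sending it one word at a time.
big = sketch_of({9, 17, 200}, 4).serialize()
print(big[:2] == sketch_of({9, 17, 200}, 2).serialize())  # True
```

Because a truncated sketch is itself a valid smaller sketch, the receiver can attempt a decode after each additional word arrives.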

    Here is an implementation I contributed to:

    There is a related data structure called an invertible Bloom lookup table (IBLT) that accomplishes the same task. Its encoding and especially decoding are much faster, and it has asymptotically the same communications efficiency. However, the constant factors on the communications efficiency are poor, so for 'small' set differences (like the 1000 above) it has a rather high overhead factor, and it can't guarantee decoding. I think that makes it much less magical, though it may be the right tool for some applications.

    IBLT also has the benefit that its decoder is a fun bit of code golf to implement.
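
For comparison, a minimal IBLT with its peeling decoder might look like the following. This is a sketch under assumptions rather than any particular library's design: three partitions, SHA-256 as the cell and checksum hash, 64-bit keys, and a table that is deliberately oversized so the toy decode is overwhelmingly likely to succeed. The class and helper names are hypothetical.

```python
import hashlib

K = 3  # number of hash functions; each one indexes its own partition


def _h(x: int, salt: int) -> int:
    """64-bit hash of a 64-bit key under a one-byte salt."""
    data = salt.to_bytes(1, "big") + x.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


class IBLT:
    def __init__(self, cells_per_partition: int):
        self.n = cells_per_partition
        size = K * self.n
        self.count = [0] * size
        self.key_sum = [0] * size
        self.chk_sum = [0] * size

    def _cells(self, x: int):
        return [j * self.n + _h(x, j) % self.n for j in range(K)]

    def _update(self, x: int, sign: int) -> None:
        for i in self._cells(x):
            self.count[i] += sign
            self.key_sum[i] ^= x
            self.chk_sum[i] ^= _h(x, 255)

    def insert(self, x: int) -> None:
        self._update(x, +1)

    def subtract(self, other: "IBLT") -> "IBLT":
        """Cell-wise difference; keys present in both tables cancel."""
        out = IBLT(self.n)
        for i in range(K * self.n):
            out.count[i] = self.count[i] - other.count[i]
            out.key_sum[i] = self.key_sum[i] ^ other.key_sum[i]
            out.chk_sum[i] = self.chk_sum[i] ^ other.chk_sum[i]
        return out

    def decode(self):
        """Peel pure cells until none remain. Mutates the table.

        Returns (ok, only_in_self, only_in_other); ok is False if peeling stalls.
        """
        mine, theirs = set(), set()
        progress = True
        while progress:
            progress = False
            for i in range(K * self.n):
                c = self.count[i]
                # A pure cell holds exactly one key, confirmed by its checksum.
                if c in (1, -1) and self.chk_sum[i] == _h(self.key_sum[i], 255):
                    x = self.key_sum[i]
                    (mine if c == 1 else theirs).add(x)
                    self._update(x, -c)  # remove x from all of its cells
                    progress = True
        ok = not (any(self.count) or any(self.key_sum) or any(self.chk_sum))
        return ok, mine, theirs


a, b = IBLT(64), IBLT(64)
for x in range(1, 1001):
    a.insert(x)
for x in (set(range(1, 1001)) - {10, 20}) | {5000}:
    b.insert(x)
ok, only_a, only_b = a.subtract(b).decode()
print(ok, sorted(only_a), sorted(only_b))
```

Unlike the pinsketch, decoding here is probabilistic: if the differing keys collide in every partition, peeling stalls and `ok` comes back False, which is the "can't guarantee decoding" caveat in the comment above. Each cell also carries a count and a checksum in addition to the key, which is one source of the constant-factor overhead.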

    54 projects | 21 Jul 2022
    I love the set reconciliation structures like the IBLT (Invertible Bloom Lookup Table) and BCH set digests like minisketch.

    Let's say you have a set of a billion items. Someone else has mostly the same set but they differ by 10 items. These let you exchange messages that would fit in one UDP packet to reconcile the sets.

  • Here is how Ethereum COULD scale without increasing centralisation and without depending on layer two's.
    2 projects | 27 Jan 2022
    Sipa has been working on a better version of that for a while. The technical term is a "set reconciliation protocol", but Bitcoin Core has been doing a more basic version of this for a while. Note that the "BCH" there refers to BCH error-correcting codes and isn't the same as Bcash.
  • ish: Sketches for Zig
    3 projects | 18 Dec 2021
    I'd also have to say that Zig is a pretty neat language for this. In order to implement PBS I needed the minisketch library (written in C/C++), and I'll have to say that integrating with it has been a breeze. Some fiddling in build.zig so that I can avoid the Makefile, and after that everything has worked amazingly.
  • The Pinecone Overlay Network
    2 projects | 7 May 2021
    Networks that need to constrain themselves to limited topologies to avoid traffic magnification do so at the expense of robustness, especially against active attackers that grind their identifiers to gain privileged positions.

    Maybe this is a space where efficient reconciliation could help: certainly, if the goal were to flood messages to participants, reconciliation can give almost optimal communication without compromising robustness.

