Similar projects and alternatives to minisketch
End-to-end encrypted file transfer for Android and iOS. A Magic Wormhole Mobile client.
Continuous Profiling Platform. Debug performance issues down to a single line of code [Moved to: https://github.com/grafana/pyroscope]
A high performance caching library for Java
The Clojure programming language
Gaming meets modern C++ - a fast and reliable entity component system (ECS) and much more
Java implementation of a concurrent trie
A new data structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means
Sketches for Zig (by judofyr)
Succinct Data Structure Library 2.0
Berty is a secure peer-to-peer messaging app that works with or without internet access, cellular data or trust in the network
The Python programming language
An open-source C++ library developed and used at Facebook.
This repository has examples of broken patterns in ASP.NET Core applications
Benchmarks of approximate nearest neighbor libraries in Python
A better compressed bitset in Java
SimpleX - the first messaging platform operating without user identifiers of any kind - 100% private by design! iOS and Android apps are released 📱!
Tinfoil Chat - Onion-routed, endpoint secure messaging system
A graph store for Clojure and ClojureScript
FusionCache is an easy to use, high performance and robust cache with an optional distributed 2nd layer and some advanced features.
minisketch reviews and mentions
Peer-to-Peer Encrypted Messaging
11 projects | news.ycombinator.com | 20 Nov 2022
Since the protocol appears to use ad hoc synchronization, the authors might be interested in https://github.com/sipa/minisketch/ which is a library that implements a data structure (pinsketch) that allows two parties to synchronize their sets of m b-bit elements which differ by c entries using only b*c bits. A naive protocol would use m*b bits instead, which is potentially much larger.
I'd guess that under normal usage the message densities probably don't justify such efficient means (we developed this library for use in Bitcoin, targeting rates on the order of a dozen new messages per second, where every participant has many peers with potentially differing sets), but it's still probably worth being aware of. The pinsketch is always at least as efficient as a naive approach, but may not be worth the complexity.
The somewhat better known IBLT data structure has constant overheads that make it less efficient than even naive synchronization until the set differences are fairly large (particularly when the element hashes are small); so some applications that evaluated and eschewed IBLT might find pinsketch applicable.
Ask HN: What are some 'cool' but obscure data structures you know about?
Here is one not on the list so far:
Set Sketches. They allow you to compute the difference between two sets (for example to see if data has been replicated between two nodes) WITHOUT transmitting all the keys in one set to another.
Say you have two sets of the numbers [1, ..., 1 million], all 32-bit integers, and you know one set is missing 2 random numbers. Set sketches allow you to send a "set checksum" of only 64 BITS that lets the other party compute those missing numbers. A naive algorithm would require about 4 MB of data (1 million × 4 bytes) to be transferred to calculate the same thing.
*(in particular pin sketch https://github.com/sipa/minisketch).
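To make the "two missing numbers" case concrete, here is a toy sketch of capacity 2 in Python. It is only an illustration of the idea and not the minisketch API: it assumes 8-bit nonzero elements, a hand-picked field polynomial, and brute-force root finding, and every function name in it is made up for this example. The sketch stores two field sums (of x and of x^3), so it occupies 2 × 8 = 16 bits regardless of how large the sets are.

```python
# Toy capacity-2 set sketch over GF(2^8). Illustrative only: the names and
# the 8-bit element size are assumptions, not part of minisketch.
IRREDUCIBLE = 0x11B  # x^8 + x^4 + x^3 + x + 1, irreducible over GF(2)

def gf_mul(a, b):
    """Multiply two field elements (carry-less multiply, then reduce)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= IRREDUCIBLE
        b >>= 1
    return result

def gf_inv(a):
    """Inverse of a nonzero element, computed as a^254."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result

def sketch(elements):
    """Return (s1, s3): the field sums of x and of x^3 over the set."""
    s1 = s3 = 0
    for x in elements:
        assert 1 <= x <= 255, "elements must be nonzero 8-bit values"
        s1 ^= x
        s3 ^= gf_mul(x, gf_mul(x, x))
    return s1, s3

def combine(a, b):
    """XOR two sketches; the result is the sketch of the symmetric difference."""
    return a[0] ^ b[0], a[1] ^ b[1]

def decode(sk):
    """Recover up to 2 differing elements, or None if decoding fails."""
    s1, s3 = sk
    if s1 == 0:
        return set() if s3 == 0 else None
    # For a difference {a, b}: s1 = a + b and s3 = a^3 + b^3 = s1 * (s1^2 + a*b),
    # so a*b = s3 / s1 + s1^2, and a, b are the roots of x^2 + s1*x + a*b.
    product = gf_mul(s3, gf_inv(s1)) ^ gf_mul(s1, s1)
    roots = {x for x in range(1, 256)
             if gf_mul(x, x) ^ gf_mul(s1, x) ^ product == 0}
    expected = 1 if product == 0 else 2
    return roots if len(roots) == expected else None

# Usage: one side is missing two elements; only the 16-bit sketch is exchanged.
alice = set(range(1, 201))
bob = alice - {37, 142}
print(sorted(decode(combine(sketch(alice), sketch(bob)))))  # [37, 142]
```

If the sets differ by more than two elements, this toy decoder either returns None or, occasionally, a wrong answer, which mirrors the capacity limit described in the comments on this page.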
How about a pinsketch:
A pinsketch is a set that takes a specified amount of memory and into which you can insert and remove set members or even add whole sets in time O(memory size). You can insert an unbounded number of entries, and at any time that it has equal or fewer entries than the size you can decode the list of members.
For an example usage, say I have a list of ten million IP addresses of people who have DoS-attacked my systems recently. I want to send my list to you over an expensive Iridium connection, so I don't want to just send the 40MiB list. Fortunately you've been making your own observations (and maybe have stale data from me), and we don't expect our lists to differ by more than 1000 entries. So I make and maintain a pinsketch with size 1000 which takes 4000 bytes (1000 * 4 bytes because IP addresses are 32 bits). Then to send you an update I just send it over. You maintain your own pinsketch of addresses, you subtract it from the one I sent and then you decode it. If the number of entries that differ between us is at most 1000 you're guaranteed to learn the difference (otherwise the decode will fail, or give a false decode with odds ~= 1/2^(1000)).
Bonus: We don't need to know in advance how different our sets are-- I can send the sketch in units as small as one word at a time (32-bits in this case) and stop sending once you've got enough to decode.
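The sizes in the IP address example can be checked with a few lines of Python. This is plain arithmetic on the figures quoted above (ten million 32-bit addresses, a sketch of capacity 1000); the function names are made up for illustration.

```python
def full_list_bytes(num_elements, element_bits):
    """Bytes needed to send every element of the set."""
    return num_elements * element_bits // 8

def sketch_bytes(capacity, element_bits):
    """Bytes needed for a sketch that can decode up to `capacity` differences."""
    return capacity * element_bits // 8

full = full_list_bytes(10_000_000, 32)   # 40,000,000 bytes
sketch_size = sketch_bytes(1000, 32)     # 4,000 bytes
print(full, sketch_size, full // sketch_size)  # 40000000 4000 10000
```

The sketch size depends only on the expected difference and the element width, not on how many addresses each side holds.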
Here is an implementation I contributed to: https://github.com/sipa/minisketch/
There is a related data structure called an invertible Bloom lookup table (IBLT) that accomplishes the same task. Its encoding and especially decoding is much faster, and it has asymptotically the same communications efficiency. However, the constant factors on the communications efficiency are poor, so for 'small' set differences (like the 1000 above) it has a rather high overhead factor, and it can't guarantee decoding. I think that makes it much less magical, though it may be the right tool for some applications.
IBLT also has the benefit that the decoder is a fun bit of code golf to implement.
I love the set reconciliation structures like the IBLT (Invertible Bloom Lookup Table) and BCH set digests like minisketch.
Let's say you have a set of a billion items. Someone else has mostly the same set but they differ by 10 items. These let you exchange messages that would fit in one UDP packet to reconcile the sets.
Here is how Ethereum COULD scale without increasing centralisation and without depending on layer two's.
2 projects | reddit.com/r/CryptoTechnology | 27 Jan 2022
Sipa has been working on a better version of that for a while. The technical term is a "set reconciliation protocol", but Bitcoin Core has been doing a more basic version of this for a while. Note that the "BCH" there isn't the same as Bcash.
ish: Sketches for Zig
3 projects | reddit.com/r/Zig | 18 Dec 2021
I'd also have to say that Zig is a pretty neat language for this. In order to implement PBS I needed the MiniSketch library (written in C/C++) and I'll have to say that integrating with it has been a breeze. Some fiddling in build.zig so that I can avoid a Makefile, and after that everything has worked amazingly.
The Pinecone Overlay Network
2 projects | news.ycombinator.com | 7 May 2021
Networks that need to constrain themselves to limited topologies to avoid traffic magnification do so at the expense of robustness, especially against active attackers that grind their identifiers to gain privileged positions.
Maybe this is a space where efficient reconciliation ( https://github.com/sipa/minisketch/ ) could help-- certainly if the goal were to flood messages to participants reconciliation can give almost optimal communication without compromising robustness.
sipa/minisketch is an open source project licensed under MIT License which is an OSI approved license.