Open-source projects categorized as locality-sensitive-hashing

Top 6 locality-sensitive-hashing Open-Source Projects

  • annoy

    Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk

    Project mention: Bitmap Indexes in Go: Search Speed | news.ycombinator.com | 2022-09-22

    Ducks, the story:

    I was using a Python in-memory vector search engine called Annoy [1] to do semantic search on various kinds of data. It worked great for finding "similar" objects. Story A has similar text to story B, image A looks like image B, etc.

    But doing basic metadata lookups was surprisingly hard. How do I get all images matching some criteria (say, size range, or tags)? I'd have to serialize them all into a DB, and use a DB index. Databases are great, but they add code bloat and overhead; I'm usually working in Jupyter notebooks and I like keeping as few external dependencies as possible.

    So I wrote ducks as a quick, convenient way to index anything.

    There are lots of other usage patterns of course; it's very generic. It makes a great Wordle / crossword solver too. "Find me words where the first letter is A and the fifth letter is L" is very fast in ducks.

    Indexing is just one of those things you always need. Python didn't have a good way to do it, and now it does!

    [1] https://github.com/spotify/annoy

  • datasketch

    MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble

  • soundfingerprinting

    Open source audio fingerprinting in .NET. An efficient algorithm for acoustic fingerprinting written purely in C#.

    Project mention: [P] Is it feasible to find a mapping between two non-synthesized audio signals of the same audio sequence? | reddit.com/r/MachineLearning | 2022-08-21

  • elastiknn

    Elasticsearch plugin and Lucene library for nearest neighbor search. Store vectors and run similarity search using exact and approximate algorithms.

  • image-ndd-lsh

    Near-duplicate image detection using Locality Sensitive Hashing

  • dedup

    Find duplicate text files.

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2022-09-22.

What are some of the best open-source locality-sensitive-hashing projects? This list will help you:

#  Project              Stars
1  annoy                10,329
2  datasketch            1,804
3  soundfingerprinting     781
4  elastiknn               281
5  image-ndd-lsh            42
6  dedup                     7