Top 23 fuzzy-matching Open-Source Projects
-
SymSpell
SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
-
LeaderF
An efficient fuzzy finder that helps to locate files, buffers, mrus, gtags, etc. on the fly for both vim and neovim.
-
splink
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
-
RE-flex
A high-performance C++ regex library and lexical analyzer generator with Unicode support. Extends Flex++ with Unicode support, indent/dedent anchors, lazy quantifiers, functions for lex and syntax error reporting and more. Seamlessly integrates with Bison and other parsers. (by Genivia)
-
textdistance.rs
🦀📏 Rust library to compare strings (or any sequences). 25+ algorithms, pure Rust, common interface, Unicode support.
-
unisim
UniSim is a package for efficient similarity computation, fuzzy matching, and clustering of data.
-
fuzzy-item-matching
Use machine learning and the Databricks Lakehouse Platform for product matching that can be used by marketplaces and suppliers for various purposes. Resolve differences between product definitions and descriptions and determine which items are likely pairs and which are distinct across disparate data sets.
Project mention: Should you combine edit distance "spell check" algorithms with phonetic matching algorithms for robust keyword finding? | /r/AskComputerScience | 2023-11-07
The SymSpell algorithm uses deletions to determine the edit distance between the input query word and a dictionary of correctly spelled words. The Double Metaphone algorithm (or another phonetic algorithm) converts the words to phonetic versions (phonetic "hashes", basically), and you then search by matching the input's phonetic hash against the dictionary of phonetic hashes.
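The deletion-based lookup described in that mention can be sketched roughly as follows. This is a minimal illustration of the symmetric-delete idea, not SymSpell's actual code: the names `deletes`, `buildIndex`, and `lookup` are hypothetical, and a real implementation would also verify each candidate with a true edit-distance check and rank by word frequency.

```typescript
// Illustrative sketch only; every function name here is an assumption.

// Collect every string reachable from `word` by deleting up to `maxDeletes` characters.
function deletes(word: string, maxDeletes: number): string[] {
  const seen = new Set<string>([word]);
  let frontier: string[] = [word];
  for (let d = 0; d < maxDeletes; d++) {
    const next: string[] = [];
    for (const w of frontier) {
      for (let i = 0; i < w.length; i++) {
        const shorter = w.slice(0, i) + w.slice(i + 1);
        if (!seen.has(shorter)) {
          seen.add(shorter);
          next.push(shorter);
        }
      }
    }
    frontier = next;
  }
  return Array.from(seen);
}

// Map each delete variant back to the dictionary words that produce it.
function buildIndex(dictionary: string[], maxDeletes: number): Map<string, string[]> {
  const index = new Map<string, string[]>();
  for (const word of dictionary) {
    for (const variant of deletes(word, maxDeletes)) {
      const bucket = index.get(variant);
      if (bucket) bucket.push(word);
      else index.set(variant, [word]);
    }
  }
  return index;
}

// Candidates are dictionary words that share at least one delete variant with the query.
function lookup(query: string, index: Map<string, string[]>, maxDeletes: number): string[] {
  const candidates = new Set<string>();
  for (const variant of deletes(query, maxDeletes)) {
    for (const word of index.get(variant) || []) candidates.add(word);
  }
  return Array.from(candidates).sort();
}

const index = buildIndex(["example", "sample", "apple"], 1);
console.log(lookup("exmple", index, 1));
```

Because only deletions are generated, on both the dictionary side and the query side, insertions, substitutions, and deletions in the query all surface as shared variants without enumerating the full alphabet.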
Project mention: Show HN: A fast, accurate and multilingual fuzzy search lib for the front end | news.ycombinator.com | 2024-02-14
Thank you. We need more libs like that. I researched the field just yesterday, and https://github.com/leeoniya/uFuzzy looked pretty good. But there is a gap in the market for such libs. Only a few let you send the whole HTML document, or serialize and deserialize the index for use in the browser; highlighting the matches is also a desired feature.
Most importantly, very few fuzzy search libs treat a simple substring match as a priority, which is understandable but not helpful. Imagine searching for “xample” and not having “example” among the results.
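One way to read the substring-priority complaint is as a ranking rule: exact substring hits first, looser fuzzy hits after. The sketch below assumes a very simple notion of "fuzzy" (an in-order subsequence match); `isSubsequence` and `search` are hypothetical helpers, not the API of uFuzzy or any other library listed here.

```typescript
// Hypothetical helper: true when every character of `needle` appears in `haystack` in order.
function isSubsequence(needle: string, haystack: string): boolean {
  let matched = 0;
  for (let i = 0; i < haystack.length && matched < needle.length; i++) {
    if (haystack[i] === needle[matched]) matched++;
  }
  return matched === needle.length;
}

// Return substring matches first, then subsequence-only matches, preserving input order within each group.
function search(query: string, items: string[]): string[] {
  const q = query.toLowerCase();
  const substringHits: string[] = [];
  const fuzzyHits: string[] = [];
  for (const item of items) {
    const text = item.toLowerCase();
    if (text.indexOf(q) !== -1) substringHits.push(item);
    else if (isSubsequence(q, text)) fuzzyHits.push(item);
  }
  return substringHits.concat(fuzzyHits);
}

console.log(search("xample", ["exam people", "example", "apple"]));
```

With this rule, "example" ranks ahead of "exam people" for the query "xample", even though both contain the query's characters in order.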
Project mention: Splink: Fast, accurate, scalable probabilistic data linkage | news.ycombinator.com | 2024-03-13
```typescript
import { ratio } from 'fuzzball';
import { SequenceMatcher } from 'difflib';

// modified from: https://github.com/nol13/fuzzball.js/blob/773b82991f2bcacc950b413615802aa953193423/fuzzball.js#L942
function partial_ratio(str1: string, str2: string) {
  // Order the inputs so the shorter string slides along the longer one.
  const [shorter, longer] = str1.length <= str2.length ? [str1, str2] : [str2, str1];

  const m = new SequenceMatcher(null, shorter, longer);
  const blocks = m.getMatchingBlocks();

  let bestScore: number = 0;
  let bestMatch: string | null = null;
  let bestStartIdx: number = -1;

  for (let b = 0; b < blocks.length; b++) {
    // Align the window in `longer` with the start of this matching block.
    const long_start = (blocks[b][1] - blocks[b][0]) > 0 ? (blocks[b][1] - blocks[b][0]) : 0;
    const long_end = long_start + shorter.length;
    const long_substr = longer.substring(long_start, long_end);

    const r = ratio(shorter, long_substr);
    if (r > bestScore) {
      bestScore = r;
      bestMatch = long_substr;
      bestStartIdx = long_start;
    }
    // Stop early on a near-perfect score.
    if (r > 99.5) {
      break;
    }
  }

  return { bestMatch, bestScore, bestStartIdx };
}
```
Project mention: textdistance.rs: Rust library to compare strings (or any sequences). 25+ algorithms, pure Rust, common interface, Unicode support. Based on popular and battle-tested textdistance Python library. | /r/rust | 2023-05-19
Project mention: Google UniSim for efficient similarity computation | news.ycombinator.com | 2023-11-30
fuzzy-matching related posts
- Unlocking Advanced RAG: Citations and Attributions
- You can use `Leaderf rg --live` to perform live grep now.
- textdistance.rs: Rust library to compare strings (or any sequences). 25+ algorithms, pure Rust, common interface, Unicode support. Based on popular and battle-tested textdistance Python library.
- Sotd.fun is live
- uFuzzy 1.0 - A tiny, efficient fuzzy search that doesn't suck
- How do I get :Telescope find_files to only search in current working directory?
- uFuzzy.js – A tiny, efficient fuzzy search that doesn't suck
Index
What are some of the best open-source fuzzy-matching projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | TNTSearch | 3,035 |
2 | SymSpell | 3,034 |
3 | uFuzzy | 2,498 |
4 | LeaderF | 2,096 |
5 | splink | 1,076 |
6 | zingg | 877 |
7 | fzf-for-js | 871 |
8 | fuzzball.js | 502 |
9 | RE-flex | 483 |
10 | closestmatch | 416 |
11 | fuzzysearch | 280 |
12 | textdistance.rs | 254 |
13 | dolos | 207 |
14 | abydos | 167 |
15 | bolt.nvim | 107 |
16 | fzshell | 73 |
17 | unisim | 63 |
18 | Yoyo-leaf | 53 |
19 | ZLOOKUP | 25 |
20 | cargo-select | 16 |
21 | fuzzyset | 9 |
22 | fuzzy-item-matching | 4 |
23 | FallGuysNameFinder | 4 |