Top 23 fuzzy-matching Open-Source Projects

TNTSearch

4 3,035 6.8 PHP

A fully featured full text search engine written in PHP
SymSpell

16 3,034 6.0 C#

SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm

Project mention: Should you combine edit distance "spell check" algorithms with phonetic matching algorithms for robust keyword finding? | /r/AskComputerScience | 2023-11-07

The SymSpell algorithm uses deletions to determine the edit distance between the input query word and a dictionary of correctly spelled words. The Double Metaphone algorithm (or another phonetic algorithm) converts words to phonetic versions (phonetic "hashes", basically), and you then search by matching the input's phonetic hash against the dictionary of phonetic hashes.
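The deletion idea mentioned above can be illustrated with a minimal sketch. It is not SymSpell's actual API: the function names (`deleteVariants`, `buildIndex`, `candidates`) and the tiny dictionary are hypothetical, and it covers only candidate lookup. A real implementation would go on to check each candidate with a true edit-distance computation and rank by word frequency.

```typescript
// Every string reachable from `word` by deleting up to `maxDeletes` characters
// (the word itself is included).
function deleteVariants(word: string, maxDeletes: number): Set<string> {
  const variants = new Set<string>([word]);
  let frontier: string[] = [word];
  for (let d = 0; d < maxDeletes; d++) {
    const next: string[] = [];
    for (const w of frontier) {
      for (let i = 0; i < w.length; i++) {
        const v = w.slice(0, i) + w.slice(i + 1);
        if (!variants.has(v)) {
          variants.add(v);
          next.push(v);
        }
      }
    }
    frontier = next;
  }
  return variants;
}

// Precomputed index: delete-variant -> dictionary words that produce it.
function buildIndex(dictionary: string[], maxDeletes: number): Map<string, Set<string>> {
  const index = new Map<string, Set<string>>();
  for (const word of dictionary) {
    deleteVariants(word, maxDeletes).forEach((v) => {
      if (!index.has(v)) index.set(v, new Set<string>());
      index.get(v)!.add(word);
    });
  }
  return index;
}

// Dictionary words that share at least one delete-variant with the query.
function candidates(query: string, index: Map<string, Set<string>>, maxDeletes: number): string[] {
  const found = new Set<string>();
  deleteVariants(query, maxDeletes).forEach((v) => {
    const words = index.get(v);
    if (words) words.forEach((w) => found.add(w));
  });
  return Array.from(found).sort();
}

// Hypothetical three-word dictionary, one deletion allowed on each side.
const index = buildIndex(["example", "sample", "temple"], 1);
console.log(candidates("exmaple", index, 1)); // [ 'example' ]
```

Because only deletions are generated on both the dictionary side and the query side, a transposition such as "exmaple" still meets "example" at the shared variant "exmple", without enumerating insertions or substitutions.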

uFuzzy

16 2,498 7.5 JavaScript

A tiny, efficient fuzzy search that doesn't suck

Project mention: Show HN: A fast, accurate and multilingual fuzzy search lib for the front end | news.ycombinator.com | 2024-02-14

Thank you. We need more libs like that. I researched the field just yesterday, and https://github.com/leeoniya/uFuzzy looked pretty good. But there is a gap in the market for such libs. Only a few let you pass in a whole HTML document or serialize and deserialize the index for use in the browser, and highlighting of matches is also a desired feature.
Most importantly, very few fuzzy search libs prioritize a simple substring match, which is understandable but not helpful. Imagine searching for “xample” and not having “example” among the results.
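The substring-priority behavior the commenter asks for can be sketched as follows. This is an illustration only, not how uFuzzy or any listed library works: `isSubsequence` and `search` are hypothetical helpers, and the "fuzzy" tier here is a plain in-order character match standing in for a real fuzzy scorer.

```typescript
// True if every character of `needle` appears in `haystack` in the same order.
function isSubsequence(needle: string, haystack: string): boolean {
  let i = 0;
  for (let j = 0; j < haystack.length && i < needle.length; j++) {
    if (haystack[j] === needle[i]) i++;
  }
  return i === needle.length;
}

// Return exact substring hits first, then looser in-order matches.
function search(needle: string, items: string[]): string[] {
  const n = needle.toLowerCase();
  const substringHits: string[] = [];
  const fuzzyHits: string[] = [];
  for (const item of items) {
    const h = item.toLowerCase();
    if (h.indexOf(n) !== -1) {
      substringHits.push(item);
    } else if (isSubsequence(n, h)) {
      fuzzyHits.push(item);
    }
  }
  return substringHits.concat(fuzzyHits);
}

console.log(search("xample", ["extra samples", "an example", "simple"]));
// [ 'an example', 'extra samples' ]
```

With this two-tier ordering, "an example" ranks ahead of "extra samples" for the query "xample", even though both contain the query's characters in order.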

LeaderF

7 2,096 8.6 Python

An efficient fuzzy finder that helps to locate files, buffers, mrus, gtags, etc. on the fly for both vim and neovim.

Project mention: You can use `Leaderf rg --live` to perform live grep now. | /r/neovim | 2023-07-29

splink

16 1,076 9.9 Python

Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends

Project mention: Splink: Fast, accurate, scalable probabilistic data linkage | news.ycombinator.com | 2024-03-13

zingg

23 877 9.3 Java

Scalable identity resolution, entity resolution, data mastering and deduplication using ML
fzf-for-js

7 871 3.4 TypeScript

Do fuzzy matching using FZF algorithm in JavaScript
fuzzball.js

2 502 4.5 JavaScript

Easy to use and powerful fuzzy string matching, port of fuzzywuzzy.

Project mention: Unlocking Advanced RAG: Citations and Attributions | dev.to | 2024-01-29

import { ratio } from 'fuzzball';
import { SequenceMatcher } from 'difflib';

// modified from: https://github.com/nol13/fuzzball.js/blob/773b82991f2bcacc950b413615802aa953193423/fuzzball.js#L942
function partial_ratio(str1: string, str2: string) {
  if (str1.length <= str2.length) {
    var shorter = str1;
    var longer = str2;
  } else {
    var shorter = str2;
    var longer = str1;
  }
  var m = new SequenceMatcher(null, shorter, longer);
  var blocks = m.getMatchingBlocks();
  let bestScore: number = 0;
  let bestMatch: string | null = null;
  let bestStartIdx: number = -1;
  for (var b = 0; b < blocks.length; b++) {
    var long_start = (blocks[b][1] - blocks[b][0]) > 0 ? (blocks[b][1] - blocks[b][0]) : 0;
    var long_end = long_start + shorter.length;
    var long_substr = longer.substring(long_start, long_end);
    var r = ratio(shorter, long_substr);
    if (r > bestScore) {
      bestScore = r;
      bestMatch = long_substr;
      bestStartIdx = long_start;
    }
    if (r > 99.5) {
      break;
    }
  }
  return {
    bestMatch,
    bestScore,
    bestStartIdx,
  };
}

RE-flex

3 483 7.5 C++

A high-performance C++ regex library and lexical analyzer generator with Unicode support. Extends Flex++ with Unicode support, indent/dedent anchors, lazy quantifiers, functions for lex and syntax error reporting and more. Seamlessly integrates with Bison and other parsers. (by Genivia)

Project mention: RE/flex 3.3.4 released - a scanner generator for C++ | /r/Compilers | 2023-05-31

closestmatch

1 416 10.0 Go

Golang library for fuzzy matching within a set of strings 📃
fuzzysearch

1 280 0.0 Python

Find parts of long text or data, allowing for some changes/typos. (by taleinat)
textdistance.rs

1 254 7.8 Rust

🦀📏 Rust library to compare strings (or any sequences). 25+ algorithms, pure Rust, common interface, Unicode support.

Project mention: textdistance.rs: Rust library to compare strings (or any sequences). 25+ algorithms, pure Rust, common interface, Unicode support. Based on popular and battle-tested textdistance Python library. | /r/rust | 2023-05-19

dolos

1 207 9.7 TypeScript

🕵️ Source code plagiarism detection
abydos

1 167 0.0 Python

Abydos NLP/IR library for Python
bolt.nvim

0 107 1.8 Python

⚡ Ultrafast multi-pane file manager for Neovim with fuzzy matching
fzshell

1 73 4.7 Go

Fuzzy shell completions you didn't know you needed
unisim

1 63 8.1 Python

UniSim is a package for efficient similarity computation, fuzzy matching, and clustering of data.

Project mention: Google UniSim for efficient similarity computation | news.ycombinator.com | 2023-11-30

Yoyo-leaf

1 53 3.8 C++

Yoyo-leaf is an awesome command-line fuzzy finder.
ZLOOKUP

1 25 0.0 JavaScript

Google Sheet Fuzzy String Matching Function
cargo-select

2 16 0.0 Rust

Cargo subcommand to easily run targets/examples
fuzzyset

0 9 7.6 Haskell

🐑 A fuzzy string set implementation in Haskell.
fuzzy-item-matching

1 4 2.2 Python

Use machine learning and the Databricks Lakehouse Platform for product matching that can be used by marketplaces and suppliers for various purposes. Resolve differences between product definitions and descriptions and determine which items are likely pairs and which are distinct across disparate data sets.
FallGuysNameFinder

5 4 0.0 C#

Automates Fall Guys Name Rerolling

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

fuzzy-matching related posts

Unlocking Advanced RAG: Citations and Attributions
1 project | dev.to | 29 Jan 2024
You can use `Leaderf rg --live` to perform live grep now.
1 project | /r/neovim | 29 Jul 2023
textdistance.rs: Rust library to compare strings (or any sequences). 25+ algorithms, pure Rust, common interface, Unicode support. Based on popular and battle-tested textdistance Python library.
2 projects | /r/rust | 19 May 2023
Sotd.fun is live
3 projects | /r/Wetshaving | 5 Feb 2023
uFuzzy 1.0 - A tiny, efficient fuzzy search that doesn't suck
1 project | /r/javascript | 23 Jan 2023
How do I get :Telescope find_files to only search in current working directory?
1 project | /r/neovim | 19 Dec 2022
uFuzzy.js – A tiny, efficient fuzzy search that doesn't suck
2 projects | /r/javascript | 2 Oct 2022

Index

What are some of the best open-source fuzzy-matching projects? This list will help you:

#	Project	Stars
1	TNTSearch	3,035
2	SymSpell	3,034
3	uFuzzy	2,498
4	LeaderF	2,096
5	splink	1,076
6	zingg	877
7	fzf-for-js	871
8	fuzzball.js	502
9	RE-flex	483
10	closestmatch	416
11	fuzzysearch	280
12	textdistance.rs	254
13	dolos	207
14	abydos	167
15	bolt.nvim	107
16	fzshell	73
17	unisim	63
18	Yoyo-leaf	53
19	ZLOOKUP	25
20	cargo-select	16
21	fuzzyset	9
22	fuzzy-item-matching	4
23	FallGuysNameFinder	4