Top 21 Levenshtein Open-Source Projects
-
TextDistance
📐 Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage.
-
SymSpell
SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
-
go-edlib
📚 String comparison and edit distance algorithms library, featuring: Levenshtein, LCS, Hamming, Damerau-Levenshtein (OSA and adjacent transpositions algorithms), Jaro-Winkler, Cosine, etc.
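go-edlib lists two Damerau-Levenshtein variants. The optimal string alignment (OSA) variant adds one case to the Levenshtein recurrence: an adjacent transposition costs 1, but no substring may be edited more than once. The sketch below illustrates that recurrence in TypeScript. It is not go-edlib's code, and `osaDistance` is a hypothetical name.

```typescript
// Optimal string alignment (OSA) distance: Levenshtein edits plus adjacent
// transpositions, with no substring edited more than once.
function osaDistance(a: string, b: string): number {
  // Compare Unicode code points rather than UTF-16 code units.
  const s = Array.from(a);
  const t = Array.from(b);
  const m = s.length;
  const n = t.length;
  // d[i][j] = distance between the first i symbols of s and the first j of t.
  const d: number[][] = Array.from({ length: m + 1 }, () => new Array<number>(n + 1).fill(0));
  for (let i = 0; i <= m; i++) d[i][0] = i;
  for (let j = 0; j <= n; j++) d[0][j] = j;
  for (let i = 1; i <= m; i++) {
    for (let j = 1; j <= n; j++) {
      const cost = s[i - 1] === t[j - 1] ? 0 : 1;
      d[i][j] = Math.min(
        d[i - 1][j] + 1, // deletion
        d[i][j - 1] + 1, // insertion
        d[i - 1][j - 1] + cost, // substitution (or match)
      );
      // Adjacent transposition, e.g. "ab" <-> "ba".
      if (i > 1 && j > 1 && s[i - 1] === t[j - 2] && s[i - 2] === t[j - 1]) {
        d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
      }
    }
  }
  return d[m][n];
}
```

For example, `osaDistance("ab", "ba")` is 1 (plain Levenshtein gives 2), while `osaDistance("CA", "ABC")` is 3. The unrestricted adjacent-transpositions variant gives 2 for that pair (CA → AC → ABC), because it is allowed to insert a character between the transposed pair.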
-
js-levenshtein
The most efficient JS implementation calculating the Levenshtein distance, i.e. the difference between two strings.
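The Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. A minimal sketch of the standard dynamic-programming recurrence (Wagner-Fischer) follows. It keeps only two rows of the table. This is not js-levenshtein's optimized code, and `levenshtein` is a hypothetical name.

```typescript
// Levenshtein distance via the Wagner-Fischer recurrence, keeping two rows.
// Compares UTF-16 code units (like charCodeAt), so an astral character such
// as an emoji counts as two symbols.
function levenshtein(a: string, b: string): number {
  // Make `b` the shorter string so each row has min(|a|, |b|) + 1 cells.
  if (a.length < b.length) [a, b] = [b, a];
  // prev[j] = distance between the previous prefix of `a` and b[0..j).
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  let curr = new Array<number>(b.length + 1).fill(0);
  for (let i = 1; i <= a.length; i++) {
    curr[0] = i; // delete all i characters of a's prefix
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charCodeAt(i - 1) === b.charCodeAt(j - 1) ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution (or match)
      );
    }
    [prev, curr] = [curr, prev]; // reuse the old row as scratch space
  }
  return prev[b.length];
}
```

For instance, `levenshtein("kitten", "sitting")` is 3: two substitutions (k→s, e→i) and one insertion (g).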
-
strutil-go
Golang metrics for calculating string similarity and other string utility functions (by adrg)
-
Quickenshtein
Making the quickest and most memory efficient implementation of Levenshtein Distance with SIMD and Threading support
-
textdistance.rs
🦀📏 Rust library to compare strings (or any sequences). 25+ algorithms, pure Rust, common interface, Unicode support.
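The "common interface" and "any sequences" points describe a design in which every metric accepts arbitrary sequences and exposes the same operations. The TypeScript sketch below illustrates that idea. It is not textdistance.rs's API: `SequenceMetric`, `normalizedSimilarity`, `hamming`, and `jaccard` are hypothetical names. Passing `Array.from(str)` instead of the string compares Unicode code points rather than UTF-16 code units.

```typescript
// Hypothetical common interface: each metric works on sequences of any
// element type and reports the largest distance possible for given lengths.
interface SequenceMetric {
  distance<T>(a: readonly T[], b: readonly T[]): number;
  maximum(lenA: number, lenB: number): number;
}

// Similarity in [0, 1], derived the same way for every metric.
function normalizedSimilarity<T>(m: SequenceMetric, a: readonly T[], b: readonly T[]): number {
  const max = m.maximum(a.length, b.length);
  return max === 0 ? 1 : 1 - m.distance(a, b) / max;
}

// Hamming distance: positions that differ. As an assumption, the longer
// sequence's extra tail counts as mismatches (some libraries reject
// unequal lengths instead).
const hamming: SequenceMetric = {
  distance<T>(a: readonly T[], b: readonly T[]): number {
    let diff = Math.abs(a.length - b.length);
    for (let i = 0; i < Math.min(a.length, b.length); i++) {
      if (a[i] !== b[i]) diff++;
    }
    return diff;
  },
  maximum: (lenA, lenB) => Math.max(lenA, lenB),
};

// Jaccard distance on the sets of elements: 1 - |A ∩ B| / |A ∪ B|.
const jaccard: SequenceMetric = {
  distance<T>(a: readonly T[], b: readonly T[]): number {
    const sa = new Set(a);
    const sb = new Set(b);
    let inter = 0;
    sa.forEach((x) => {
      if (sb.has(x)) inter++;
    });
    const union = sa.size + sb.size - inter;
    return union === 0 ? 0 : 1 - inter / union;
  },
  maximum: () => 1,
};
```

The same functions work on characters (`hamming.distance(Array.from("karolin"), Array.from("kathrin"))` is 3) and on words (`jaccard.distance("a b c".split(" "), "b c d".split(" "))` is 0.5).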
-
Project mention: Should you combine edit distance "spell check" algorithms with phonetic matching algorithms for robust keyword finding? | /r/AskComputerScience | 2023-11-07
The SymSpell algorithm uses deletions to determine the edit distance between the input query word and a dictionary of correctly spelled words. The Double Metaphone algorithm (or another phonetic algorithm) converts words to phonetic versions (basically phonetic "hashes"), and you then search by matching the input's phonetic hash against the dictionary of phonetic hashes.
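The symmetric delete idea can be sketched as follows: index every dictionary word under all strings obtained by deleting up to `maxDistance` characters, then look up the query's own deletes. This is a simplified illustration in TypeScript, not SymSpell's implementation (which adds prefix indexing, word frequencies, and other optimizations). `deletes`, `editDistance`, and `SymmetricDeleteIndex` are hypothetical names. Candidates found through a shared delete are re-checked with a true Levenshtein distance, because a shared delete does not guarantee that two words are within `maxDistance`.

```typescript
// All strings reachable from `word` by deleting 0..maxDistance characters.
function deletes(word: string, maxDistance: number): Set<string> {
  const out = new Set<string>([word]);
  let frontier = [word];
  for (let d = 0; d < maxDistance; d++) {
    const next: string[] = [];
    for (const w of frontier) {
      for (let i = 0; i < w.length; i++) {
        const del = w.slice(0, i) + w.slice(i + 1);
        if (!out.has(del)) {
          out.add(del);
          next.push(del);
        }
      }
    }
    frontier = next;
  }
  return out;
}

// Plain Levenshtein distance used to verify candidates. Delete matches
// over-generate: with maxDistance 1, "ab" and "ba" share the delete "a"
// although their Levenshtein distance is 2.
function editDistance(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

class SymmetricDeleteIndex {
  // Maps each delete string to the dictionary words that produce it.
  private readonly index = new Map<string, Set<string>>();
  private readonly maxDistance: number;

  constructor(words: string[], maxDistance: number) {
    this.maxDistance = maxDistance;
    for (const word of words) {
      for (const del of deletes(word, maxDistance)) {
        if (!this.index.has(del)) this.index.set(del, new Set());
        this.index.get(del)!.add(word);
      }
    }
  }

  // Dictionary words within maxDistance of `query`, closest first.
  lookup(query: string): { word: string; distance: number }[] {
    const candidates = new Set<string>();
    for (const del of deletes(query, this.maxDistance)) {
      for (const word of this.index.get(del) ?? []) candidates.add(word);
    }
    return [...candidates]
      .map((word) => ({ word, distance: editDistance(query, word) }))
      .filter((c) => c.distance <= this.maxDistance)
      .sort((x, y) => x.distance - y.distance || x.word.localeCompare(y.word));
  }
}
```

The expensive work (generating deletes of every dictionary word) happens once at index time. A query then generates only its own deletes instead of every possible insertion and substitution over the alphabet.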
Project mention: RapidFuzz: Rapid fuzzy string matching in Python | news.ycombinator.com | 2024-02-14
For sounds, something like https://github.com/jamesturk/jellyfish?
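For sounds-alike matching, jellyfish provides phonetic encodings, Soundex among them. The sketch below is a minimal standalone TypeScript version of American Soundex, a much simpler scheme than Double Metaphone. It is not jellyfish's code, and `soundex` is a hypothetical name. Names that receive the same code are treated as sounding alike.

```typescript
// Letter-to-digit table for American Soundex. Vowels, y, h and w are absent.
const SOUNDEX_CODES: Record<string, string | undefined> = {
  b: "1", f: "1", p: "1", v: "1",
  c: "2", g: "2", j: "2", k: "2", q: "2", s: "2", x: "2", z: "2",
  d: "3", t: "3",
  l: "4",
  m: "5", n: "5",
  r: "6",
};

function soundex(name: string): string {
  const letters = name.toLowerCase().replace(/[^a-z]/g, "");
  if (letters.length === 0) return "";
  // Keep the first letter. Its code still suppresses an identical next code.
  let result = letters[0].toUpperCase();
  let prev = SOUNDEX_CODES[letters[0]] ?? "";
  for (let i = 1; i < letters.length && result.length < 4; i++) {
    const ch = letters[i];
    const code = SOUNDEX_CODES[ch];
    if (code !== undefined) {
      if (code !== prev) result += code; // collapse runs of the same code
      prev = code;
    } else if (ch !== "h" && ch !== "w") {
      prev = ""; // a vowel (or y) separates letters with the same code
    }
    // h and w are skipped without resetting `prev`.
  }
  return result.padEnd(4, "0");
}
```

With this, "Robert" and "Rupert" both encode to R163, and "Ashcraft" encodes to A261 because the h between s and c does not separate their shared code.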
```typescript
import { ratio } from 'fuzzball';
import { SequenceMatcher } from 'difflib';

// modified from: https://github.com/nol13/fuzzball.js/blob/773b82991f2bcacc950b413615802aa953193423/fuzzball.js#L942
// Like fuzzball's partial_ratio, but also returns the best-matching
// substring of the longer string and the index where it starts.
function partial_ratio(str1: string, str2: string) {
  const [shorter, longer] = str1.length <= str2.length ? [str1, str2] : [str2, str1];

  // Each matching block is [i, j, size]: shorter[i..] matches longer[j..].
  const m = new SequenceMatcher(null, shorter, longer);
  const blocks = m.getMatchingBlocks();

  let bestScore: number = 0;
  let bestMatch: string | null = null;
  let bestStartIdx: number = -1;
  for (const [i, j] of blocks) {
    // Align the whole shorter string with the window of `longer` implied by this block.
    const longStart = Math.max(j - i, 0);
    const longSubstr = longer.substring(longStart, longStart + shorter.length);
    const r = ratio(shorter, longSubstr);
    if (r > bestScore) {
      bestScore = r;
      bestMatch = longSubstr;
      bestStartIdx = longStart;
    }
    if (r > 99.5) {
      break; // effectively a perfect match; stop early
    }
  }
  return { bestMatch, bestScore, bestStartIdx };
}
```
Project mention: textdistance.rs: Rust library to compare strings (or any sequences). 25+ algorithms, pure Rust, common interface, Unicode support. Based on popular and battle-tested textdistance Python library. | /r/rust | 2023-05-19
The wide selection of algorithms is great, but some preliminary testing shows that this library's implementations are quite a bit slower than already existing implementations, e.g. strsim.
For Postgres, there is an extension that provides that.
Project mention: Calculate the difference and intersection of any two regexes | news.ycombinator.com | 2023-09-11
Say you want to compute all strings of length 5 that the automaton can generate. Conceptually the nicest way is to create an automaton that matches any five characters and then compute the intersection between that automaton and the regex automaton. Then you can generate all the strings in the intersection automaton. Of course, IRL, you wouldn't actually generate the intersection (you can easily do this on the fly), but you get the idea.
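The on-the-fly intersection described above is the standard product construction: run both automata in lockstep over the same input and accept only when both accept. A minimal TypeScript sketch follows. It assumes small hand-built DFAs. The `Dfa` shape, `exactLength`, `enumerateIntersection`, and the DFA for `a(b|c)*` are illustrative stand-ins, not the output of any regex library.

```typescript
// A DFA with numeric states. A missing transition means rejection (an
// implicit dead state).
interface Dfa {
  start: number;
  accepting: Set<number>;
  delta: Map<number, Map<string, number>>;
}

// Automaton accepting every string of exactly n symbols over `alphabet`.
function exactLength(alphabet: string[], n: number): Dfa {
  const delta = new Map<number, Map<string, number>>();
  for (let i = 0; i < n; i++) {
    delta.set(i, new Map(alphabet.map((ch) => [ch, i + 1] as [string, number])));
  }
  return { start: 0, accepting: new Set([n]), delta };
}

// Enumerate the language of A ∩ B by walking the product automaton on the
// fly. Product states are pairs (p, q) and are never stored as a table.
// Terminates only if the product is acyclic, e.g. when B is exactLength(...).
function enumerateIntersection(a: Dfa, b: Dfa, alphabet: string[]): string[] {
  const out: string[] = [];
  const walk = (p: number, q: number, prefix: string): void => {
    if (a.accepting.has(p) && b.accepting.has(q)) out.push(prefix);
    for (const ch of alphabet) {
      const p2 = a.delta.get(p)?.get(ch);
      const q2 = b.delta.get(q)?.get(ch);
      if (p2 !== undefined && q2 !== undefined) walk(p2, q2, prefix + ch);
    }
  };
  walk(a.start, b.start, "");
  return out;
}

// Hand-built DFA for the regex a(b|c)*, standing in for a compiled regex.
const regexDfa: Dfa = {
  start: 0,
  accepting: new Set([1]),
  delta: new Map([
    [0, new Map([["a", 1]])],
    [1, new Map([["b", 1], ["c", 1]])],
  ]),
};
```

Here `enumerateIntersection(regexDfa, exactLength(["a", "b", "c"], 3), ["a", "b", "c"])` yields `["abb", "abc", "acb", "acc"]`. Only product states reachable from the start pair are ever visited.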
Automata are really a lost art in modern natural language processing. We used to do things like store a large vocabulary in a deterministic acyclic minimized automaton (nice and compact, a so-called dictionary automaton). Then, to find, say, all words within Levenshtein distance 2 of hacker, you create a Levenshtein automaton for hacker and compute (on the fly) the intersection between the Levenshtein automaton and the dictionary automaton. The language of the intersection automaton is then exactly the set of dictionary words within distance 2 of hacker.
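The dictionary-automaton approach can be approximated in a few lines of TypeScript. This sketch simplifies on two points, and it is not dictomaton's API. First, a trie stands in for the minimized acyclic automaton (a trie shares prefixes but not suffixes). Second, the Levenshtein automaton is simulated rather than built: its state after reading a prefix is one row of the edit-distance table. A row whose minimum exceeds the bound is a dead state, so that trie branch is pruned. `TrieNode`, `buildTrie`, and `withinDistance` are hypothetical names.

```typescript
class TrieNode {
  readonly children = new Map<string, TrieNode>();
  word: string | null = null; // set when a dictionary word ends at this node
}

function buildTrie(words: string[]): TrieNode {
  const root = new TrieNode();
  for (const w of words) {
    let node = root;
    for (const ch of w) {
      let next = node.children.get(ch);
      if (!next) {
        next = new TrieNode();
        node.children.set(ch, next);
      }
      node = next;
    }
    node.word = w;
  }
  return root;
}

// Walk the trie and the Levenshtein "automaton state" (a DP row) together,
// i.e. explore their intersection on the fly.
function withinDistance(root: TrieNode, query: string, maxDist: number): string[] {
  const q = Array.from(query);
  const results: string[] = [];
  // row[j] = edit distance between the current trie prefix and q[0..j).
  const visit = (node: TrieNode, row: number[]): void => {
    if (node.word !== null && row[q.length] <= maxDist) results.push(node.word);
    // Row minima never decrease along a path, so this branch is dead.
    if (Math.min(...row) > maxDist) return;
    node.children.forEach((child, ch) => {
      const next = [row[0] + 1];
      for (let j = 1; j <= q.length; j++) {
        const cost = q[j - 1] === ch ? 0 : 1;
        next.push(Math.min(next[j - 1] + 1, row[j] + 1, row[j - 1] + cost));
      }
      visit(child, next);
    });
  };
  visit(root, Array.from({ length: q.length + 1 }, (_, j) => j));
  return results;
}
```

With a dictionary of hacker, hackers, hacked, hack, packet, hat, hackathon, and cracker, `withinDistance(root, "hacker", 2)` returns every word except hat and hackathon. Both of those differ from hacker in length by 3, so their branches are cut off as soon as the row minimum passes 2.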
I wrote a Java package a decade ago that implements some of this stuff:
https://github.com/danieldk/dictomaton
Project mention: A tool for detecting plagiarism of Github repositories in someone else's code | /r/recruitinghell | 2023-09-27
Levenshtein related posts
- RapidFuzz: Rapid fuzzy string matching in Python
- Unlocking Advanced RAG: Citations and Attributions
- Should you combine edit distance "spell check" algorithms with phonetic matching algorithms for robust keyword finding?
- A tool for detecting plagiarism of Github repositories in someone else's code
- I built a tool for hiring managers to find if a technical assignment is plagiarised from Github
- A command line tool for finding plagiarisms of technical assignments in Github
- My first ever project in Golang - Hercules: command line tool for checking plagiarism of technical assignments from Github
-
Index
What are some of the best open-source Levenshtein projects? This list will help you:
Rank | Project | Stars
---|---|---
1 | TextDistance | 3,296
2 | SymSpell | 3,034
3 | RapidFuzz | 2,338
4 | jellyfish | 1,989
5 | fuzzball.js | 502
6 | go-edlib | 444
7 | js-levenshtein | 428
8 | closestmatch | 416
9 | strsim-rs | 375
10 | pg_similarity | 352
11 | levenshtein | 319
12 | strutil-go | 274
13 | Quickenshtein | 272
14 | textdistance.rs | 254
15 | didyoumean | 202
16 | abydos | 167
17 | dictomaton | 130
18 | simetric | 60
19 | hercules | 19
20 | edits.cr | 16
21 | Edits | 2