Levenshtein

Top 21 Levenshtein Open-Source Projects

  • TextDistance

    📐 Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage.

  • SymSpell

    SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm

  • Project mention: Should you combine edit distance "spell check" algorithms with phonetic matching algorithms for robust keyword finding? | /r/AskComputerScience | 2023-11-07

    The SymSpell algorithm uses deletions to determine the edit distance between the input query word and a dictionary of correctly spelled words. The Double Metaphone algorithm (or another phonetic algorithm) converts the words to phonetic versions (phonetic "hashes", basically), and you then search by matching the phonetic hash of the input against the dictionary of phonetic hashes.
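The delete-based lookup described above can be sketched roughly as follows. This is a minimal illustration of the symmetric-delete idea, not SymSpell's actual implementation: the function names (`deletes`, `build_index`, `lookup`) and the sample dictionary are hypothetical, and the phonetic-hash half of the question is not covered. Candidates found through shared delete variants are verified with a plain Levenshtein distance, because a shared variant alone does not guarantee the words are within the limit.

```python
from collections import defaultdict
from itertools import combinations


def deletes(word, max_distance):
    """Return word plus every string made by deleting up to max_distance characters."""
    variants = {word}
    for k in range(1, min(max_distance, len(word)) + 1):
        for keep in combinations(range(len(word)), len(word) - k):
            variants.add("".join(word[i] for i in keep))
    return variants


def build_index(dictionary, max_distance):
    """Map every delete variant to the dictionary words that produce it."""
    index = defaultdict(set)
    for word in dictionary:
        for variant in deletes(word, max_distance):
            index[variant].add(word)
    return index


def levenshtein(a, b):
    """Classic row-by-row dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def lookup(query, index, max_distance):
    """Collect candidates via shared delete variants, then verify each one."""
    candidates = set()
    for variant in deletes(query, max_distance):
        candidates |= index.get(variant, set())
    return sorted(w for w in candidates if levenshtein(query, w) <= max_distance)


# Hypothetical sample dictionary, for illustration only.
index = build_index(["hello", "help", "world", "word"], max_distance=1)
print(lookup("helo", index, 1))  # ['hello', 'help']
print(lookup("wrld", index, 1))  # ['world']
```

In the second query, "word" shares the variant "wrd" with "wrld" but is two edits away, so the verification step filters it out.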

  • RapidFuzz

    Rapid fuzzy string matching in Python using various string metrics

  • Project mention: RapidFuzz: Rapid fuzzy string matching in Python | news.ycombinator.com | 2024-02-14
  • jellyfish

    🪼 a python library for doing approximate and phonetic matching of strings.

  • Project mention: Python Libraries | /r/learnpython | 2023-05-30

    For sounds, something like https://github.com/jamesturk/jellyfish?

  • fuzzball.js

    Easy to use and powerful fuzzy string matching, port of fuzzywuzzy.

  • Project mention: Unlocking Advanced RAG: Citations and Attributions | dev.to | 2024-01-29

    import { ratio, } from 'fuzzball';
    import { SequenceMatcher } from 'difflib';

    // modified from: https://github.com/nol13/fuzzball.js/blob/773b82991f2bcacc950b413615802aa953193423/fuzzball.js#L942
    function partial_ratio(str1: string, str2: string) {
      if (str1.length <= str2.length) {
        var shorter = str1
        var longer = str2
      } else {
        var shorter = str2
        var longer = str1
      }
      var m = new SequenceMatcher(null, shorter, longer);
      var blocks = m.getMatchingBlocks();
      let bestScore: number = 0;
      let bestMatch: string | null = null
      let bestStartIdx: number = -1
      for (var b = 0; b < blocks.length; b++) {
        var long_start = (blocks[b][1] - blocks[b][0]) > 0 ? (blocks[b][1] - blocks[b][0]) : 0;
        var long_end = long_start + shorter.length;
        var long_substr = longer.substring(long_start,long_end);
        var r = ratio(shorter,long_substr);
        if (r > bestScore) {
          bestScore = r;
          bestMatch = long_substr;
          bestStartIdx = long_start;
        }
        if (r > 99.5) {
          break;
        }
      }
      return {
        bestMatch,
        bestScore,
        bestStartIdx,
      }
    }

  • go-edlib

    📚 String comparison and edit distance algorithms library, featuring: Levenshtein, LCS, Hamming, Damerau-Levenshtein (OSA and adjacent-transposition algorithms), Jaro-Winkler, Cosine, etc.

  • js-levenshtein

    The most efficient JS implementation calculating the Levenshtein distance, i.e. the difference between two strings.

  • closestmatch

    Golang library for fuzzy matching within a set of strings

  • strsim-rs

    Rust implementations of string similarity metrics

  • Project mention: textdistance.rs: Rust library to compare strings (or any sequences). 25+ algorithms, pure Rust, common interface, Unicode support. Based on popular and battle-tested textdistance Python library. | /r/rust | 2023-05-19

    The wide selection of algorithms is great, but some preliminary testing shows that this library's implementations are quite a bit slower than the already existing implementations, e.g. strsim.

  • pg_similarity

    set of functions and operators for executing similarity queries

  • Project mention: Data Cleaning in SQL | /r/SQL | 2023-06-15

    For Postgres, there is an extension that provides that.

  • levenshtein

    Go implementation to calculate Levenshtein Distance.

  • strutil-go

    Golang metrics for calculating string similarity and other string utility functions (by adrg)

  • Quickenshtein

    Making the quickest and most memory efficient implementation of Levenshtein Distance with SIMD and Threading support

  • textdistance.rs

    🦀📏 Rust library to compare strings (or any sequences). 25+ algorithms, pure Rust, common interface, Unicode support.

  • Project mention: textdistance.rs: Rust library to compare strings (or any sequences). 25+ algorithms, pure Rust, common interface, Unicode support. Based on popular and battle-tested textdistance Python library. | /r/rust | 2023-05-19
  • didyoumean

    A CLI spelling corrector for when you're unsure

  • abydos

    Abydos NLP/IR library for Python

  • dictomaton

    Finite state dictionaries in Java

  • Project mention: Calculate the difference and intersection of any two regexes | news.ycombinator.com | 2023-09-11

    Say you want to compute all strings of length 5 that the automaton can generate. Conceptually the nicest way is to create an automaton that matches any five characters and then compute the intersection between that automaton and the regex automaton. Then you can generate all the strings in the intersection automaton. Of course, IRL, you wouldn't actually generate the intersection (you can easily do this on the fly), but you get the idea.

    Automata are really a lost art in modern natural language processing. We used to do things like store a large vocabulary in a deterministic acyclic minimized automaton (nice and compact, a so-called dictionary automaton). Then, to find, say, all words within Levenshtein distance 2 of hacker, you create a Levenshtein automaton for hacker and compute (on the fly) the intersection between the Levenshtein automaton and the dictionary automaton. The language of the intersection automaton is then exactly the set of matching words.

    I wrote a Java package a decade ago that implements some of this stuff:

    https://github.com/danieldk/dictomaton
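A rough sketch of the on-the-fly intersection described in this comment, under two simplifying assumptions: the dictionary is held in a plain trie rather than a minimized acyclic automaton, and the Levenshtein automaton is not built explicitly but simulated by carrying one row of the edit-distance table along each trie path. The names `build_trie` and `within_distance` and the word list are hypothetical and are not taken from dictomaton.

```python
END = ""  # end-of-word marker; the empty string can never be a real character key


def build_trie(words):
    """Store the vocabulary as nested dicts, one level per character."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def within_distance(trie, target, max_distance):
    """Return all words in the trie within max_distance edits of target."""
    results = []

    def walk(node, prefix, row):
        # row[j] is the edit distance between prefix and target[:j].
        if END in node and row[-1] <= max_distance:
            results.append(prefix)
        for ch, child in node.items():
            if ch == END:
                continue
            new_row = [row[0] + 1]
            for j, tc in enumerate(target, 1):
                new_row.append(min(new_row[j - 1] + 1,      # insertion
                                   row[j] + 1,              # deletion
                                   row[j - 1] + (tc != ch)))  # match or substitution
            # Prune: once every entry exceeds the limit, no extension can recover.
            if min(new_row) <= max_distance:
                walk(child, prefix + ch, new_row)

    walk(trie, "", list(range(len(target) + 1)))
    return sorted(results)


# Hypothetical vocabulary, for illustration only.
trie = build_trie(["hacker", "hackers", "hacked", "packer", "tracker", "banana"])
print(within_distance(trie, "hacker", 2))
```

The pruning step plays the role of the intersection: a branch of the dictionary is abandoned as soon as the simulated Levenshtein automaton has no live state left, so only paths accepted by both structures are explored.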

  • simetric

    String similarity metrics for Elixir

  • hercules

    Detect plagiarism of GitHub repositories in someone else's code (by ongteckwu)

  • Project mention: A tool for detecting plagiarism of Github repositories in someone else's code | /r/recruitinghell | 2023-09-27
  • edits.cr

    Edit distance algorithms inc. Jaro, Damerau-Levenshtein, and Optimal Alignment

  • Edits

    Edit distance algorithms inc. Jaro, Damerau-Levenshtein, and Optimal Alignment

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source Levenshtein projects? This list will help you:

#   Project          Stars
1   TextDistance     3,296
2   SymSpell         3,034
3   RapidFuzz        2,338
4   jellyfish        1,989
5   fuzzball.js        502
6   go-edlib           444
7   js-levenshtein     428
8   closestmatch       416
9   strsim-rs          375
10  pg_similarity      352
11  levenshtein        319
12  strutil-go         274
13  Quickenshtein      272
14  textdistance.rs    254
15  didyoumean         202
16  abydos             167
17  dictomaton         130
18  simetric            60
19  hercules            19
20  edits.cr            16
21  Edits                2
