polyleven vs RapidFuzz

polyleven

Fast Levenshtein Distance Library for Python 3 (by fujimotos)

levenshtein-distance

Source Code

ceptord.net


RapidFuzz

Rapid fuzzy string matching in Python using various string metrics (by rapidfuzz)

string-matching string-similarity string-comparison Levenshtein Python CPP levenshtein-distance

Source Code

rapidfuzz.github.io


Metric	polyleven	RapidFuzz
Mentions	1	11
Stars	76	2,362
Growth	-	2.4%
Activity	10.0	9.2
Latest Commit	over 1 year ago	13 days ago
Language	C	C++
License	MIT License	MIT License

The number of mentions is the total number of mentions we have tracked plus the number of user-suggested alternatives.
Stars: the number of stars a project has on GitHub. Growth: month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

polyleven

Posts with mentions or reviews of polyleven. We have used some of these posts to build our list of alternatives and similar projects.

Spellcheck and Levenshtein distance
1 project | /r/learnmachinelearning | 15 Nov 2022

polyleven is the fastest Levenshtein distance library I've been able to find. It also has a threshold parameter which can be used to speed up the calculations. That being said, I've had a lot more success speeding up the processing of large text datasets by converting the words to a vector space (using e.g. word2vec) and then calculating Euclidean distance, which is much faster than calculating Levenshtein distance (assuming you are using vectorized operations). The fastest solution would probably be approximate nearest neighbor search (see for example the faiss library), but again you'll have to embed your words in a vector space, and you'll need to decide whether this is viable for your use case.
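The threshold idea mentioned above can be illustrated with a plain dynamic-programming Levenshtein distance that gives up early once the answer is known to exceed a cap `k`. This is a minimal pure-Python sketch, not polyleven's actual (C-based, much faster) implementation; the function name `levenshtein_bounded` and its convention of returning `k + 1` for any distance above the cap are assumptions made for illustration.

```python
def levenshtein_bounded(a: str, b: str, k: int) -> int:
    """Levenshtein distance capped at k + 1 (hypothetical helper).

    Returns the exact distance when it is <= k, otherwise k + 1.
    Uses the classic two-row dynamic-programming table and stops as soon
    as every cell in the current row exceeds k.
    """
    # The length difference alone is a lower bound on the distance.
    if abs(len(a) - len(b)) > k:
        return k + 1
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        # Row minima never decrease from one row to the next, so if the
        # whole row is already above k, the final distance is too.
        if min(curr) > k:
            return k + 1
        prev = curr
    return prev[-1] if prev[-1] <= k else k + 1


print(levenshtein_bounded("kitten", "sitting", 5))  # exact distance
print(levenshtein_bounded("kitten", "sitting", 1))  # capped at k + 1
```

The early exit is what makes a threshold pay off: when most candidate pairs are far apart, most comparisons stop after a few rows instead of filling the whole table.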

RapidFuzz

Posts with mentions or reviews of RapidFuzz. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-03-13.

RapidFuzz: Rapid fuzzy string matching in Python
1 project | news.ycombinator.com | 14 Feb 2024
OVOS migration with docker containers ...
5 projects | /r/Mycroftai | 13 Mar 2023

Tried it, but it fails here: RUN pip3 install git+https://github.com/maxbachmann/RapidFuzz
Map columns from 2 data sources when columns are named differently
1 project | /r/learnpython | 31 Jan 2023

RapidFuzz has been the most promising fuzzy matcher in my findings with .cdist()
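For the column-mapping use case, `.cdist()` computes a score for every (query, choice) pair at once. The sketch below reproduces that shape in plain Python so the idea is visible without dependencies. It scores pairs with a normalized Indel similarity, 2 * LCS / (len(a) + len(b)) * 100, which is the metric RapidFuzz documents for `fuzz.ratio`; the helper names, the lowercasing step and the `min_score` cutoff of 60 are illustrative assumptions. RapidFuzz's own `process.cdist` performs this computation in C++ and returns a NumPy array.

```python
def indel_ratio(a: str, b: str) -> float:
    """Similarity in [0, 100]: 2 * LCS / (len(a) + len(b)) * 100."""
    if not a and not b:
        return 100.0
    prev = [0] * (len(b) + 1)  # LCS lengths for the previous row
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return 200.0 * prev[-1] / (len(a) + len(b))


def score_matrix(queries, choices, scorer=indel_ratio):
    """All-pairs matrix: row i holds scorer(queries[i], c) for every choice c."""
    return [[scorer(q, c) for c in choices] for q in queries]


def map_columns(left, right, min_score=60.0):
    """Map each left column name to its best-scoring right column, or None."""
    # Compare case-insensitively, since naming conventions vary by source.
    matrix = score_matrix([c.lower() for c in left], [c.lower() for c in right])
    mapping = {}
    for name, row in zip(left, matrix):
        best = max(range(len(right)), key=row.__getitem__)
        mapping[name] = right[best] if row[best] >= min_score else None
    return mapping


print(map_columns(["customer_id", "First Name", "zip"],
                  ["CustomerID", "first_name", "postal_code"]))
```

Here `customer_id` and `First Name` map to `CustomerID` and `first_name`, while `zip` scores below the cutoff against every candidate and maps to `None`, which is usually safer than forcing a bad match.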
finding common strings
1 project | /r/Python | 21 Jan 2023

RapidFuzz is a faster implementation.
Pandas: How can I check if a DataFrame is a subset of another DataFrame? Ideal scenario would be to identify a match percentage instead of requiring an exact match
1 project | /r/learnpython | 11 Oct 2022

For fuzzy matching - there's Rapidfuzz.
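To turn fuzzy matching into a match percentage instead of an exact subset test, one option is to score each row of the smaller table against every row of the larger one and count the rows whose best score clears a threshold. The sketch below uses plain lists of tuples in place of DataFrames and the standard library's `difflib.SequenceMatcher` (a Ratcliff/Obershelp ratio, not one of RapidFuzz's metrics) so it runs without extra packages; the function names and the 0.9 threshold are assumptions. Swapping in a RapidFuzz scorer keeps the same structure and is typically much faster.

```python
from difflib import SequenceMatcher


def best_match_score(row, candidates):
    """Highest similarity (0 to 1) between one row and any candidate row."""
    text = " ".join(map(str, row)).lower()
    return max(
        (SequenceMatcher(None, text, " ".join(map(str, c)).lower()).ratio()
         for c in candidates),
        default=0.0,
    )


def subset_match_percentage(small, large, threshold=0.9):
    """Percentage of rows in `small` that have a close match in `large`."""
    if not small:
        return 100.0
    hits = sum(best_match_score(row, large) >= threshold for row in small)
    return 100.0 * hits / len(small)


large = [("Alice Smith", "Berlin"), ("Bob Jones", "Paris"), ("Carol White", "Rome")]
small = [("alice smith", "berlin"), ("Bob Jnes", "Paris"), ("Dave Black", "Oslo")]
print(subset_match_percentage(small, large))
```

Two of the three rows find a close match (one differs only in case, the other has a typo), so the result is about 66.7%. The brute-force loop compares every pair of rows, so for large tables a vectorized scorer such as `.cdist()` matters.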
What packages replaced standard library modules in your workflow?
6 projects | /r/Python | 2 Sep 2022
Fuzzy search
3 projects | /r/learnpython | 17 May 2022

There is also https://github.com/maxbachmann/RapidFuzz which uses the MIT license.
can i use concurrent for this or is there a better way
1 project | /r/learnpython | 6 Feb 2022
Finding the distance between two sentences that share mostly the same words.
3 projects | /r/LanguageTechnology | 16 Mar 2021

RapidFuzz
Can you extract indexes of data over a threshold from numpy array or pandas dataframe?
1 project | /r/learnpython | 26 Feb 2021

What are some alternatives?

When comparing polyleven and RapidFuzz you can also consider the following projects:

distlib - Distance-related functions (Damerau-Levenshtein, Jaro-Winkler, longest common substring & subsequence) implemented as a SQLite run-time loadable extension. Any UTF-8 strings are supported.

PolyFuzz - Fuzzy string matching, grouping, and evaluation.

SymSpell - SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm

fuzzywuzzy - Fuzzy String Matching in Python

Java String Similarity - Implementation of various string similarity and distance algorithms: Levenshtein, Jaro-Winkler, n-Gram, Q-Gram, Jaccard index, Longest Common Subsequence edit distance, cosine similarity ...

string_grouper - Super Fast String Matching in Python

lev - Levenshtein distance function as C Extension for Python 3

strutil-go - Golang metrics for calculating string similarity and other string utility functions

go-edlib - 📚 String comparison and edit distance algorithms library, featuring: Levenshtein, LCS, Hamming, Damerau-Levenshtein (OSA and Adjacent transpositions algorithms), Jaro-Winkler, Cosine, etc...

OpenBBTerminal - Investment Research for Everyone, Everywhere.

thefuzz - Fuzzy String Matching in Python

PyRFC - Asynchronous, non-blocking SAP NW RFC SDK bindings for Python