polyleven vs RapidFuzz
| | polyleven | RapidFuzz |
|---|---|---|
| Mentions | 1 | 11 |
| Stars | 76 | 2,362 |
| Growth | - | 2.4% |
| Activity | 10.0 | 9.2 |
| Latest commit | over 1 year ago | 13 days ago |
| Language | C | C++ |
| License | MIT License | MIT License |
Stars: the number of stars a project has on GitHub. Growth: month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
polyleven
-
Spellcheck and Levenshtein distance
polyleven is the fastest Levenshtein distance library I've been able to find. It also has a threshold parameter which can be used to speed up the calculations. That being said, I've had a lot more success speeding up the processing of large text datasets by converting the words to a vector space (using e.g. word2vec) and then calculating Euclidean distance, which is much faster than calculating Levenshtein distance, assuming you are using vectorized operations. The fastest solution would probably be approximate nearest neighbor search (see for example the faiss library), but again you'll have to embed your words in a vector space, and you'll need to decide whether this is viable for your use case.
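A threshold helps because a bounded computation can stop as soon as the distance is known to exceed the bound. The following is a minimal pure-Python sketch of that idea. It does not call polyleven, whose C implementation is different and far faster. The name bounded_levenshtein is hypothetical, and returning k + 1 once the bound is exceeded is a convention assumed for this illustration.

```python
def bounded_levenshtein(a: str, b: str, k: int) -> int:
    """Levenshtein distance between a and b, or k + 1 if it exceeds k.

    Hypothetical helper: the "return k + 1" convention is assumed for this sketch.
    """
    # The length difference is a lower bound on the distance, so it is a free check.
    if abs(len(a) - len(b)) > k:
        return k + 1
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr[j] = min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep) ca
            )
        # Row minima never decrease, so once every cell exceeds k the final
        # distance must exceed k as well: stop early.
        if min(curr) > k:
            return k + 1
        prev = curr
    return prev[-1] if prev[-1] <= k else k + 1


# Usage: keep only candidate words within distance 1 of a query.
words = ["speling", "spelling", "spell", "sapling", "spieling"]
close = [w for w in words if bounded_levenshtein("spelling", w, 1) <= 1]
```

The early exit only saves work when most candidates are far from the query, which is the typical spellcheck situation: "spell" is rejected by the length check alone, before any table is filled.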
RapidFuzz
- RapidFuzz: Rapid fuzzy string matching in Python
-
OVOS migration with docker containers ...
Tried it, but it fails here: RUN pip3 install git+https://github.com/maxbachmann/RapidFuzz
-
Map columns from 2 data sources when columns are named differently
In my testing, RapidFuzz with .cdist() has been the most promising fuzzy matcher.
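The .cdist() approach scores every name in one list against every name in the other, producing a matrix from which the best match per row can be picked. The following sketch reproduces that idea with the standard library only. difflib.SequenceMatcher stands in for a RapidFuzz scorer, so the numbers will differ from RapidFuzz's. The helpers score_matrix and map_columns, the 60-point cutoff, and the sample column names are all hypothetical.

```python
from difflib import SequenceMatcher


def score(a: str, b: str) -> float:
    """Similarity in [0, 100]; difflib stands in for a RapidFuzz scorer."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def score_matrix(queries: list[str], choices: list[str]) -> list[list[float]]:
    """All-pairs scores: row i holds score(queries[i], c) for every choice c."""
    return [[score(q, c) for c in choices] for q in queries]


def map_columns(left: list[str], right: list[str], cutoff: float = 60.0) -> dict[str, str]:
    """Map each left column name to its best-scoring right column name.

    Names whose best score is below cutoff are left unmapped. The matching is
    greedy, so two left names may map to the same right name.
    """
    mapping: dict[str, str] = {}
    for name, row in zip(left, score_matrix(left, right)):
        best = max(range(len(right)), key=row.__getitem__)
        if row[best] >= cutoff:
            mapping[name] = right[best]
    return mapping


left_cols = ["customer_id", "order_date", "total_amount", "zip"]
right_cols = ["CustomerID", "OrderDate", "TotalAmount", "notes"]
mapping = map_columns(left_cols, right_cols)
```

Here "zip" has no plausible counterpart, so it stays unmapped. With RapidFuzz installed, process.cdist() would take the place of score_matrix and return the scores as a NumPy array computed in native code.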
-
Finding common strings
RapidFuzz is a faster implementation.
-
Pandas: How can I check if a DataFrame is a subset of another DataFrame? Ideal scenario would be to identify a match percentage instead of requiring an exact match
For fuzzy matching, there's RapidFuzz.
- What packages replaced standard library modules in your workflow?
-
Fuzzy search
There is also https://github.com/maxbachmann/RapidFuzz which uses the MIT license.
- Can I use concurrent for this, or is there a better way?
-
Finding the distance between two sentences that share mostly the same words.
RapidFuzz
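For sentences that share most of their words in a different order, one common trick is to sort the tokens before comparing. RapidFuzz's token_sort_ratio is built around this idea. The following pure-Python sketch is not RapidFuzz's code. It assumes a normalized Indel similarity, 100 * 2 * LCS / (len(a) + len(b)), as the underlying score. The function names are hypothetical, and the lowercasing step is a choice made for this sketch, so results may not match RapidFuzz exactly.

```python
def indel_similarity(a: str, b: str) -> float:
    """Normalized Indel similarity in [0, 100].

    Assumed formula for this sketch: 100 * 2 * LCS(a, b) / (len(a) + len(b)),
    where LCS is the length of the longest common subsequence.
    """
    if not a and not b:
        return 100.0
    prev = [0] * (len(b) + 1)  # LCS lengths for the previous prefix of a
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return 100.0 * 2 * prev[-1] / (len(a) + len(b))


def normalize_tokens(s: str) -> str:
    """Lowercase, split on whitespace, sort the tokens, and rejoin them."""
    return " ".join(sorted(s.lower().split()))


def token_sort_similarity(a: str, b: str) -> float:
    """Compare two sentences after putting their words in a canonical order."""
    return indel_similarity(normalize_tokens(a), normalize_tokens(b))


s1 = "the quick brown fox jumps"
s2 = "jumps the quick brown fox"
plain = indel_similarity(s1, s2)
sorted_score = token_sort_similarity(s1, s2)
```

The plain character comparison penalizes the reordering, while the token-sorted version scores the two sentences as identical, because they contain exactly the same words.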
- Can you extract indexes of data over a threshold from numpy array or pandas dataframe?
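A pairwise score matrix, such as the NumPy array returned by RapidFuzz's process.cdist(), is an ordinary 2-D array. NumPy's argwhere or nonzero can therefore extract the indexes of entries over a threshold. For a pandas DataFrame, the same calls work on df.to_numpy(). The matrix values and the 85.0 threshold in the following sketch are made up for illustration.

```python
import numpy as np

# Made-up score matrix: rows are queries, columns are choices, values in [0, 100].
scores = np.array([
    [95.2, 40.0, 12.5],
    [30.0, 88.0, 91.0],
    [10.0, 20.0, 55.0],
])
threshold = 85.0
mask = scores >= threshold  # boolean array, True where the score qualifies

# One (row, col) pair per qualifying entry, in row-major order.
pairs = np.argwhere(mask)

# The same positions as two parallel index arrays, usable for fancy indexing.
rows, cols = np.nonzero(mask)
matched_scores = scores[rows, cols]
```

argwhere is convenient for iterating over matches as pairs. nonzero returns the row and column indexes as separate arrays, which can index back into the matrix or into the original query and choice lists.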
What are some alternatives?
distlib - Distance related functions (Damerau-Levenshtein, Jaro-Winkler, longest common substring & subsequence) implemented as SQLite run-time loadable extension. Any UTF-8 strings are supported.
PolyFuzz - Fuzzy string matching, grouping, and evaluation.
SymSpell - SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
fuzzywuzzy - Fuzzy String Matching in Python
Java String Similarity - Implementation of various string similarity and distance algorithms: Levenshtein, Jaro-Winkler, n-Gram, Q-Gram, Jaccard index, Longest Common Subsequence edit distance, cosine similarity ...
string_grouper - Super Fast String Matching in Python
lev - Levenshtein distance function as C Extension for Python 3
strutil-go - Golang metrics for calculating string similarity and other string utility functions
go-edlib - 📚 String comparison and edit distance algorithms library, featuring: Levenshtein, LCS, Hamming, Damerau-Levenshtein (OSA and adjacent transpositions algorithms), Jaro-Winkler, Cosine, etc.
OpenBBTerminal - Investment Research for Everyone, Everywhere.
thefuzz - Fuzzy String Matching in Python
PyRFC - Asynchronous, non-blocking SAP NW RFC SDK bindings for Python