polyleven vs distlib
Metric | polyleven | distlib
---|---|---
Mentions | 1 | 2
Stars | 76 | 20
Growth | - | -
Activity | 10.0 | 4.4
Latest commit | over 1 year ago | over 2 years ago
Language | C | C
License | MIT License | MIT License
Stars: the number of stars a project has on GitHub. Growth: month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
polyleven
Spellcheck and Levenshtein distance
polyleven is the fastest Levenshtein distance library I've been able to find. It also has a threshold parameter that can be used to speed up the calculations. That being said, I've had much more success speeding up the processing of large text datasets by converting the words into a vector space (e.g. with word2vec) and then calculating Euclidean distance, which is much faster than calculating Levenshtein distance (assuming you use vectorized operations). The fastest solution would probably be approximate nearest neighbor search (see, for example, the faiss library), but again you'll have to embed your words in a vector space and decide whether this is viable for your use case.
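To illustrate why a threshold speeds things up, here is a minimal pure-Python sketch of a bounded Levenshtein distance. It is not polyleven's code (polyleven is a C extension, and its exact signature should be checked in its own README); the name `levenshtein_bounded` and the convention of returning `k + 1` when the distance exceeds `k` are assumptions made for this example. Two shortcuts do the work: the length difference is a lower bound on the distance, and because the minimum of each DP row never decreases, the loop can stop as soon as a whole row exceeds `k`.

```python
def levenshtein_bounded(a: str, b: str, k: int) -> int:
    """Return the Levenshtein distance of a and b, or k + 1 if it exceeds k.

    Hypothetical helper for illustration, not polyleven's API.
    """
    # The length difference alone already forces at least this many edits.
    if abs(len(a) - len(b)) > k:
        return k + 1
    prev = list(range(len(b) + 1))  # row 0: distance from "" to b[:j]
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr[j] = min(prev[j] + 1,         # delete ca
                          curr[j - 1] + 1,     # insert cb
                          prev[j - 1] + cost)  # substitute or match
        # Row minima never decrease, so the final value cannot come back under k.
        if min(curr) > k:
            return k + 1
        prev = curr
    return prev[-1] if prev[-1] <= k else k + 1


print(levenshtein_bounded("kitten", "sitting", 5))  # exact distance: 3
print(levenshtein_bounded("kitten", "sitting", 2))  # over the threshold: 3 (k + 1)
```

With a small `k`, most non-matching pairs in a large word list are rejected by the length check or after a few rows, which is where the savings come from.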
distlib
New run-time loadable extension with distance related functions available
I have just retested it on my Raspberry Pi 400 by downloading the code as a ZIP from https://github.com/schiffma/distlib. There should be no errors or warnings on either Linux or Windows.
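As a rough picture of what a run-time loadable distance extension gives you, the sketch below registers a Python edit-distance function as a SQL function with the standard library's `sqlite3.Connection.create_function`. This is a stand-in, not distlib: the SQL name `levenshtein` is hypothetical here, and distlib's actual function names and library filename are not reproduced. Loading the compiled extension itself would go through `conn.enable_load_extension(True)` and `conn.load_extension(...)`, provided your Python's SQLite build allows extension loading.

```python
import sqlite3


def levenshtein(a: str, b: str) -> int:
    """Classic two-row dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            # deletion, insertion, substitution (bool adds as 0 or 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


conn = sqlite3.connect(":memory:")
# Hypothetical SQL name; a real extension defines its own functions when loaded.
conn.create_function("levenshtein", 2, levenshtein)

conn.execute("CREATE TABLE words (w TEXT)")
conn.executemany(
    "INSERT INTO words VALUES (?)",
    [("kitten",), ("sitting",), ("mitten",), ("bitten",)],
)

# Fuzzy match inside SQL: every word within edit distance 1 of the query.
rows = conn.execute(
    "SELECT w, levenshtein(w, ?) AS d FROM words "
    "WHERE levenshtein(w, ?) <= 1 ORDER BY w",
    ("kitten", "kitten"),
).fetchall()
print(rows)  # [('bitten', 1), ('kitten', 0), ('mitten', 1)]
```

A compiled C extension does the same job without calling back into Python for every row, which is the main reason to prefer one on large tables.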
What are some alternatives?
SymSpell - SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
sqlite-wf - Simple visual ETL tool
Java String Similarity - Implementation of various string similarity and distance algorithms: Levenshtein, Jaro-Winkler, n-Gram, Q-Gram, Jaccard index, Longest Common Subsequence edit distance, cosine similarity ...
go-sqlite-lite - SQLite driver for the Go programming language
RapidFuzz - Rapid fuzzy string matching in Python using various string metrics
sqlite-createtable-parser - A parser for SQLite create table sql statements.
lev - Levenshtein distance function as C Extension for Python 3
SQLite3MultipleCiphers - SQLite3 encryption extension with support for multiple ciphers
esp32_arduino_sqlite3_lib - Sqlite3 Arduino library for ESP32
sqlite-gui - Lightweight SQLite editor for Windows
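The Symmetric Delete idea behind SymSpell can be sketched in a few lines. This is a simplified illustration, not SymSpell's implementation (no word-frequency ranking, no prefix-length limit, no compact storage), and all function names are hypothetical. Dictionary words are indexed by every string reachable with up to `max_dist` deletions; a query generates its own deletions and looks them up. Any word within Levenshtein distance `max_dist` shares at least one such deletion with the query, but some candidates can be farther away, so each candidate is verified with a real edit distance.

```python
from collections import defaultdict
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Two-row dynamic-programming edit distance, used to verify candidates."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def deletes(word: str, max_dist: int) -> set:
    """All strings obtainable from word by deleting up to max_dist characters."""
    out = {word}
    for n in range(1, min(max_dist, len(word)) + 1):
        for idx in combinations(range(len(word)), n):
            out.add("".join(c for i, c in enumerate(word) if i not in idx))
    return out


def build_index(dictionary, max_dist: int):
    """Map each deletion variant to the dictionary words that produce it."""
    index = defaultdict(set)
    for word in dictionary:
        for d in deletes(word, max_dist):
            index[d].add(word)
    return index


def lookup(query: str, index, max_dist: int):
    """Return sorted (distance, word) pairs within max_dist of query."""
    candidates = set()
    for d in deletes(query, max_dist):
        candidates |= index.get(d, set())  # .get avoids growing the defaultdict
    results = []
    for word in candidates:
        dist = levenshtein(query, word)
        if dist <= max_dist:
            results.append((dist, word))
    return sorted(results)


DICTIONARY = ["kitten", "sitting", "mitten", "bitten", "smitten"]
index = build_index(DICTIONARY, 1)
print(lookup("kiten", index, 1))   # [(1, 'kitten')]
print(lookup("mitten", index, 1))  # [(0, 'mitten'), (1, 'bitten'), (1, 'kitten'), (1, 'smitten')]
```

The trade-off is memory for speed: the index grows with the number of deletion variants per word, but a lookup costs only a handful of hash probes instead of a distance computation against every dictionary entry.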