jellyfish
fuzzywuzzy
Our great sponsors
jellyfish | fuzzywuzzy | |
---|---|---|
3 | 20 | |
1,979 | 9,067 | |
- | 0.2% | |
6.9 | 0.0 | |
25 days ago | about 1 year ago | |
Jupyter Notebook | Python | |
MIT License | GNU General Public License v2.0 only |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
jellyfish
-
Comparing Strings (Street Names) With Machine Learning
When comparing strings (in our case street names), there are plenty of off-the-shelf features that can be used, such as those provided by the jellyfish. This package also provides a number of phonetic encodings. We can combine an encoding with a metric, such as Levenshtein Distance, to measure the phonetic similarity between two street names.
fuzzywuzzy
-
Thanks to this sub, we now have an Anki deck for Persona 5 Royal. Spreadsheet with Jp and Eng side by side too.
Convert the original lines to full furigana and do a fuzzy match. (For reference, the original line is 貴方がこれまでに得てきた力、存分に発揮してくださいね。) You can do a regional search using the initial scene data (E60) first, and if the confidence is low, go for a slower full search.
-
Fuzzy search
It's now known as "thefuzz", see https://github.com/seatgeek/fuzzywuzzy
-
import fuzzywuzzy
fuzzywuzzy is actually just called the thefuzz now.
-
The Levenshtein Distance in Production
I used fuzzywuzzy [1], a python-based fuzzy string matching calculator that is based on Levenshtein's edit-distance for a parking enforcement product I built.
The product used an iOS client to capture license plates. The app would capture a single plate many times, de-duplicate using an edit-distance threshold matching plates up to a time period lookback. Then among those plates, send the one with the highest likely correct read up to the server.
The web app would look to see if that plate had been seen before or a plate similar had been seen before. If so, it would join a "plate group" which let you see the history of vehicle sightings.
A set of rules could be created to show vehicle in violation of a posted parking time limit. It worked when a person on patrol walked the lot twice and the app saw the "same" vehicle (thanks to edit distance)
I had Cummings Properties in New England and National Cathedral in DC as customers, but sold the product last year.
I wrote some about Vladimir Losifovich Levenshtein in a final blog entry high level blog entry about edit distance. There's a 2002 photo of Levenshtein at a conference if you're curious. [2]
-
[D] String matching
seatgeek/fuzzywuzzy: Fuzzy String Matching in Python (github.com)
What are some alternatives?
thefuzz - Fuzzy String Matching in Python
Levenshtein - The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity
TextDistance - 📐 Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage.
ftfy - Fixes mojibake and other glitches in Unicode text, after the fact.
chardet - Python character encoding detector
RapidFuzz - Rapid fuzzy string matching in Python using various string metrics
pyfiglet - An implementation of figlet written in Python
汉字拼音转换工具(Python 版) - 汉字转拼音(pypinyin)
Pygments
gensim - Topic Modelling for Humans