normalize-for-search
Removes accents and umlauts from characters in a string, after first converting the string to lower case. We use it for autocomplete, on both ends: for the matched strings on the server side, when indexing, and for the strings the user types into a text input in the browser.
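The same normalization idea can be sketched in a few lines of Python using Unicode decomposition: lower-case the string, decompose each accented character into a base letter plus combining marks, and drop the marks. This is a hedged approximation, not the library's actual mapping table; for example, it folds ü to u, whereas a German-aware mapping might prefer ue.

```python
import unicodedata

def normalize_for_search(s: str) -> str:
    """Lower-case a string and strip accent/umlaut marks.

    Uses NFKD decomposition to split accented characters into a base
    letter plus combining marks, then discards the combining marks.
    Note: characters with no decomposition (e.g. German ess-zett) pass
    through unchanged.
    """
    decomposed = unicodedata.normalize("NFKD", s.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(normalize_for_search("Café Zürich"))  # -> cafe zurich
```

Applying the same function to both the indexed strings and the user's input keeps the two sides comparable, which is the whole point for autocomplete.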
It's definitely not finished, but if you're a coding type, the GitHub repo is here. If you want to play with the extremely early version, you can invite it to your Discord server here, but it's not online all the time or finished, so please consider any use of it a test!
You could probably use this project as a good starting point for a list of accent character conversions: https://github.com/ikr/normalize-for-search
Your best bet is to switch to a proper search library rather than the simple loop with 'in' checks that you have now. A search library will handle things like Unicode/ASCII similarities, stop-word removal, stemming, TF-IDF (and other) weighting, etc., and will be massively faster as well. Quite a few options come up if you Google "python search engine"; Whoosh also looks promising.
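Even before adopting a full search engine, the biggest win over raw 'in' checks is comparing normalized forms of both the query and the candidates. A minimal dependency-free sketch (the function names here are illustrative, not from any particular library):

```python
import unicodedata

def normalize(s: str) -> str:
    """Lower-case and strip combining accent marks via NFKD decomposition."""
    decomposed = unicodedata.normalize("NFKD", s.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def autocomplete(query: str, items: list[str]) -> list[str]:
    """Return the items whose normalized form contains the normalized query."""
    q = normalize(query)
    return [item for item in items if q in normalize(item)]

# "zur" matches "Zürich" despite the umlaut, and "Azzurro" as a substring.
print(autocomplete("zur", ["Zürich", "Geneva", "Azzurro"]))  # -> ['Zürich', 'Azzurro']
```

A real search library layers stemming, stop-word removal, and relevance ranking on top of this, plus an inverted index so matching doesn't scan every item on every keystroke.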