Rebuilding the spellchecker, pt.4: Introduction to suggest algorithm

Our great sponsors

InfluxDB - Power Real-Time Data Analytics at Scale

WorkOS - The modern identity platform for B2B SaaS

SaaSHub - Software Alternatives and Reviews

Our great sponsors

JamSpell

3 591 2.4 C++

Modern spell checking library - accurate, fast, multi-language

There is, for example, a curious evaluation table provided by a modern ML-based spellchecker JamSpell. According to it, JamSpell is awesome—while Hunspell is a mere 0.03% better than dummy ("fix nothing") spellchecker... Which doesn't ring true, somehow!

hunspell

20 2,001 7.9 C++

The most popular spellchecking library.

Those questions are open ones—and even the way they can be answered is unclear. Intuitively, Hunspell's suggestions are quite decent—otherwise, it wouldn't be the most widespread spellchecker, after all. A fair amount of "unhappy customers" can be easily found, too, in hunspell's repo issues. At the same time, one should distinguish between different reasons for the sub-par suggestion quality. It might be due to the algorithm itself, or due to the source data quality: the literal absence of the desired suggestion in the dictionary, or lack of aff-file settings that could've guided Hunspell to finding it.

InfluxDB

www.influxdata.com sponsored

Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
SymSpell

16 3,034 6.0 C#

SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm

Some of the modern approaches to spellchecking still take this road: for example, SymSpell algorithm (claiming to be "1 million times faster") is at its core just a brilliant idea for a novel storage format for a flat word list, that allows optimizing the calculation of edit distance significantly.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project