Rebuilding the most popular spellchecker. Part 1

This page summarizes the projects mentioned and recommended in the original post on dev.to

InfluxDB - Power Real-Time Data Analytics at Scale
Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
www.influxdata.com
featured
SaaSHub - Software Alternatives and Reviews
SaaSHub helps you find the best software and product alternatives
www.saashub.com
featured
  • WeCantSpell.Hunspell

    A port of Hunspell v1 for .NET and .NET Standard

  • Note that there are also a few "pragmatic" ports of Hunspell into other languages (in order to use it in environments where C++ dependency is undesireable), namely WeCantSpell.Hunspell in C# and nspell in JS (very incomplete); and aforementioned nuspell can also be considered a "port" (from legacy C++ to a modern one).

  • JamSpell

    Modern spell checking library - accurate, fast, multi-language

  • Obviously, there are open-source spellcheckers other than Hunspell. GNU aspell (that at one point was superseded by Hunspell, but still holds its ground in English suggestion quality), to name one of the older ones; but also there are novel approaches, like SymSpell, claiming to be "1 million times faster" or ML-based JamSpell, claiming to be much more accurate.

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

    InfluxDB logo
  • hunspell

    The most popular spellchecking library.

  • Currently, Hunspell is maintained on GitHub (repo has only around 1k stars, will you believe it?). It seems that maintenance is not that easy if you'll weight the number of open issues and PRs, and the latest commits timeline: at the time of writing it (Jan 2021), the last commit to master was of May 2020, and the last release was 1.7 on Dec 2018. Hunspell's codebase is mostly "old-school" C++. It is being slowly modernized and it has very few comments; there are thousands of two-branch ifs to handle non-Unicode and Unicode text separately. There is also an attempt to rewrite Hunspell from scratch in a modern C++, which at some point was developed under the hunspell GitHub organization. Now it is independent and called nuspell (and, while not yet supporting all of the Hunspell features, already "achieved" version 4.2.0).

  • spylls

    Pure Python spell-checker, (almost) full port of Hunspell

  • Currently, Spylls has ≈1.5k lines of library code in 14 files. It conforms (with some reservations) to all Hunspell's integrational tests. Those tests look like a set of files each, consisting of "test dictionary + what words should be considered good, what words should be considered bad, what should be suggested instead of the bad words", and there are 127 of such sets to pass. There are 2 thousand comment lines in the code, explaining thoroughly every detail of the algorithm and rendered at the Spylls documentation site; note that besides docstrings at the beginning of each class and method, there are also inline comments in code—that's why the documentation site uses custom theme with inline "Show code" feature.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts

  • Is GNU Aspell the best spell checker for emacs on macOS?

    8 projects | /r/emacs | 23 May 2023
  • Why don't common browsers use Soundex for spelling suggestions?

    1 project | /r/browsers | 6 Mar 2023
  • Does anyone know how to change the dictionary that W10 pulls from? Ideally replace with Google's brain?

    1 project | /r/techsupport | 13 Feb 2023
  • Autocorrect anything with Google as a go-to spell check

    5 projects | /r/AutoHotkey | 5 Feb 2023
  • Spell checker

    1 project | /r/LaTeX | 17 Jan 2023