Rebuilding the spellchecker, pt.3: Lookup–compounds and solutions

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • languagetool

    Style and Grammar Checker for 25+ Languages

  • I believe that LanguageTool[0] is the closest open-source counterpart to Grammarly. Though, in my experience, it is not half as useful... But it is multilingual and open-source.

    I have a distant dream of doing to it what I did to Hunspell (write code and a series of articles explaining how it works and why it is so hard), but we'll see.

    As far as I know, LanguageTool is based just on a huge set of rules (you can see them in the repo[1]), and Grammarly is a mix of rule-based and machine-learning suggestions (I heard a rumor that it is 99% rule-based and that the talk about ML is mostly marketing, but I don't know how reliable this rumor is).

    0: https://languagetool.org

    1: https://github.com/languagetool-org/languagetool/tree/master...

  • ruby-spellchecker

    Fast English spelling and grammar checker that can be used for autocorrection.

  • There is https://github.com/omohokcoj/ruby-spellchecker but it serves a slightly different purpose: doing safe autocorrections.

  • SymSpell

    A JavaScript implementation of the Symmetric Delete spelling correction algorithm. (by IceCreamYou)

  • As far as I know (I mentioned it in the first part[0]), nspell[1] is the closest thing to a "port of (some of) Hunspell", and typo.js[2] ports even less (but it might be enough for some; we used it at my previous company: it uses dictionaries for lookup, but has its own simplistic suggest, which I needed to tweak a lot).

    The SymSpell algorithm (which is quite different; I'll go into it to some extent in the next part) is much easier to port, so there is a JS SymSpell port[3] (which seems abandoned, though).

    0: https://zverok.github.io/blog/2021-01-05-spellchecker-1.html

    1: https://github.com/wooorm/nspell

    2: https://github.com/cfinke/Typo.js/

    3: https://github.com/IceCreamYou/SymSpell

  • SymSpell

    SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm

  • https://github.com/wolfgarbe/SymSpell lists 5 JS implementations (plus a Rust one that compiles to WebAssembly)
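
For readers wondering why the Symmetric Delete approach mentioned in the SymSpell entries above is considered easy to port, here is a minimal sketch of the idea in Python. It is not the code of any of the listed projects: the function names (`deletes`, `build_index`, `suggest`) are made up for illustration, and a plain Levenshtein distance is swapped in for the final check, which may differ from the distance the real implementations use. The core trick is that both the dictionary words and the query are expanded only by deleting characters, and candidates are found where those expansions meet.

```python
from collections import defaultdict
from itertools import combinations


def deletes(word, max_distance):
    """Return the word plus every string made by deleting up to max_distance characters."""
    variants = {word}
    for k in range(1, min(max_distance, len(word)) + 1):
        # Choose which character positions to keep; the rest are deleted.
        for kept in combinations(range(len(word)), len(word) - k):
            variants.add("".join(word[i] for i in kept))
    return variants


def build_index(dictionary, max_distance=2):
    """Map every delete-variant to the dictionary words that produce it (hypothetical helper)."""
    index = defaultdict(set)
    for word in dictionary:
        for variant in deletes(word, max_distance):
            index[variant].add(word)
    return index


def edit_distance(a, b):
    """Plain Levenshtein distance (insert, delete, substitute); an assumption of this sketch."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def suggest(word, index, max_distance=2):
    """Collect candidates sharing a delete-variant with the query, then verify by distance."""
    candidates = set()
    for variant in deletes(word, max_distance):
        candidates |= index.get(variant, set())
    scored = sorted((edit_distance(word, c), c) for c in candidates)
    return [c for distance, c in scored if distance <= max_distance]


index = build_index(["hello", "help", "world", "word"], max_distance=1)
print(suggest("helo", index, max_distance=1))
print(suggest("wrld", index, max_distance=1))
```

The index trades memory for speed: lookups only generate deletes of the query, never inserts or substitutions over the whole alphabet, which is also why nothing here depends on a language-specific character set.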

  • goSpellcheck

    A terrible spell checker in Go.

  • Great tool, it was my first contact with spellcheckers. Back then I was working for a company that does translations powered by machine learning. I was a student at the time and, as the article mentioned, I was one of the naive ones who thought that a spellchecker is an easy thing to build.

    https://github.com/victorqribeiro/goSpellcheck

    I wrote this originally in Python, then I ported it to Go. Back then I had plans to improve it. I believed that most errors would be due to mispressed keys. I was sketching an algorithm to find similar words given a dictionary. Soon I had to deal with other projects (from college) and I left spellchecking to the smart people.
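
The "mispressed keys" idea in the comment above can be illustrated with a small sketch. This is not taken from goSpellcheck; it is a hypothetical example that assumes a QWERTY letter layout, models only single-key substitutions, and uses invented names (`build_neighbours`, `mispress_candidates`). It replaces each letter with a physically adjacent key and keeps the results that appear in the dictionary.

```python
# Letter rows of a QWERTY keyboard; each lower row is shifted slightly to the right.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]


def build_neighbours(rows=ROWS):
    """Map each letter to the set of keys physically next to it (approximate stagger)."""
    position = {ch: (r, c) for r, row in enumerate(rows) for c, ch in enumerate(row)}
    neighbours = {}
    for ch, (r, c) in position.items():
        around = [
            (r, c - 1), (r, c + 1),      # left and right on the same row
            (r - 1, c), (r - 1, c + 1),  # the two keys above
            (r + 1, c - 1), (r + 1, c),  # the two keys below
        ]
        neighbours[ch] = {
            rows[rr][cc]
            for rr, cc in around
            if 0 <= rr < len(rows) and 0 <= cc < len(rows[rr])
        }
    return neighbours


NEIGHBOURS = build_neighbours()


def mispress_candidates(word, dictionary):
    """Return dictionary words reachable by swapping one letter for an adjacent key."""
    found = set()
    for i, ch in enumerate(word):
        for near in NEIGHBOURS.get(ch, ()):
            candidate = word[:i] + near + word[i + 1:]
            if candidate in dictionary:
                found.add(candidate)
    return sorted(found)


print(mispress_candidates("wprd", {"word", "ward", "world"}))
```

A sketch like this only covers one class of typos; missing, doubled, or transposed letters would need an edit-distance step on top.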

  • JamSpell

    Modern spell checking library - accurate, fast, multi-language

  • That's a huge topic, which I am planning to cover towards the end of the article series (please like and subscribe), but in short: yes, my opinion is that spellchecking is actually a "machine learning problem in disguise", and most existing dictionaries are more a roundabout way of storing something-not-unlike-models than analytical data.

    But an ML approach raises the question of data availability. What good will your "deep learning OSS spellchecker" do if there aren't good (and open) models for it that cover as many languages as the existing Hunspell dictionaries do? And what if adding a bunch of new words requires laborious model retraining? It is not unsolvable, but it is non-trivial.

    I believe all the giants have something like this inside (I don't think spelling correction in the Google search bar is handled with Hunspell, right?), but it is much harder to do as an open tool, ready for embedding into other software.

    There are notable attempts, though: JamSpell for one (https://github.com/bakwc/JamSpell), which has open "free" models and more precise commercial ones; the source code is open (maybe also only for using the "simplistic" models, I haven't dug deeper).

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
