Rebuilding the spellchecker, pt.3: Lookup–compounds and solutions

languagetool

310 11,543 10.0 Java

Style and Grammar Checker for 25+ Languages

I believe that LanguageTool[0] is the closest open-source counterpart to Grammarly. In my experience, though, it is not half as useful... but it is multilingual and open-source.
I have a distant dream of doing to it what I did to Hunspell (writing code and a series of articles explaining how it works and why it is so hard), but we'll see.
As far as I know, LanguageTool is based purely on a huge set of rules (you can see them in the repo[1]), while Grammarly is a mix of rule-based and machine-learning suggestions (I heard a rumor that it is 99% rule-based and that the talk about ML is mostly marketing, but I don't know how reliable that rumor is).
0: https://languagetool.org
1: https://github.com/languagetool-org/languagetool/tree/master...

ruby-spellchecker

2 10 0.0 Ruby

Fast English spelling and grammar checker that can be used for autocorrection.

There is https://github.com/omohokcoj/ruby-spellchecker, but it serves a slightly different purpose: safe autocorrection.

SymSpell

1 9 0.0 JavaScript

A JavaScript implementation of the Symmetric Delete spelling correction algorithm. (by IceCreamYou)

As far as I know (I mentioned it in the first part[0]), nspell[1] is the closest thing to a "port of (some of) Hunspell", and typo.js[2] ports even less (but might be enough for some; we used it at my previous company: it uses dictionaries for lookup, but has its own simplistic suggest, which I needed to tweak a lot).
The SymSpell algorithm (which is quite different; I'll go into it to some extent in the next part) is much easier to port, so there is a JS SymSpell port[3] (which seems abandoned, though).
0: https://zverok.github.io/blog/2021-01-05-spellchecker-1.html
1: https://github.com/wooorm/nspell
2: https://github.com/cfinke/Typo.js/
3: https://github.com/IceCreamYou/SymSpell
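
To give a rough idea of why Symmetric Delete is easy to port, here is a minimal Python sketch of its core idea: index every dictionary word under all of its delete-only variants, then generate the same variants for the query and look them up. The function names (`deletes`, `build_index`, `candidates`) are made up for this illustration and do not come from any of the SymSpell implementations. The sketch also omits the final step a real implementation needs, which is verifying each candidate with a true edit distance and ranking by word frequency.

```python
from collections import defaultdict
from itertools import combinations


def deletes(word, max_distance):
    """Return the word plus every string made by deleting up to max_distance characters."""
    results = {word}
    for k in range(1, min(max_distance, len(word)) + 1):
        # Choose which positions to keep; the rest are deleted.
        for keep in combinations(range(len(word)), len(word) - k):
            results.add("".join(word[i] for i in keep))
    return results


def build_index(dictionary, max_distance=1):
    """Map each delete variant to the set of dictionary words that produce it."""
    index = defaultdict(set)
    for word in dictionary:
        for variant in deletes(word, max_distance):
            index[variant].add(word)
    return index


def candidates(term, index, max_distance=1):
    """Dictionary words sharing at least one delete variant with the term.

    This is only a candidate set: a real implementation would still
    filter it by actual edit distance and rank it by frequency.
    """
    found = set()
    for variant in deletes(term, max_distance):
        found |= index.get(variant, set())
    return found


index = build_index(["hello", "help", "world"])
print(candidates("helo", index))
print(candidates("wrold", index))
```

Only deletions are ever generated, on both the dictionary side and the query side, which is what keeps the candidate search to a handful of hash lookups.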

SymSpell

16 3,034 6.0 C#

SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm

https://github.com/wolfgarbe/SymSpell lists 5 JS implementations (plus a Rust one that compiles to WebAssembly)

goSpellcheck

1 1 0.0 Go

A terrible spell checker in Go.

Great tool, it was my first contact with spellchecks. Back then I was working for a company that does translations powered by machine learning. I was a student and, as the article mentions, I was one of the naive ones who thought that a spellchecker is an easy thing to build.
https://github.com/victorqribeiro/goSpellcheck
I originally wrote this in Python, then ported it to Go. Back then I had plans to improve it. I believed that most errors would be due to mispressed keys, and I was sketching an algorithm to find similar words given a dictionary. Soon I had to deal with other projects (from college), and I left spellchecking to the smart people.
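
The "mispressed keys" idea could be sketched roughly as follows. This is not the code from goSpellcheck; it is an illustrative Python sketch under simplifying assumptions: a plain QWERTY layout, only left/right neighbors on the same row, and exactly one wrong key per word. The names `ROWS`, `neighbors`, and `mispress_candidates` are hypothetical.

```python
# Simplified QWERTY layout (assumption: letters only, no row offsets).
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]


def neighbors(ch):
    """Keys directly to the left and right of ch on the same row."""
    for row in ROWS:
        i = row.find(ch)
        if i != -1:
            return {row[j] for j in (i - 1, i + 1) if 0 <= j < len(row)}
    return set()


def mispress_candidates(word, dictionary):
    """Dictionary words reachable by replacing one letter with an adjacent key."""
    found = set()
    for i, ch in enumerate(word):
        for n in neighbors(ch):
            candidate = word[:i] + n + word[i + 1:]
            if candidate in dictionary:
                found.add(candidate)
    return found


print(mispress_candidates("wprd", {"word", "ward"}))
```

Such a sketch only covers substitutions by an adjacent key; missed, doubled, or swapped letters would need a more general similarity measure such as edit distance.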

JamSpell

3 591 2.4 C++

Modern spell checking library - accurate, fast, multi-language

That's a huge topic, which I am planning to cover towards the end of the article series (please like and subscribe), but in short: yes, my opinion is that spellchecking is actually a "machine learning problem in disguise", and most existing dictionaries are more a roundabout way of storing something-not-unlike-models than analytical data.
But an ML approach raises the question of data availability. What good will your "deep learning OSS spellchecker" do if there aren't good (and open) models for it that cover as many languages as the existing Hunspell dictionaries do? And what if adding a bunch of new words requires laborious model retraining? It is not unsolvable, but it is non-trivial.
I believe all the giants have something like this inside (I don't think spelling correction in the Google search bar is handled with Hunspell, right?), but it is much harder to do as an open tool, ready for embedding into other software.
There are notable attempts, though: JamSpell for one (https://github.com/bakwc/JamSpell), which has open "free" models and more precise commercial ones; the source code is open (maybe also only for using the "simplistic" models; I haven't dug deeper).

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Should you combine edit distance "spell check" algorithms with phonetic matching algorithms for robust keyword finding?
1 project | /r/AskComputerScience | 7 Nov 2023
Help with deep learning project "autocorrection"
1 project | /r/deeplearning | 15 Jan 2023
Spellcheck and Levenshtein distance
1 project | /r/MLQuestions | 15 Nov 2022
Auto correct/Auto complete feature
1 project | /r/AskComputerScience | 27 Jun 2022
SymSpell: 1M times faster spelling correction
1 project | /r/hackernews | 6 Mar 2022

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
Spellcheck spelling-correction Levenshtein Grammar fuzzy-search
Post date: 15 Jan 2021