-
SymSpell
A JavaScript implementation of the Symmetric Delete spelling correction algorithm. (by IceCreamYou)
-
SymSpell
SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
I believe that LanguageTool[0] is the closest open-source counterpart to Grammarly. In my experience it is not half as useful, though... but it is multilingual and open-source.
I have a distant dream of doing to it what I did to Hunspell (write code and a series of articles explaining how it works and why it is so hard), but we'll see.
As far as I know, LanguageTool is based purely on a huge set of rules (you can see them in the repo[1]), while Grammarly is a mix of rule-based and machine-learning suggestions (I heard a rumor that it is 99% rule-based and that the talk about ML is mostly marketing, but I don't know how reliable this rumor was).
0: https://languagetool.org
1: https://github.com/languagetool-org/languagetool/tree/master...
There is https://github.com/omohokcoj/ruby-spellchecker, but it serves a slightly different purpose: safe autocorrection.
As far as I know (I mentioned it in the first part[0]), nspell[1] is the closest thing to a port of (some of) Hunspell, and typo.js[2] ports even less (but it might be enough for some; we used it at my previous company: it uses Hunspell dictionaries for lookup, but has its own simplistic suggest, which I needed to tweak a lot).
The SymSpell algorithm (which is quite different; I'll go into it to some extent in the next part) is much easier to port, so there is a JS SymSpell port[3] (which seems abandoned, though).
0: https://zverok.github.io/blog/2021-01-05-spellchecker-1.html
1: https://github.com/wooorm/nspell
2: https://github.com/cfinke/Typo.js/
3: https://github.com/IceCreamYou/SymSpell
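For readers unfamiliar with the Symmetric Delete idea behind SymSpell, here is a minimal Python sketch of it. It is not taken from any of the linked ports; the function names (`deletes`, `build_index`, `lookup`) are hypothetical, and it uses plain Levenshtein distance for the final check for brevity, whereas SymSpell itself is described in terms of Damerau-Levenshtein distance. The point is only to show why the approach is easy to port: both the dictionary and the query are expanded by deletions only, and matches are found with a hash lookup.

```python
from collections import defaultdict


def deletes(word, max_distance):
    """Return the word plus every string made by deleting up to max_distance characters."""
    results = {word}
    frontier = {word}
    for _ in range(max_distance):
        frontier = {w[:i] + w[i + 1:] for w in frontier for i in range(len(w))}
        results |= frontier
    return results


def levenshtein(a, b):
    """Plain Levenshtein distance (a simplification: no transpositions)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete from a
                current[j - 1] + 1,            # insert into a
                previous[j - 1] + (ca != cb),  # substitute
            ))
        previous = current
    return previous[-1]


def build_index(dictionary, max_distance=1):
    """Precompute: map every delete-variant to the dictionary words that produce it."""
    index = defaultdict(set)
    for word in dictionary:
        for variant in deletes(word, max_distance):
            index[variant].add(word)
    return index


def lookup(index, term, max_distance=1):
    """Collect candidates sharing a delete-variant with term, then verify the real distance."""
    candidates = set()
    for variant in deletes(term, max_distance):
        candidates |= index.get(variant, set())
    return sorted(w for w in candidates if levenshtein(term, w) <= max_distance)


index = build_index(["hello", "help", "world", "word"])
print(lookup(index, "helo"))
print(lookup(index, "wrld"))
```

The verification step matters because two words can share a delete-variant while being further apart than `max_distance`: here "wrld" and "word" both reduce to "wrd", but they are two edits apart, so "word" is filtered out.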
https://github.com/wolfgarbe/SymSpell lists 5 JS implementations (plus a Rust one that compiles to WebAssembly).
Great tool, it was my first contact with spellcheckers, back when I was working for a company that does translations powered by machine learning. I was a student then and, as the article mentions, I was one of the naive ones who think that a spellchecker is an easy thing to build.
https://github.com/victorqribeiro/goSpellcheck
I wrote this originally in Python, then ported it to Go. Back then I had plans to improve it. I believed that most errors would be caused by pressing the wrong key, and I was sketching an algorithm to find similar words given a dictionary. Soon I had to deal with other projects (from college) and I left spellchecking to the smart people.
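To illustrate the "wrong key" idea, here is a rough Python sketch. It is not the code from goSpellcheck, just one possible reading of the approach: replace each letter with a neighboring key and keep the results that are dictionary words. The names `QWERTY_ROWS`, `build_neighbors` and `keyboard_candidates` are made up for this example, and the layout is treated as a plain grid, ignoring the stagger between rows on a real keyboard.

```python
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]


def build_neighbors(rows):
    """Map each key to the keys around it, treating the rows as an aligned grid."""
    neighbors = {}
    for r, row in enumerate(rows):
        for c, key in enumerate(row):
            near = set()
            for dr in (-1, 0, 1):
                if not 0 <= r + dr < len(rows):
                    continue
                other = rows[r + dr]
                for dc in (-1, 0, 1):
                    if 0 <= c + dc < len(other):
                        near.add(other[c + dc])
            near.discard(key)  # a key is not its own neighbor
            neighbors[key] = near
    return neighbors


def keyboard_candidates(word, dictionary, neighbors):
    """Return dictionary words that differ from word by one adjacent-key substitution."""
    found = set()
    for i, ch in enumerate(word):
        for near in neighbors.get(ch, ()):
            candidate = word[:i] + near + word[i + 1:]
            if candidate in dictionary:
                found.add(candidate)
    return sorted(found)


NEIGHBORS = build_neighbors(QWERTY_ROWS)
WORDS = {"hello", "world", "spell"}
print(keyboard_candidates("hwllo", WORDS, NEIGHBORS))
print(keyboard_candidates("hxllo", WORDS, NEIGHBORS))
```

This only covers single substitutions on one layout; missing, extra, or swapped letters would need the more general edit-distance machinery discussed elsewhere in the thread.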
That's a huge topic, which I am planning to cover towards the end of the article series (please like and subscribe), but in short: yes, my opinion is that spellchecking is actually a "machine learning problem in disguise", and most existing dictionaries are more a roundabout way of storing something-not-unlike-models than analytical data.
But an ML approach raises the question of data availability. What good will your "deep learning OSS spellchecker" do if there aren't good (and open) models for it that cover as many languages as the existing Hunspell dictionaries do? And what if adding a bunch of new words requires laborious model retraining? It is not unsolvable, but it is non-trivial.
I believe all the giants have something like this inside (I don't think spelling correction in the Google search bar is handled with Hunspell, right?), but it is much harder to do as an open tool, ready to be embedded into other software.
There are some notable attempts, though: JamSpell for one (https://github.com/bakwc/JamSpell), which has open "free" models and more precise commercial ones; the source code is open (maybe also only for using the "simplistic" models, I haven't dug deeper).