neuspell
JamSpell
 | neuspell | JamSpell
---|---|---
Mentions | 1 | 3
Stars | 642 | 592
Growth | - | -
Activity | 0.0 | 2.4
Last commit | 10 months ago | 7 months ago
Language | Python | C++
License | MIT License | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
neuspell
- What are good Python libraries for spellcheck?
I have experience with neuspell (https://github.com/neuspell/neuspell). It works at the sentence level, as it also has a language model component for context. There are several models already trained for English that you can choose from, and you can train your own for other languages or for the multilingual case. We were working with queries (so very short "sentences"), but the results were quite good compared to other methods. It was supposed to be SOTA a year or so ago.
JamSpell
- Rebuilding the spellchecker, pt.4: Introduction to suggest algorithm
There is, for example, a curious evaluation table provided by JamSpell, a modern ML-based spellchecker. According to it, JamSpell is awesome, while Hunspell is a mere 0.03% better than a dummy ("fix nothing") spellchecker... Which doesn't ring true, somehow!
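To make the "dummy" baseline concrete, here is a minimal sketch of how a spellchecker might be scored against one. It is not JamSpell's actual evaluation methodology: the word pairs, the `toy_checker` stand-in, and the three metric names are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy data: (word as typed, intended word).
PAIRS: List[Tuple[str, str]] = [
    ("the", "the"), ("quick", "quick"), ("borwn", "brown"), ("fox", "fox"),
    ("jumsp", "jumps"), ("over", "over"), ("teh", "the"), ("lazy", "lazy"),
    ("dog", "dog"), ("today", "today"),
]


def dummy(word: str) -> str:
    """The "fix nothing" baseline: return the input unchanged."""
    return word


# Hypothetical stand-in for a real corrector: a tiny lookup table that
# fixes two typos, misses one, and wrongly "corrects" a valid word.
FIXES: Dict[str, str] = {"borwn": "brown", "teh": "the", "lazy": "lady"}


def toy_checker(word: str) -> str:
    return FIXES.get(word, word)


def evaluate(correct: Callable[[str], str]) -> Dict[str, float]:
    """Score a corrector on PAIRS with three simple, assumed metrics."""
    typos = [(typed, want) for typed, want in PAIRS if typed != want]
    clean = [(typed, want) for typed, want in PAIRS if typed == want]
    # Words that are still wrong after correction, over all words.
    wrong = sum(correct(typed) != want for typed, want in PAIRS)
    # Typos that were turned into the intended word.
    fixed = sum(correct(typed) == want for typed, want in typos)
    # Correct words that the corrector damaged.
    broken = sum(correct(typed) != want for typed, want in clean)
    return {
        "error_rate": wrong / len(PAIRS),
        "fix_rate": fixed / len(typos),
        "broken_rate": broken / len(clean),
    }


if __name__ == "__main__":
    for name, checker in [("dummy", dummy), ("toy_checker", toy_checker)]:
        print(name, evaluate(checker))
```

On this toy data the dummy leaves 30% of words wrong, while the toy checker fixes two of the three typos but breaks one of the seven correct words, so its overall error rate only drops to 20%. When typos are rare, the gap between a checker and the dummy on an overall error rate can look very small, which is one reason such a number needs to be read together with how many typos were fixed and how many correct words were broken.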
- Rebuilding the spellchecker, pt.3: Lookup–compounds and solutions
That's a huge topic, which I am planning to cover towards the end of the article series (please like and subscribe), but in short: yes, my opinion is that spellchecking is actually a "machine learning problem in disguise", and most existing dictionaries are more a roundabout way of storing something-not-unlike-models than analytical data.
But the ML approach raises a question of data availability. What good will your "deep learning OSS spellchecker" do if there aren't good (and open) models for it that cover as many languages as existing Hunspell dictionaries do? And what if adding a bunch of new words requires laborious model retraining? It is not unsolvable, but non-trivial.
I believe all the giants have something like this inside (I don't think spelling correction in the Google search bar is handled with Hunspell, right?), but it is much harder to do as an open tool, ready for embedding into other software.
There are notable attempts, though: JamSpell for one (https://github.com/bakwc/JamSpell), which has open "free" models and more precise commercial ones; the source code is open (maybe also only for using "simplistic" models, haven't dug deeper).
- Rebuilding the most popular spellchecker. Part 1
Obviously, there are open-source spellcheckers other than Hunspell. GNU aspell (which at one point was superseded by Hunspell, but still holds its ground in English suggestion quality), to name one of the older ones; but there are also novel approaches, like SymSpell, claiming to be "1 million times faster", or the ML-based JamSpell, claiming to be much more accurate.
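For context on the SymSpell speed claim, the idea behind Symmetric Delete lookup can be sketched in a few lines. This is a simplified illustration, not SymSpell's implementation: the function names and the toy vocabulary are hypothetical, and plain Levenshtein distance is used for the final check, which may differ from the distance the real library uses.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set


def deletes(word: str, max_distance: int) -> Set[str]:
    """The word itself plus every string made by deleting up to max_distance characters."""
    out = {word}
    for k in range(1, min(max_distance, len(word)) + 1):
        for keep in combinations(range(len(word)), len(word) - k):
            out.add("".join(word[i] for i in keep))
    return out


def build_index(vocabulary: Iterable[str], max_distance: int = 2) -> Dict[str, Set[str]]:
    """Precompute a map from each delete-variant to the dictionary words that produce it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for word in vocabulary:
        for variant in deletes(word, max_distance):
            index[variant].add(word)
    return index


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def lookup(word: str, index: Dict[str, Set[str]], max_distance: int = 2) -> List[str]:
    """Find dictionary words within max_distance of word, closest first."""
    candidates: Set[str] = set()
    # Only deletes of the input are generated; inserts, substitutions and
    # transpositions are covered by the deletes precomputed for the dictionary.
    for variant in deletes(word, max_distance):
        candidates |= index.get(variant, set())
    # Candidates can be false positives, so verify with a real distance.
    scored = sorted((edit_distance(word, c), c) for c in candidates)
    return [c for d, c in scored if d <= max_distance]


if __name__ == "__main__":
    vocab = ["spell", "spelt", "shell", "smell", "checker"]
    print(lookup("spel", build_index(vocab, 1), 1))
    print(lookup("chekcer", build_index(vocab, 2), 2))
```

The trade-off in this sketch is memory for speed: the index stores many delete-variants per dictionary word, but a lookup only generates a small number of deletes of the input and does hash lookups, instead of generating every possible insert and substitution over the whole alphabet.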
What are some alternatives?
google-books-ngram-frequency - Word/n-gram frequency lists for the Google Books Ngram Corpus (v3, all languages) with Python code
SymSpell - SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
PyTorch-NLP - Basic Utilities for PyTorch Natural Language Processing (NLP)
SymSpell - A JavaScript implementation of the Symmetric Delete spelling correction algorithm.
DataProfiler - What's in your data? Extract schema, statistics and entities from datasets
hunspell - The most popular spellchecking library.
beir - A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
WeCantSpell.Hunspell - A port of Hunspell v1 for .NET and .NET Standard
chatgpt-comparison-detection - Human ChatGPT Comparison Corpus (HC3), Detectors, and more! 🔥
ruby-spellchecker - Fast English spelling and grammar checker that can be used for autocorrection.
goSpellcheck - A terrible spell checker in Go.
spylls - Pure Python spell-checker, (almost) full port of Hunspell