NLP-progress
SymSpell
| | NLP-progress | SymSpell |
|---|---|---|
| | 17 | 16 |
| Stars | 22,290 | 3,032 |
| Growth | - | - |
| Activity | 3.2 | 6.0 |
| | 2 months ago | 17 days ago |
| Language | Python | C# |
| License | MIT License | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
NLP-progress
- [Discussion] Checklist of seminal NLP papers
- NLP research status
- [D] How difficult/easy is to learn NLP once you have experience in a CV?
  One thing is that NLP is a set of wildly different problems that share some aspects but often use quite different techniques and assumptions about their datasets. So even if you have NLP experience, when you start on a substantially different NLP task you can't just apply what you know and succeed; you have to review "how things are done" for that problem domain. For a quick overview, sites like https://nlpprogress.com/ can be helpful to see what methods are used and, perhaps even more importantly, how people are modeling the actual task.
- Upcoming App Announcement: Lemmatize, a Foreign Language Reader
  A standard step in Chinese text processing is word segmentation, which deals with this problem.
- Is there as site tracking computer vision process?
  NLP has a GitHub project tracking NLP progress: https://github.com/sebastianruder/NLP-progress. I want to know if there is one tracking computer vision progress.
- [P] NLP "tl;dr" Notes on Transformers
  It would also be cool to have some charts with parameter density and even overall effectiveness (a tl;dr version of SOTA-trackers, maybe?) if that doesn't prove too infeasible.
- What are state-of-the-art methods for abstractive text summarization?
- BreadPanes 81: "They/Them"
  As I said, it increases ambiguity and cognitive overhead needlessly, given that "it" exists. Moreover, it also makes it harder for artificial intelligence to understand human text: https://github.com/sebastianruder/NLP-progress/blob/master/english/coreference_resolution.md
- [Request] Curated Advanced NLP Resources
  I could not find it on the internet (including on GitHub, Kaggle, Medium, or Reddit), and I know about NLP Progress and The Super Duper NLP Repo.
- How do you guys find/ keep up to date with the latest NLP papers?
  For someone who needs to be on top of the latest research: Twitter (distraction-prone, marketing-friendly, instantly-gratifying, quick); newsletters in ML + NLP such as https://jack-clark.net/, ruder.io, offconvex.org, etc. (distraction-free, generic, time-consuming); and SOTA chasing via https://paperswithcode.com/ and http://nlpprogress.com/ (distraction-free, generic + focused, code-friendly).
SymSpell
- Should you combine edit distance "spell check" algorithms with phonetic matching algorithms for robust keyword finding?
  The SymSpell algorithm uses deletions to determine the edit distance between the input query word and a dictionary of correctly spelled words. The Double Metaphone algorithm (or another phonetic algorithm) converts words to phonetic versions (phonetic "hashes", basically), and you then search by matching the input's phonetic hash against the dictionary of phonetic hashes.
- Show HN: I automated 1/2 of my typing
- Learn more about spell checkers
  Books:
  a. "Speech and Language Processing" by Daniel Jurafsky and James H. Martin (3rd Edition) - This book covers various aspects of natural language processing, including a section on spelling correction that provides a comprehensive introduction to the topic.
  b. "Foundations of Statistical Natural Language Processing" by Christopher D. Manning and Hinrich Schütze - This book provides an overview of statistical approaches in NLP, including a chapter on spelling correction.
  Articles:
  a. "How to Write a Spelling Corrector" by Peter Norvig - This article demonstrates the development of a simple spelling corrector using statistical algorithms. It's a great starting point for understanding the basics of spell checkers. (Link: https://norvig.com/spell-correct.html)
  b. "The Design of a Proofreading Software Service" by Michael D. Garris and James L. Blue - This article presents the design and implementation of a spelling correction system that can be integrated into various applications. (Link: https://www.nist.gov/system/files/documents/itl/iad/89403123.pdf)
  c. "A Fast and Flexible Spellchecker" by Atkinson, K. (2006) - This article details the design of a spell checker that uses a combination of rule-based and statistical approaches for improved performance. (Link: https://aspell.net/0.60.6.1/aspell-0.60.6.1.pdf)
  Online Resources:
  a. The Natural Language Toolkit (NLTK) - This is a popular Python library for natural language processing. It includes a spell checker module and various examples of how to use it. (Link: https://www.nltk.org/)
  b. SymSpell - This is an open-source spell checking library that uses a Symmetric Delete spelling correction algorithm for high performance and accuracy. The GitHub repository includes a detailed description of the algorithm and examples of how to use it. (Link: https://github.com/wolfgarbe/SymSpell)
  These resources should provide a solid foundation for understanding the design, algorithms, and usage of spell checkers. Happy learning!
- Turn the spellchecker into autocorrection software
  Can github.com/wolfgarbe/SymSpell, github.com/ruby/did_you_mean, or any of the spellcheckers listed at github.com/topics/spell-check?o=desc&s=forks be used as autocorrection software?
- Help with deep learning project "autocorrection"
  Do you absolutely need to use deep learning? There are tons of much faster autocorrect implementations that use Levenshtein distance and other non-DL techniques, such as SymSpell or Norvig's algorithm. DL is expensive and requires tons of data to train on; I would stay away from it unless you're doing it for your own enrichment or a school project.
- Spellcheck and Levenshtein distance
  This library claims to be orders of magnitude faster: https://github.com/wolfgarbe/SymSpell
- Auto correct/Auto complete feature
  If you want to do both at the same time (prefix search, allowing for misspellings), you can use a trie, but rather than just putting all your words in it, you can put everything in the "deletion neighborhood" of each word (that is, each possible variant of each word that has one character deleted), in an approach sort of like what's described here. Fair warning, though, that this gets a little hairy, and you'll have to decide how to weight prefix matches vs. misspellings in your rankings.
- SymSpell: 1M times faster spelling correction
- Hacker News top posts: Mar 6, 2022
  SymSpell: 1M times faster spelling correction (6 comments)
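Several of the posts above describe the same idea: index the "deletion neighborhood" of every dictionary word, generate deletions of the query, and match the two sets. The following is a minimal Python sketch of that idea, not SymSpell's actual implementation. The function names (`deletes`, `build_index`, `lookup`) are hypothetical, the index is a plain dictionary rather than a trie, and candidates are verified with plain Levenshtein distance, whereas the SymSpell project describes using Damerau-Levenshtein.

```python
from collections import defaultdict


def deletes(word, max_distance):
    """Return the word plus every string made by deleting up to max_distance characters."""
    results = {word}
    frontier = {word}
    for _ in range(max_distance):
        # Delete one more character from every string produced in the previous round.
        frontier = {w[:i] + w[i + 1:] for w in frontier for i in range(len(w))}
        results |= frontier
    return results


def levenshtein(a, b):
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or free match
            ))
        previous = current
    return previous[-1]


def build_index(dictionary, max_distance=2):
    """Map every deletion variant to the dictionary words that produce it."""
    index = defaultdict(set)
    for word in dictionary:
        for variant in deletes(word, max_distance):
            index[variant].add(word)
    return index


def lookup(query, index, max_distance=2):
    """Return (distance, word) pairs within max_distance, closest first."""
    candidates = set()
    for variant in deletes(query, max_distance):
        candidates |= index.get(variant, set())
    # Shared deletions only produce candidates; confirm each with a real distance.
    scored = ((levenshtein(query, word), word) for word in candidates)
    return sorted(pair for pair in scored if pair[0] <= max_distance)


if __name__ == "__main__":
    index = build_index(["hello", "help", "world", "word"])
    print(lookup("helo", index))
```

The trade-off in this sketch is memory for speed: the index stores many variants per word, but a lookup only touches the query's own deletions instead of comparing against every dictionary entry. A phonetic key (for example from a Double Metaphone library) could be indexed alongside in the same way, and ranking by word frequency is left out here.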
What are some alternatives?
nlp_tasks - Natural Language Processing Tasks and References
JamSpell - Modern spell checking library - accurate, fast, multi-language
wtpsplit - Code for Where's the Point? Self-Supervised Multilingual Punctuation-Agnostic Sentence Segmentation
hunspell - The most popular spellchecking library.
awesome-hungarian-nlp - A curated list of NLP resources for Hungarian
nlprule - A fast, low-resource Natural Language Processing and Text Correction library written in Rust.
languagetool - Style and Grammar Checker for 25+ Languages
OPUS-MT-train - Training open neural machine translation models
SymSpell - A JavaScript implementation of the Symmetric Delete spelling correction algorithm.
tldr-transformers - The "tl;dr" on a few notable transformer papers (pre-2022).
ruby-spellchecker - Fast English spelling and grammar checker that can be used for autocorrection.