neuspell
google-books-ngram-frequency
 | neuspell | google-books-ngram-frequency |
---|---|---|
Mentions | 1 | 11 |
Stars | 642 | 29 |
Growth | - | - |
Activity | 0.0 | 2.1 |
Last commit | 10 months ago | 9 months ago |
Language | Python | Python |
License | MIT License | - |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
neuspell
- What are good Python libraries for spellcheck?
I have experience with neuspell (https://github.com/neuspell/neuspell). It works at the sentence level, as it also has a language model component for context. There are several models already trained for English that you can choose from, and you can train your own for other languages or for the multilingual case. We were working with queries (so very short "sentences"), but the results were quite good compared to other methods. It was supposed to be state of the art a year or so ago.
google-books-ngram-frequency
- The returns to learning the most common words, by language [OC]
Nice! Yes, I created the graph. Everything is in this GitHub repository, including the underlying word lists and the Python code to create them and the graph. A Creative Commons license applies. You might also be interested in another GitHub repository where I released lists of the most common words and sentences in 62 languages based on subtitle data!
Yes, the data comes from the same books. For each language I create an ordered list of the most frequent words, looking like this. The graph then just plots the rank of the word on the x-axis and the cumulative frequency (column "cumshare" in the csv files) on the y-axis.
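The rank versus cumulative share relationship described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `cumulative_shares` and the toy counts are hypothetical and are not taken from the repository; only the column name "cumshare" comes from the post.

```python
from itertools import accumulate


def cumulative_shares(counts):
    """Return (rank, word, cumshare) rows, most frequent word first.

    `counts` maps each word to its raw frequency. This helper is a
    hypothetical illustration, not the repository's actual code.
    """
    # Order words from most to least frequent (rank 1 = most frequent).
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    total = sum(freq for _, freq in ordered)
    if total == 0:
        return []
    # Running sum of frequencies, divided by the total, gives the share
    # of all word occurrences covered by the top-k words.
    running = accumulate(freq for _, freq in ordered)
    return [
        (rank, word, cum / total)
        for rank, ((word, _), cum) in enumerate(zip(ordered, running), start=1)
    ]


# Toy counts, invented for this example only.
toy_counts = {"the": 50, "of": 30, "and": 15, "ngram": 5}
rows = cumulative_shares(toy_counts)
for rank, word, cumshare in rows:
    print(rank, word, round(cumshare, 2))
```

Plotting the first element of each row on the x-axis against the third on the y-axis would give a curve of the kind the post describes.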
- New lists of the most common words, ngrams, and sentences based on Google Books (8 languages) and OpenSubtitles (62 languages)
orgtre/google-books-ngram-frequency
What are some alternatives?
PyTorch-NLP - Basic Utilities for PyTorch Natural Language Processing (NLP)
github-orgmode-tests - This is a test project where you can explore how github interprets Org-mode files
JamSpell - Modern spell checking library - accurate, fast, multi-language
anki-editor - Emacs minor mode for making Anki cards with Org Mode
DataProfiler - What's in your data? Extract schema, statistics and entities from datasets
anki - Anki's shared backend and web components, and the Qt frontend
beir - A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
top-open-subtitles-sentences - Most common sentences and words for all languages in the OpenSubtitles2018 corpus with Python code
chatgpt-comparison-detection - Human ChatGPT Comparison Corpus (HC3), Detectors, and more! 🔥
dirsearch - Web path scanner