FrequencyWords
Repository for Frequency Word List Generator and processed files (by hermitdave)
wikipedia-word-frequency
Gather modern English word frequencies from all enwiki articles. (by IlyaSemenov)
Metric | FrequencyWords | wikipedia-word-frequency
---|---|---
Mentions | 16 | 5
Stars | 1,064 | 186
Growth | - | -
Activity | 0.0 | 4.8
Last commit | about 2 years ago | about 2 months ago
Language | C# | Python
License | MIT License | MIT License
The number of mentions is the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
FrequencyWords
Posts with mentions or reviews of FrequencyWords.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2023-03-10.
-
Are the persistence storage encryption phrases that show under "example" part of the persistence windows random?
Looking at my notes ... when I created my word list, I started by finding a list of the most commonly used words in English, with information about how common each word was. The page where I found it has moved to GitHub. I don't know if the author has produced a newer list, but the en.txt file I started from has a little over 450K words in it. A "6d6" list would have 6^6 = 46,656 words, so it would definitely be possible to make one.
-
AutoClozemaster-Anki - A tool to autogenerate clozemaster style anki decks
It makes use of these frequency lists (they have to be downloaded manually). It makes a request to the Tatoeba API to grab one or more sentences that include the word and, if you like, also requests an audio file from Google Translate (while the code is running, a temporary recordings folder is created). It then packages everything into an Anki deck using the genanki Python module.
- Vocabulary library (15k most used words)
-
Was wondering if there was a "top 1500 most used words" go-to site that most people like to go to/use for all languages?
This is the frequency list I always use: https://github.com/hermitdave/FrequencyWords
- Most common words in various languages?
-
A quick overview of my language learning method
Yeah, if the language doesn't have a preexisting list, you might need to generate one yourself from some large corpus of documents (e.g., Wikipedia using this tool). Btw it looks like a small one is available here for Armenian.
-
Someone, please convince me to drop SRS, Anki (whatever)
Literally none of the Spanish words you mentioned appear in the top 100K of this frequency list:
-
Frequency lists / frequency dictionaries / Monolingual dictionaries for Modern Hebrew?
There's a large frequency list generated from subtitle data [here](https://github.com/hermitdave/FrequencyWords/tree/master/content/2018/he) and a shorter list [here](https://www.teachmehebrew.com/hebrew-frequency-list.html)
- FrequencyWords: Repository for Frequency Word List Generator and processed files
-
I created a tool to generate decks of i+1 sentences
There is a pretty solid frequency list here generated from OpenSubtitles. There are lots of other languages in the same repository.
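The "6d6" idea mentioned in the first post above (six rolls of a six-sided die selecting one of 6^6 = 46,656 words) can be sketched as follows. This is a minimal illustration, not the poster's actual method: it assumes each line of the frequency file has the form `word count` separated by whitespace, and the function names `parse_frequency_lines` and `build_dice_list` are hypothetical.

```python
from itertools import product


def parse_frequency_lines(lines):
    """Yield (word, count) pairs from lines in an ASSUMED 'word count' format.

    Lines that do not match that shape are skipped silently.
    """
    for line in lines:
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        yield parts[0], int(parts[1])


def build_dice_list(lines, dice=6, sides=6):
    """Map every dice roll (e.g. '314256') to one of the most frequent words.

    The list needs sides ** dice words; with the defaults that is 46656.
    """
    size = sides ** dice
    # Sort by descending count; sorted() is stable, so ties keep file order.
    pairs = sorted(parse_frequency_lines(lines), key=lambda pair: -pair[1])
    words = [word for word, _ in pairs[:size]]
    if len(words) < size:
        raise ValueError(f"need {size} words, found only {len(words)}")
    rolls = (
        "".join(str(face) for face in roll)
        for roll in product(range(1, sides + 1), repeat=dice)
    )
    return dict(zip(rolls, words))


if __name__ == "__main__":
    # Tiny demo with two 2-sided "dice" so that only 4 words are needed.
    sample = ["the 100", "of 80", "and 60", "to 40", "a 20"]
    for roll, word in build_dice_list(sample, dice=2, sides=2).items():
        print(roll, word)
```

With a real file, one would pass an open file handle instead of the sample list, for example `build_dice_list(open("en.txt", encoding="utf-8"))`, where `en.txt` is the file name mentioned in the post. A raw frequency list probably also needs filtering (very short words, names, profanity) before it is usable for passphrases.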
wikipedia-word-frequency
Posts with mentions or reviews of wikipedia-word-frequency.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2023-04-07.
-
Re-license / upgrade from CC BY-SA 3.0 to CC BY-SA 4.0?
I have a project that uses words from Wikipedia (specifically, a list of frequently-used words on Wikipedia).
-
Where can I find the list of Ukrainian words ordered by frequency of use?
This might help. It was generated from Wikipedia data and includes Ukrainian: https://github.com/IlyaSemenov/wikipedia-word-frequency/tree/master/results
-
Which is more secure, a 64 character password made up of letters both uppercase and lowercase, numbers, and symbols or a passphrase of multiple words with spaces between the words totalling up to 64 characters?
For this analysis, I'm going to use the word frequency of Wikipedia from 2021-08-20. I'm choosing this, because it has 2,676,542 unique words. Most of those are non-words, but regardless, they're still unique, and would work for passphrases.
-
A quick overview of my language learning method
Yeah, if the language doesn't have a preexisting list, you might need to generate one yourself from some large corpus of documents (e.g., Wikipedia using this tool). Btw it looks like a small one is available here for Armenian.
-
How to get more RAM in Colab without upgrading to Pro
Have you considered trying to modify the code to be more memory efficient instead? You're just performing a word count here; you don't need to have all of your data in memory at the same time to achieve that. Hell, you could even use a pre-computed word frequency list if you can find one that's suitable for your task. For example, here's one computed against English Wikipedia: https://github.com/IlyaSemenov/wikipedia-word-frequency
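The memory-efficient word count suggested in the last post can be sketched like this. It is a minimal illustration under stated assumptions, not the poster's code and not how wikipedia-word-frequency is implemented: the input is read one line at a time, a "word" is taken to be any run of Unicode letters, and the names `count_words` and `count_words_in_file` are hypothetical.

```python
import re
from collections import Counter

# ASSUMPTION: a "word" is a run of Unicode letters (no digits, no underscore).
# Contractions such as "don't" are therefore split into "don" and "t".
WORD_RE = re.compile(r"[^\W\d_]+")


def count_words(lines):
    """Count lowercase words from any iterable of text lines.

    Only the running counts stay in memory, never the whole corpus.
    """
    counts = Counter()
    for line in lines:
        counts.update(word.lower() for word in WORD_RE.findall(line))
    return counts


def count_words_in_file(path, encoding="utf-8"):
    """Stream a text file line by line and return its word counts."""
    with open(path, encoding=encoding) as handle:
        return count_words(handle)


if __name__ == "__main__":
    demo = ["The cat and the hat", "THE end 42"]
    for word, count in count_words(demo).most_common(3):
        print(word, count)
```

Memory use then grows with the number of distinct words rather than with the size of the corpus, which is usually what makes the difference on a RAM-limited notebook.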
What are some alternatives?
When comparing FrequencyWords and wikipedia-word-frequency you can also consider the following projects:
tatoeba2 - Tatoeba is a platform whose purpose is to create a collaborative and open dataset of sentences and their translations.
hierarchical-attention-networks - TensorFlow implementation of the paper "Hierarchical Attention Networks for Document Classification"
genanki - A Python 3 library for generating Anki decks
orchard-street-wordlists - Wordlists for generating passphrases