tatoeba2
FrequencyWords
 | tatoeba2 | FrequencyWords |
---|---|---|
Mentions | 47 | 16 |
Stars | 667 | 1,064 |
Growth | 2.5% | - |
Activity | 0.0 | 0.0 |
Last commit | 6 days ago | about 2 years ago |
Language | PHP | C# |
License | GNU Affero General Public License v3.0 | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
tatoeba2
-
The AI Revolution Is Crushing Thousands of Languages
Alternate take, it can also help people learn niche languages if native speakers contribute to data sets. For example, I've been using Clozemaster for the past few months as a way to work on vocabulary in some languages, and they pull their dataset from Tatoeba [1]. I was very surprised to see that my father's native language, Kabyle, which is admittedly a somewhat niche language, is one of the top languages by sentence contribution in the dataset (over 700k entries, more than French or Spanish or German). I showed him the sentences once and he confirmed that yes, they all seem like what a native speaker would say. Not all of them have translations into other languages of course, and a lot of them are slight variations on each other, but some native speakers are there contributing. It's not currently an option to use in Clozemaster -- I'm guessing the TTS isn't really there -- but I totally could see these as gaps that are easily filled.
Same with my wife's native language (Bengali). There are surprisingly few language learning resources for Bangla, even though it's the 7th most spoken language in the world. But there it is in the data set with TTS and the ability for Clozemaster to have ChatGPT "explain" what's going on in the sentence (a very useful feature for new speakers).
Anyway, I don't view AI as good or bad, just another tool that we should be intentional about when we cultivate the data sets underlying the tool.
[1] https://tatoeba.org
-
Where can I find reliable example sentences?
Maybe on tatoeba.org with filters
-
Best vocab (not writing) app
I use both. I make a lot of my own cards so I get to focus on the vocab I want. Generally find a word I want to learn, use https://forvo.com/ to find native audio for it, then use https://tatoeba.org/ to find sentences that use that word. Once you get a bit of practise it's pretty quick to make a word note, then make 2 or 3 sentence notes for it*. However I do use some pre-made decks like this set of sentence decks for each HSK level with native audio: https://ankiweb.net/shared/byauthor/933449107
-
Anyone else spend heaps of time searching for sentences for Anki?
You can try Tatoeba (https://tatoeba.org), but I don't know if it's good with Arabic ...
-
GPT-4's toki pona capabilities
tatoeba if anything because that has sentences so at least a modicum of context
-
Is there an app or website where I can paste a word/phrase and get examples of how it’s used in a sentence?
I use Tatoeba (https://tatoeba.org): it's a collection of phrases, sometimes with translations and audio recordings. You could use Forvo but it's only audio recordings.
- maneiras de falar "no pasa nada / it's okay/all right" em BR-PT?
- How do I get audio data from native speakers for Anki?
-
anyone know a site like Reverso but for simpler sentences?
As someone else suggested, Tatoeba is also a good option. Nowadays, I use it less and less because I prefer the more didactic sentences found on online dictionaries. Nonetheless, it's still very good, especially due to the sheer quantity of sentences you can find there.
-
Nihongo Lessons has launched on the App Store
Appearances in the Tatoeba example sentence database.
FrequencyWords
-
Are the persistence storage encryption phrases that show under "example" part of the persistence windows random?
Looking at my notes ... when I created my word list, I started by finding a list of the most commonly used words in English, with information about how common each word was. The page where I found it has moved to Github. I don't know if he's produced a newer list, but the en.txt file I started from has a little over 450K words in it. A "6d6" list would have 46656 words, so it would definitely be possible to make one.
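The "6d6" arithmetic above can be checked with a small sketch: six six-sided dice give 6^6 = 46656 possible rolls, so each roll can index one word. This is only an illustration, not the commenter's actual procedure. It assumes the frequency file has one `word count` pair per line, sorted from most to least common, and the function names are hypothetical.

```python
from itertools import islice

DICE_SIDES = 6
NUM_DICE = 6
LIST_SIZE = DICE_SIDES ** NUM_DICE  # 6^6 = 46656 words for a "6d6" list


def parse_frequency_lines(lines):
    """Yield words from lines shaped like 'word count' (assumed file format)."""
    for line in lines:
        parts = line.split()
        # Skip anything that does not look like a word followed by a count.
        if len(parts) == 2 and parts[1].isdigit():
            yield parts[0]


def build_dice_list(lines, size=LIST_SIZE):
    """Keep the first `size` words, i.e. the most common ones."""
    return list(islice(parse_frequency_lines(lines), size))


def roll_to_index(roll):
    """Map a roll such as '111111' to 0 and '666666' to LIST_SIZE - 1."""
    if len(roll) != NUM_DICE or any(c not in "123456" for c in roll):
        raise ValueError("expected six digits, each between 1 and 6")
    index = 0
    for c in roll:
        # Treat the roll as a base-6 number with digits shifted down by one.
        index = index * DICE_SIDES + (int(c) - 1)
    return index
```

With a real file, something like `words = build_dice_list(open("en.txt", encoding="utf-8"))` followed by `words[roll_to_index("352614")]` would look up one word per roll, provided the file holds at least 46656 usable entries.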
-
AutoClozemaster-Anki - A tool to autogenerate clozemaster style anki decks
It makes use of these frequency lists (have to be manually downloaded). It will make a request to the tatoeba api to grab a sentence (or multiple) including this word and if you would like it will also request an audio file from google translate (while the code is running a temporary recordings folder is created). It then packages it up into an anki deck using the genanki python module.
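The cloze step in that workflow could look roughly like the sketch below. It is not the tool's own code: the sentences are passed in as plain strings instead of being fetched from Tatoeba, audio and genanki packaging are left out, and `make_cloze` and `build_cards` are hypothetical names used only for illustration.

```python
import re


def make_cloze(sentence, word, blank="_____"):
    """Blank out the first whole-word match of `word`; return None if absent."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    cloze, count = pattern.subn(blank, sentence, count=1)
    return cloze if count else None


def build_cards(words, sentences):
    """Pair each word with the first sentence that contains it."""
    cards = []
    for word in words:
        for sentence in sentences:
            cloze = make_cloze(sentence, word)
            if cloze is not None:
                cards.append({"front": cloze, "back": word, "full": sentence})
                break  # one sentence per word is enough for this sketch
    return cards
```

Words with no matching sentence are simply skipped, so the number of cards can be smaller than the number of words taken from the frequency list.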
- Vocabulary library (15k most used words)
-
Was wondering if there was a "top 1500 most used words" go-to site that most people like to go to/use for all languages?
This is the frequency list I always use: https://github.com/hermitdave/FrequencyWords
- Most common words in various languages?
-
A quick overview of my language learning method
Yeah, if the language doesn't have a preexisting list, you might need to generate one yourself from some large corpus of documents (e.g., Wikipedia using this tool). Btw it looks like a small one is available here for Armenian.
-
Someone, please convince me to drop SRS, Anki (whatever)
Literally none of the Spanish words you mentioned appear in the top 100K of this frequency list:
-
Frequency lists / frequency dictionaries / Monolingual dictionaries for Modern Hebrew?
There's a large frequency list generated from subtitle data [here](https://github.com/hermitdave/FrequencyWords/tree/master/content/2018/he) and a shorter list [here](https://www.teachmehebrew.com/hebrew-frequency-list.html)
- FrequencyWords: Repository for Frequency Word List Generator and processed files
-
I created a tool to generate decks of i+1 sentences
There is a pretty solid frequency list here generated from OpenSubtitles. There are lots of other languages in the same repository.
What are some alternatives?
gutensearch - Search engine for Project Gutenberg books
wikipedia-word-frequency - Gather modern English word frequencies from all enwiki articles.
river-runner - Uses USGS/MERIT Basin data to visualize the path of a rain droplet to its endpoint.
genanki - A Python 3 library for generating Anki decks
rum - Simple, decomplected, isomorphic HTML UI library for Clojure and ClojureScript