Korpora
Korean corpus repository (by ko-nlp)
corpora
A collection of small corpuses of interesting data for the creation of bots and similar stuff. (by dariusk)
Metric | Korpora | corpora
---|---|---
Mentions | 1 | 7
Stars | 645 | 4,859
Growth | 0.0% | -
Activity | 0.0 | 5.5
Last commit | over 1 year ago | 3 months ago
Language | Python | JavaScript
License | Creative Commons Attribution 4.0 | -
The number of mentions is the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Korpora
Posts with mentions or reviews of Korpora.
We have used some of these posts to build our list of alternatives
and similar projects.
-
Resources About Cross-Linguistic Relative Phoneme Frequency
- LDC (usually very expensive, but some options exist, and in some cases you can find the corpora elsewhere for free by searching online): https://www.ldc.upenn.edu/
- Connecting with a university or looking at a linguistics lab's corpus holdings (some host corpora themselves, or acquired a corpus freely, in which case you can find it on the internet)
- Some language-specific lists or collections, e.g. https://warwick.ac.uk/fac/soc/al/repository/staff/harrisontilly/corpora-for-workshop/, https://github.com/ko-nlp/Korpora, https://guides.uflib.ufl.edu/frenchlinguistics/corpora
- Some larger overviews, which may contain links, e.g. https://www.clarin.eu/resource-families/corpora-academic-texts, https://libguides.reed.edu/linguistics/datasets-corpora
- Some larger projects to create (often text-based) corpora for multiple languages (often for NLP), e.g. https://www.sketchengine.eu/documentation/tenten-corpora/
corpora
Posts with mentions or reviews of corpora.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2023-04-22.
- Corpora: A collection of small corpuses of interesting data
-
How can you database hundreds or thousands of items for a trading game like Pirates? Only needed info: name, $$$, and a general type inferred by the list. I'm thinking .csv
Check whether this or one of the other word lists in the dataset is useful: https://github.com/dariusk/corpora/blob/master/data/objects/objects.json
-
Obtaining a Word List
This might work. https://github.com/dariusk/corpora/blob/master/data/words/word_clues/clues_five.json
-
Procedural Text Generation?
corpora - A lot of adjectives, but not necessarily every adjective in the English language
-
Part 1: How to Build a Serverless Twitter Bot
I picked one from Darius because he also keeps a GitHub repository of a lot of corpora that a ton of bot makers pull from. You can find it at https://github.com/dariusk/corpora.
-
Finding lists of words and other resources for text generation
Check out Corpora. Lots of lists in various categories, and you can't get friendlier than the CC0 license!
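As a rough sketch of the item-database question above (name, price, and type in a .csv), the snippet below turns a corpora-style JSON word list into CSV rows. The inline sample data, the `objects` key, the helper names `word_list_to_rows` and `rows_to_csv`, and the flat default price are all assumptions made for illustration; check the actual file layout in the repository before relying on it.

```python
import csv
import io
import json

# Inline stand-in for a corpora-style file. The "description" plus
# named-list layout is an assumption, not a confirmed schema.
SAMPLE_JSON = """
{
  "description": "List of objects",
  "objects": ["rope", "lantern", "compass"]
}
"""


def word_list_to_rows(raw_json, list_key, item_type, default_price=0):
    """Turn one named list into (name, price, type) rows."""
    data = json.loads(raw_json)
    return [(name, default_price, item_type) for name in data[list_key]]


def rows_to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name", "price", "type"])
    writer.writerows(rows)
    return buffer.getvalue()


# The item type is taken from the list the names came from.
rows = word_list_to_rows(SAMPLE_JSON, "objects", "object", default_price=10)
csv_text = rows_to_csv(rows)
print(csv_text)
```

Prices would still need to be filled in by hand or by a game-specific rule; the word list only supplies names and, through the list it came from, a general type.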
What are some alternatives?
When comparing Korpora and corpora you can also consider the following projects:
korean-word-ipa-dictionary - Dictionary of pairs of Korean word and IPA crawled from Wiktionary (Korean edition)
atto - The new BASIC computer that runs in your browser!