corpora
Korpora
Metric | corpora | Korpora |
---|---|---|
Mentions | 7 | 1 |
Stars | 4,851 | 645 |
Growth | - | 0.0% |
Activity | 5.5 | 0.0 |
Last commit | 3 months ago | over 1 year ago |
Language | JavaScript | Python |
License | - | Creative Commons Attribution 4.0 |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
corpora
- Corpora: A collection of small corpuses of interesting data
- How can you database hundreds or thousands of items for a trading game like Pirates? Only needed info: name, $$$, and a general type inferred by the list. I'm thinking .csv
  Check whether this or one of the other word lists in the dataset is useful: https://github.com/dariusk/corpora/blob/master/data/objects/objects.json
- Obtaining a Word List
  This might work: https://github.com/dariusk/corpora/blob/master/data/words/word_clues/clues_five.json
- Procedural Text Generation?
  corpora - A lot of adjectives, but not necessarily every adjective in the English language
- Part 1: How to Build a Serverless Twitter Bot
  I picked one from Darius because he also keeps a GitHub repository of a lot of corpora that a ton of bot makers pull from. You can find it at https://github.com/dariusk/corpora.
- Finding lists of words and other resources for text generation
  Check out Corpora. Lots of lists in various categories, and you can't get friendlier than the CC0 license!
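The first mention above asks how to keep a name, a price, and a general type for each item in a .csv file, with the type inferred from the list the name came from. A minimal sketch of that idea follows. It assumes a word list stored as a JSON object with a flat list of strings under one key; the sample data, the key name `objects`, the helper names, and the price range are all hypothetical and not taken from the actual repository files.

```python
import csv
import io
import json
import random

# Hypothetical sample shaped like a JSON word list: a description plus a
# flat list of strings. The key name "objects" is an assumption.
SAMPLE_JSON = """
{
  "description": "Sample list of objects",
  "objects": ["lantern", "rope", "compass", "barrel"]
}
"""


def build_items(raw_json, list_key, rng, low=5, high=200):
    """Turn a JSON word list into rows of name, price, and type.

    The type is inferred from the list the name came from (list_key);
    the price is a random placeholder in [low, high].
    """
    names = json.loads(raw_json)[list_key]
    return [
        {"name": name, "price": rng.randint(low, high), "type": list_key}
        for name in names
    ]


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "price", "type"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rng = random.Random(42)  # seeded so repeated runs give the same prices
items = build_items(SAMPLE_JSON, "objects", rng)
csv_text = to_csv(items)
print(csv_text)
```

To use a real list, the JSON text could be read from a downloaded file instead of `SAMPLE_JSON`, after checking which key that file actually stores its list under.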
Korpora
- Resources About Cross-Linguistic Relative Phoneme Frequency
  - LDC (usually very expensive, but some options exist, and in some cases you can google them to find them elsewhere for free): https://www.ldc.upenn.edu/
  - Connecting with a university or looking at a linguistics lab's corpus holdings (some will host the corpus, or acquired it freely, so you can find it on the internet)
  - Some language-specific lists or collections: e.g. https://warwick.ac.uk/fac/soc/al/repository/staff/harrisontilly/corpora-for-workshop/, https://github.com/ko-nlp/Korpora, https://guides.uflib.ufl.edu/frenchlinguistics/corpora
  - Some larger overviews, which may contain links: e.g. https://www.clarin.eu/resource-families/corpora-academic-texts, https://libguides.reed.edu/linguistics/datasets-corpora
  - Some larger projects to create (often text-based) corpora for multiple languages (often for NLP): e.g. https://www.sketchengine.eu/documentation/tenten-corpora/
What are some alternatives?
atto - The new BASIC computer that runs in your browser!
korean-word-ipa-dictionary - Dictionary of pairs of Korean word and IPA crawled from Wiktionary (Korean edition)
Eigengrau-s-Essential-Establishment-Generator - A town generator that is suitable for out of the box play in any fantasy TTRPG setting.
gum - Repository for the Georgetown University Multilayer Corpus (GUM)
tf2-botcheck - App that interacts with TF2 to detect known named bots and name-stealing bots in Casual.
trafilatura - Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
empirist-corpus - A web and social media corpus based on the dataset of the EmpiriST 2015 shared task
japanese-words-to-vectors - Word2vec (word to vectors) approach for Japanese language using Gensim and Mecab.
pluralize - Pluralize or singularize any word based on a count
open-discourse - Open Discourse is the first fully comprehensive corpus of the plenary proceedings of the federal German Parliament (Bundestag).
KahootBot - A generator for Kahoot bots
bookcorpus - Crawl BookCorpus