corpora Open-Source Projects
-
entity-recognition-datasets
A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.
-
Unredactor
In this project we are tryinbg to create unredactor. Unredactor will take a redacted document and the redacted flag as input, inreturn it will give the most likely candidates to fill in redacted location. In this project we are only considered about unredacting names only. The data that we are considering is imdb data set with many review files. These files are used to buils corpora for finding tfidf score. Few files are used to train and in these files names are redacted and written into redact
-
Scout Monitoring
Free Django app performance insights with Scout Monitoring. Get Scout setup in minutes, and let us sweat the small stuff. A couple lines in settings.py is all you need to start monitoring your apps. Sign up for our free tier today.
There is of course the list at https://github.com/juand-r/entity-recognition-datasets, but all of the recent English datasets cover other domains of English, such as the music NER, space NER, etc. All interesting things, but not 2020s English newswire.
Index
Project | Stars | |
---|---|---|
1 | entity-recognition-datasets | 1,449 |
2 | Unredactor | 0 |
Sponsored