corpora Open-Source Projects

entity-recognition-datasets

3 1,449 5.8 Python

A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.

Project mention: Recent English newswire NER datasets? | /r/LanguageTechnology | 2023-08-27

There is of course the list at https://github.com/juand-r/entity-recognition-datasets, but all of the recent English datasets cover other domains of English, such as the music NER, space NER, etc. All interesting things, but not 2020s English newswire.

Unredactor

1 0 10.0 Python

In this project we are tryinbg to create unredactor. Unredactor will take a redacted document and the redacted flag as input, inreturn it will give the most likely candidates to fill in redacted location. In this project we are only considered about unredacting names only. The data that we are considering is imdb data set with many review files. These files are used to buils corpora for finding tfidf score. Few files are used to train and in these files names are redacted and written into redact
Scout Monitoring

www.scoutapm.com featured

Free Django app performance insights with Scout Monitoring. Get Scout setup in minutes, and let us sweat the small stuff. A couple lines in settings.py is all you need to start monitoring your apps. Sign up for our free tier today.

NOTE: The open source projects on this list are ordered by number of github stars. The number of mentions indicates repo mentiontions in the last 12 Months or since we started tracking (Dec 2020).

Index

	Project	Stars
1	entity-recognition-datasets	1,449
2	Unredactor	0

corpora

corpora Open-Source Projects

entity-recognition-datasets

Unredactor

Scout Monitoring

Index