Python corpora

Open-source Python projects categorized as corpora

Python corpora Projects

  • entity-recognition-datasets

    A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.

  • Project mention: Recent English newswire NER datasets? | /r/LanguageTechnology | 2023-08-27

    There is of course the list at https://github.com/juand-r/entity-recognition-datasets, but all of the recent English datasets cover other domains of English, such as the music NER, space NER, etc. All interesting things, but not 2020s English newswire.

  • Unredactor

    In this project we are tryinbg to create unredactor. Unredactor will take a redacted document and the redacted flag as input, inreturn it will give the most likely candidates to fill in redacted location. In this project we are only considered about unredacting names only. The data that we are considering is imdb data set with many review files. These files are used to buils corpora for finding tfidf score. Few files are used to train and in these files names are redacted and written into redact

  • WorkOS

    The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.

    WorkOS logo
NOTE: The open source projects on this list are ordered by number of github stars. The number of mentions indicates repo mentiontions in the last 12 Months or since we started tracking (Dec 2020).

Index

Project Stars
1 entity-recognition-datasets 1,431
2 Unredactor 0

Sponsored
SaaSHub - Software Alternatives and Reviews
SaaSHub helps you find the best software and product alternatives
www.saashub.com