entity-recognition-datasets vs zshot
| | entity-recognition-datasets | zshot |
|---|---|---|
| Mentions | 3 | 2 |
| Stars | 1,431 | 319 |
| Growth | - | 5.0% |
| Activity | 5.8 | 6.6 |
| Latest commit | about 1 month ago | 2 months ago |
| Language | Python | Python |
| License | MIT License | MIT License |
Stars: the number of stars a project has on GitHub. Growth: month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
entity-recognition-datasets
- Recent English newswire NER datasets?
  There is of course the list at https://github.com/juand-r/entity-recognition-datasets, but all of the recent English datasets there cover other English domains, such as music NER, space NER, etc. All interesting things, but not 2020s English newswire.
- Towards a Tagalog NLP pipeline: Building a spaCy model from scratch
- Any large manually annotated NER datasets?
zshot
What are some alternatives?
- acl2023_conllpp
- transferlearning - Transfer learning / domain adaptation / domain generalization / multi-task learning etc. Papers, code, datasets, applications, tutorials.
- podium - Podium: a framework-agnostic Python NLP library for data loading and preprocessing
- FSL-Mate - FSL-Mate: A collection of resources for few-shot learning (FSL).
- NCRFpp - NCRF++, a Neural Sequence Labeling Toolkit. Easy to use for any sequence labeling task (e.g. NER, POS tagging, segmentation). It includes character LSTM/CNN, word LSTM/CNN and softmax/CRF components.
- bllip-parser - BLLIP reranking parser (also known as Charniak-Johnson parser, Charniak parser, Brown reranking parser). See http://pypi.python.org/pypi/bllipparser/ for the Python module.
- GoLLIE - Guideline-following Large Language Model for Information Extraction
- DeepKE - [EMNLP 2022] An Open Toolkit for Knowledge Graph Extraction and Construction
- ARElight - Granular Viewer of Sentiments Between Entities in Massively Large Documents and Collections of Texts, powered by AREkit
- healthsea - Healthsea is a spaCy pipeline for analyzing user reviews of supplementary products for their effects on health.
- Stanza - Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages
- spaCy - 💫 Industrial-strength Natural Language Processing (NLP) in Python