lemmatization-lists vs CogCompNLP
Metric | lemmatization-lists | CogCompNLP
---|---|---
Mentions | 3 | -
Stars | 303 | 469
Growth | - | -0.2%
Activity | 0.0 | 0.0
Latest commit | over 2 years ago | 11 months ago
Language | - | Java
License | ODC Open Database License v1.0 | GNU General Public License v3.0 or later
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
lemmatization-lists
-
Ambiguous spellings
Maintaining such a data set is a massive undertaking, so it's mostly taken from https://github.com/michmech/lemmatization-lists. At the top of the file you'll see some additional entries I've added to deal with personal pronouns and numbers.
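A rough sketch of loading such a list without losing ambiguous spellings, assuming each line is `lemma<TAB>form` (our reading of the lemmatization-lists layout; check the actual files before relying on it). A form that appears under several lemmas keeps all of them in file order, so hand-added entries placed at the top of the file come first. The helper names and all sample entries, including the pronoun and number lines, are hypothetical.

```python
from collections import defaultdict


def load_lemma_pairs(lines):
    """Map each inflected form to every lemma it may belong to.

    Assumes each line is "lemma<TAB>form". Lines without a tab (blank lines,
    stray headers) are skipped. Ambiguous forms keep all their lemmas, in the
    order they appear, so entries at the top of the file are listed first.
    """
    form_to_lemmas = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        if "\t" not in line:
            continue
        lemma, form = (part.strip() for part in line.split("\t", 1))
        if lemma not in form_to_lemmas[form]:
            form_to_lemmas[form].append(lemma)
    return dict(form_to_lemmas)


def lemma_candidates(word, table):
    """Return all candidate lemmas for word, or the word itself if unlisted."""
    return table.get(word, [word])


# Hypothetical hand-added entries (pronouns, numbers), kept ahead of the base list.
custom = ["I\tme", "two\t2"]
# Hypothetical excerpt in the assumed lemma<TAB>form layout.
base = ["leaf\tleaves", "leave\tleaves", "leave\tleft", "run\tran"]

table = load_lemma_pairs(custom + base)
print(lemma_candidates("leaves", table))  # ['leaf', 'leave']
print(lemma_candidates("me", table))      # ['I']
print(lemma_candidates("walked", table))  # ['walked']
```

Returning a list rather than a single lemma leaves the choice between readings such as "leaf" and "leave" to a later step (for example, a part-of-speech tagger).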
-
Is there a text list of words and their variations?
Another one to add to your list: https://github.com/michmech/lemmatization-lists
-
Trying to build a lemmatizer from scratch
One approach might be to take a lemmatization list, like the lemma-token lists at https://github.com/michmech/lemmatization-lists/, and compile it into a finite-state transducer (FST). The Helsinki Finite-State Technology (HFST) package, for instance, has an hfst-strings2fst command that compiles pairs of strings into a transducer. You might need to reformat the input first.
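The core idea can be illustrated without HFST. The Python below does not use HFST or hfst-strings2fst; it is a minimal stand-in, assuming (lemma, form) pairs have already been parsed from a list, that compiles them into a trie-shaped transducer: each surface form spells a path of character transitions, and its lemmas are emitted at the state where the path ends. `State`, `compile_pairs` and `transduce` are hypothetical names, and the sample pairs are made up.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    # Outgoing transitions, keyed by one input character each.
    arcs: dict = field(default_factory=dict)
    # Lemmas emitted if the input ends here; empty means a non-final state.
    outputs: list = field(default_factory=list)


def compile_pairs(pairs):
    """Compile (lemma, form) pairs into a trie-shaped transducer.

    Each surface form becomes a path from the root; its lemmas are attached
    to the state where that path ends. Unlike a real FST toolkit, this does
    not minimize the machine, so shared suffixes are not merged.
    """
    root = State()
    for lemma, form in pairs:
        state = root
        for ch in form:
            if ch not in state.arcs:
                state.arcs[ch] = State()
            state = state.arcs[ch]
        if lemma not in state.outputs:
            state.outputs.append(lemma)
    return root


def transduce(root, word):
    """Run word through the transducer; return its lemmas, or [] if rejected."""
    state = root
    for ch in word:
        state = state.arcs.get(ch)
        if state is None:
            return []
    return list(state.outputs)


pairs = [("run", "ran"), ("run", "running"), ("run", "runs"),
         ("be", "was"), ("be", "were")]
fst = compile_pairs(pairs)
print(transduce(fst, "running"))  # ['run']
print(transduce(fst, "runnin"))   # [] (a prefix ends in a non-final state)
print(transduce(fst, "wear"))     # [] (no transition for 'a' after "we")
```

A dedicated toolkit adds what this sketch leaves out: minimization (so "running" and "jumping" can share the "-ing" states), composition with other transducers, and compact on-disk formats.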
CogCompNLP
We haven't tracked posts mentioning CogCompNLP yet.
Tracking mentions began in Dec 2020.
What are some alternatives?
trankit - Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing
CoreNLP - CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.
tldr-transformers - The "tl;dr" on a few notable transformer papers (pre-2022).
Apache OpenNLP - Apache OpenNLP
awesome-sentiment-analysis - Repository with everything necessary for sentiment analysis and related areas
DKPro Core - Collection of software components for natural language processing (NLP) based on the Apache UIMA framework.
thesaurus - Offline database of synonyms/thesaurus
Mallet - MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
Awesome-pytorch-list - A comprehensive list of PyTorch-related content on GitHub, such as different models, implementations, helper libraries, tutorials, etc.
simplenlg - Java API for Natural Language Generation. Originally developed by Ehud Reiter at the University of Aberdeen’s Department of Computing Science and co-founder of Arria NLG. This git repo is the official SimpleNLG version.
OpenNRE - An Open-Source Package for Neural Relation Extraction (NRE)