tldr-transformers vs lemmatization-lists
Metric | tldr-transformers | lemmatization-lists
---|---|---
Mentions | 4 | 3
Stars | 167 | 303
Growth | - | -
Activity | 0.0 | 0.0
Latest commit | over 1 year ago | about 2 years ago
License | MIT License | ODC Open Database License v1.0
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
tldr-transformers
- Show HN: The “tl;dr” of Recent Transformer Papers
- Show HN: “Tl;Dr” on Transformers Papers
With the explosion in research on all things transformers, it seemed there was a need to have a single table to distill the "tl;dr" of each paper's contributions relative to each other. Here is what I got so far: https://github.com/will-thompson-k/tldr-transformers . Would love feedback - and feel free to contribute too :)
- [P] NLP "tl;dr" Notes on Transformers
In any case, I'm liking the first glance so far. I'd just transpose the summary tables so they wouldn't get so tightly squeezed: https://github.com/will-thompson-k/tldr-transformers/blob/main/notes/bart.md
With the explosion in work on all things transformers, I felt the need to keep a single table of the "tl;dr" of various papers to distill their main takeaways: https://github.com/will-thompson-k/tldr-transformers . Would love feedback!
lemmatization-lists
- Ambiguous spellings
It's a bit of a massive undertaking to maintain such a data set, so it's mostly taken from https://github.com/michmech/lemmatization-lists . At the top of the file you'll see some additional entries I've added to deal with personal pronouns and numbers.
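The setup described above, a base lemma list plus a few hand-added entries at the top, can be sketched as a dictionary lookup where the extra entries take precedence. This is a minimal Python sketch, not the poster's code. It assumes each line of a lemmatization-lists file is `lemma<TAB>form` (check the actual files before relying on this), and `parse_pairs`, `build_lemmatizer` and `lemmatize` are hypothetical names. Ambiguous forms keep the first lemma seen, which is only one possible policy.

```python
from typing import Dict, Iterable

# Assumption: each input line is "lemma<TAB>form"; verify against the real files.


def parse_pairs(lines: Iterable[str]) -> Dict[str, str]:
    """Map each inflected form to its lemma (both lowercased)."""
    table: Dict[str, str] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        lemma, form = line.split("\t", 1)
        # Ambiguous form (e.g. "axes"): keep the first lemma seen.
        table.setdefault(form.lower(), lemma.lower())
    return table


def build_lemmatizer(base: Iterable[str], extra: Iterable[str]) -> Dict[str, str]:
    """Load the base list, then let hand-added entries override it."""
    table = parse_pairs(base)
    table.update(parse_pairs(extra))  # extra entries (pronouns, numbers) win
    return table


def lemmatize(table: Dict[str, str], token: str) -> str:
    """Look up a token; fall back to the lowercased token if it is unknown."""
    return table.get(token.lower(), token.lower())


# Hypothetical sample data standing in for the real list and the added entries.
base = ["be\twas", "be\tis", "cat\tcats", "ax\taxes", "axis\taxes"]
extra = ["I\tme", "two\t2"]
table = build_lemmatizer(base, extra)
print(lemmatize(table, "Was"), lemmatize(table, "axes"), lemmatize(table, "me"))  # be ax i
```

Putting the extra entries in a separate list keeps them easy to maintain when the upstream file is refreshed.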
- Is there a text list of words and their variations?
Another one to add to your list: https://github.com/michmech/lemmatization-lists
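A list of lemma-form pairs can also be read in the other direction, giving each word together with its variations. This is a small sketch under the same assumption (one `lemma<TAB>form` pair per line); `variations` is a hypothetical helper and the sample pairs are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumption: one "lemma<TAB>form" pair per line.


def variations(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group inflected forms under their lemma, preserving first-seen order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        lemma, form = line.split("\t", 1)
        if form not in groups[lemma]:  # drop duplicate pairs
            groups[lemma].append(form)
    return dict(groups)


pairs = ["run\tran", "run\trunning", "run\truns", "run\truns", "cat\tcats"]
print(variations(pairs)["run"])  # ['ran', 'running', 'runs']
```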
- Trying to build a lemmatizer from scratch
One approach might be to take a lemmatization list, like the lemma-token lists at https://github.com/michmech/lemmatization-lists/, and compile it into a Finite State Transducer. The Helsinki FST package, for instance, has an hfst-strings2fst command to compile pairs of strings into a transducer. You might need to do some reformatting of the input first.
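To show what compiling token-lemma pairs into a transducer buys you, here is a minimal pure-Python stand-in. It does not use HFST or `hfst-strings2fst`. It builds a prefix-sharing trie that behaves as a simple deterministic transducer: reading a token follows arcs from the start state, and if reading ends on a final state, that state's lemma is the output. The `lemma<TAB>form` input format is an assumption, and `TrieTransducer` and `from_pairs` are hypothetical names.

```python
from typing import Dict, Iterable, List, Optional


class TrieTransducer:
    """Prefix-sharing trie used as a simple deterministic transducer.

    Reading a token character by character follows arcs from state 0;
    if the last state reached is final, its stored lemma is the output.
    """

    def __init__(self) -> None:
        self.arcs: List[Dict[str, int]] = [{}]  # arcs[state][char] -> next state
        self.finals: Dict[int, str] = {}  # final state -> lemma to emit

    def add(self, token: str, lemma: str) -> None:
        state = 0
        for ch in token:
            nxt = self.arcs[state].get(ch)
            if nxt is None:  # create a new state for an unseen continuation
                nxt = len(self.arcs)
                self.arcs.append({})
                self.arcs[state][ch] = nxt
            state = nxt
        # Ambiguous token: keep the first lemma (a real FST could keep all).
        self.finals.setdefault(state, lemma)

    def lookup(self, token: str) -> Optional[str]:
        state = 0
        for ch in token:
            nxt = self.arcs[state].get(ch)
            if nxt is None:
                return None  # no path: token not in the list
            state = nxt
        return self.finals.get(state)  # None if reading stopped on a non-final state

    @classmethod
    def from_pairs(cls, lines: Iterable[str]) -> "TrieTransducer":
        # Assumption: each line is "lemma<TAB>form".
        fst = cls()
        for line in lines:
            line = line.strip()
            if line:
                lemma, form = line.split("\t", 1)
                fst.add(form, lemma)
        return fst


fst = TrieTransducer.from_pairs(["cat\tcats", "catch\tcaught", "catch\tcatches"])
print(fst.lookup("catches"), fst.lookup("ca"), len(fst.arcs))  # catch None 13
```

Sharing prefixes ("ca", "cat") already keeps the structure smaller than a flat list of strings. A dedicated FST toolkit can go further by minimizing the automaton, which also merges shared suffixes, and it can keep several outputs for an ambiguous form.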
What are some alternatives?
NLP-progress - Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.
trankit - Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing
FARM - 🏡 Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.
awesome-sentiment-analysis - Repository with everything necessary for sentiment analysis and related areas
azure-sql-db-openai - Samples on how to use Azure SQL database with Azure OpenAI
thesaurus - Offline database of synonyms/thesaurus
long-range-arena - Long Range Arena for Benchmarking Efficient Transformers
Awesome-pytorch-list - A comprehensive list of PyTorch-related content on GitHub, such as different models, implementations, helper libraries, tutorials, etc.
transformers-convert
language-planner - Official Code for "Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents"
transformers - 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
awesome-instruction-dataset - A collection of open-source datasets to train instruction-following LLMs (ChatGPT, LLaMA, Alpaca)