| | PLOD-AbbreviationDetection | adaptnlp |
|---|---|---|
| Mentions | 1 | 2 |
| Stars | 9 | 414 |
| Growth | - | 0.0% |
| Activity | 0.0 | 0.0 |
| Last commit | over 1 year ago | over 2 years ago |
| Language | Jupyter Notebook | Jupyter Notebook |
| License | Creative Commons Attribution Share Alike 4.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
PLOD-AbbreviationDetection
Clustering to find abbreviations
Finally, the main problem with unsupervised learning is that you won't be able to reliably measure system performance or improvement. In my view, any time you can spend annotating and collecting data for a (semi-)supervised solution will be well-spent. Existing datasets can also get you started with model development, such as https://github.com/surrey-nlp/PLOD-AbbreviationDetection. Once you have a good model on a conventional dataset, you should be able to start generalizing it to your specific task/dataset.
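Datasets like PLOD frame abbreviation detection as token classification: each token is tagged as part of a short form (abbreviation) or its long form, typically in BIO style. A minimal sketch of building such tags from token-index spans — the `AC`/`LF` label names follow PLOD's convention, but verify them against the dataset's actual label set:

```python
def bio_tags(tokens, spans):
    """Convert (start, end, label) token spans to BIO tags.

    tokens: list of token strings
    spans:  list of (start, end, label) with end exclusive,
            e.g. label "AC" (abbreviation) or "LF" (long form)
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first token of the span
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # continuation tokens
    return tags

tokens = "natural language processing ( NLP ) is popular".split()
# "natural language processing" is the long form, "NLP" the abbreviation
tags = bio_tags(tokens, [(0, 3, "LF"), (4, 5, "AC")])
print(list(zip(tokens, tags)))
```

Tags in this format can be fed directly to a standard token-classification fine-tuning setup.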
adaptnlp
Tools to use for a semantic-search question answering system
Check out adaptnlp
Case Sensitivity using HuggingFace & Google's T5 model (base)
Yes, there are capitalized entries in the tokenizer vocabularies of t5-base and t5-small, so both models preserve capitalization. A few days ago I was using t5-small through adaptnlp for extractive summarization, and capitalization worked fine (https://github.com/Novetta/adaptnlp). AdaptNLP is essentially a wrapper around transformers, so if you get stuck, you can dig into its source code.
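A quick way to check whether any tokenizer preserves case is to round-trip a mixed-case string through encode/decode. A minimal sketch — the `tokenizer` argument is any object with `encode`/`decode` methods, such as a HuggingFace tokenizer from `AutoTokenizer.from_pretrained("t5-small")` (shown commented out, since loading it requires a model download); the `IdentityTokenizer` stand-in is a hypothetical toy used only to make the example self-contained:

```python
def preserves_case(tokenizer, text="The Quick BROWN Fox"):
    """Return True if encode -> decode round-trips mixed case intact.

    Note: real tokenizers may append special tokens on encode; strip
    them before comparing (e.g. decode(..., skip_special_tokens=True)
    with HuggingFace tokenizers).
    """
    ids = tokenizer.encode(text)
    return tokenizer.decode(ids).strip() == text

# With HuggingFace transformers (downloads the tokenizer on first use):
# from transformers import AutoTokenizer
# tok = AutoTokenizer.from_pretrained("t5-small")
# preserves_case(tok)

class IdentityTokenizer:
    """Toy stand-in: 'encodes' to Unicode code points and decodes back."""
    def encode(self, text):
        return [ord(c) for c in text]
    def decode(self, ids):
        return "".join(chr(i) for i in ids)

print(preserves_case(IdentityTokenizer()))  # True
```

A lowercasing tokenizer (e.g. an uncased BERT vocabulary) would fail this check, which is exactly the failure mode the question is about.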
What are some alternatives?
converse - Conversational text Analysis using various NLP techniques
Basic-UI-for-GPT-J-6B-with-low-vram - A repository to run gpt-j-6b on low-VRAM machines (4.2 GB minimum VRAM for a 2000-token context, 3.5 GB for a 1000-token context). Model loading takes 12 GB of free RAM.
hate-speech-and-offensive-language - Repository for the paper "Automated Hate Speech Detection and the Problem of Offensive Language", ICWSM 2017
keytotext - Keywords to Sentences
ThoughtSource - A central, open resource for data and tools related to chain-of-thought reasoning in large language models. Developed @ Samwald research group: https://samwald.info/
fastai - The fastai deep learning library
nlp - Repository for all things Natural Language Processing
gector - Official implementation of the papers "GECToR – Grammatical Error Correction: Tag, Not Rewrite" (BEA-20) and "Text Simplification by Tagging" (BEA-21)
transformers-interpret - Model explainability that works seamlessly with 🤗 transformers. Explain your transformers model in just 2 lines of code.
browser-ml-inference - Edge Inference in Browser with Transformer NLP model
Transformers-Tutorials - This repository contains demos I made with the Transformers library by HuggingFace.
ML-Workspace - 🛠 All-in-one web-based IDE specialized for machine learning and data science.