| | dsir | FARM |
|---|---|---|
| Mentions | 1 | 3 |
| Stars | 199 | 1,730 |
| Growth | 7.0% | 0.4% |
| Activity | 7.7 | 0.0 |
| Latest commit | 2 months ago | 6 months ago |
| Language | Python | Python |
| License | MIT License | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
dsir
-
🧵 Researchers at Stanford Propose a Cheap and Scalable Data Selection Framework Based on Importance Resampling for Improving the Downstream Performance of Language Models
Quick Read: https://www.marktechpost.com/2023/02/16/researchers-at-stanford-propose-a-cheap-and-scalable-data-selection-framework-based-on-importance-resampling-for-improving-the-downstream-performance-of-language-models/
Paper: https://arxiv.org/pdf/2302.03169.pdf
GitHub: https://github.com/p-lambda/dsir
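The core idea behind DSIR, data selection with importance resampling, can be illustrated with a short sketch: represent each document by hashed n-gram counts, weight it by how much more likely those counts are under a target-domain distribution than under the raw-pool distribution, and resample without replacement using the Gumbel top-k trick. This is a minimal toy illustration of the technique, not the `dsir` package's actual API; all function names here are made up for the example.

```python
import hashlib
import math
import random

def ngram_features(text, n=2, num_buckets=64):
    """Hash word n-grams into a fixed number of buckets (bag of hashed n-grams)."""
    counts = [0] * num_buckets
    words = text.lower().split()
    for i in range(len(words) - n + 1):
        gram = " ".join(words[i:i + n])
        h = int(hashlib.md5(gram.encode()).hexdigest(), 16)
        counts[h % num_buckets] += 1
    return counts

def estimate_logp(corpus, n=2, num_buckets=64, smoothing=1.0):
    """Smoothed log-probabilities of each bucket, estimated from a corpus."""
    totals = [smoothing] * num_buckets
    for text in corpus:
        for b, c in enumerate(ngram_features(text, n, num_buckets)):
            totals[b] += c
    z = sum(totals)
    return [math.log(t / z) for t in totals]

def log_importance_weight(features, target_logp, raw_logp):
    """log w(x) = sum over buckets of count_b * (log p_target(b) - log p_raw(b))."""
    return sum(c * (t - r) for c, t, r in zip(features, target_logp, raw_logp))

def gumbel_topk_resample(log_weights, k, seed=0):
    """Draw k indices without replacement, proportional to exp(log_weights)."""
    rng = random.Random(seed)
    keys = [lw - math.log(-math.log(rng.random())) for lw in log_weights]
    return sorted(range(len(keys)), key=lambda i: keys[i], reverse=True)[:k]
```

In use, one would estimate `target_logp` from a small target-domain sample (e.g. Wikipedia-like text), `raw_logp` from the large raw pool, score every pool document with `log_importance_weight`, and keep the `gumbel_topk_resample` winners; the hashing keeps the feature space fixed-size regardless of vocabulary.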
FARM
-
Can someone please explain to me the differences between train, dev and test datasets?
I'm also trying to solve this task in a Python notebook (.ipynb) using the FARM framework https://farm.deepset.ai/ and the BERT model from Hugging Face https://huggingface.co/bert-base-uncased
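The split the question asks about is conventional: the train set is what the model's weights are fit on, the dev (validation) set guides hyperparameter choices and early stopping, and the test set is held out until the very end for a single unbiased evaluation. A minimal stdlib-only sketch of carving one dataset into the three parts (the function name and fractions are illustrative, not from FARM):

```python
import random

def train_dev_test_split(examples, dev_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle once with a fixed seed, carve off test and dev, keep the rest as train."""
    items = list(examples)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_test = int(n * test_frac)
    n_dev = int(n * dev_frac)
    test = items[:n_test]
    dev = items[n_test:n_test + n_dev]
    train = items[n_test + n_dev:]
    return train, dev, test
```

Fixing the seed matters: it makes the split reproducible across notebook runs, so dev-set comparisons between hyperparameter settings stay fair.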
-
Fine-Tuning Transformers for NLP
For anyone looking to fine-tune transformers with less work, there is the FARM project (https://github.com/deepset-ai/FARM), which has some more or less ready-to-go configurations (classification, question answering, NER, and a couple of others). It's really almost "plug in a csv and run".
By the way, a pet peeve is sentiment detection. It's a useful method, but please be aware that it does not measure "sentiment" in the way one would normally think, and that what it measures varies strongly across methods (https://www.tandfonline.com/doi/abs/10.1080/19312458.2020.18...).
-
Has anyone deployed a BERT like model across multiple tasks (Multi-class, NER, outlier detection)? Seeking advice.
You can use https://github.com/deepset-ai/FARM or https://github.com/nyu-mll/jiant for multitask learning. The second is more general.
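The architecture both libraries implement for multitask setups is a shared encoder feeding several task-specific heads, so each input is encoded once and every task reads from the same representation. The toy sketch below shows only that structural pattern in plain Python; the encoder, head, and task names are invented for illustration and are not the FARM or jiant API.

```python
# Shared encoder with one output head per task (structural sketch only).
def encode(text):
    """Stand-in for a shared BERT-style encoder: a toy 26-dim bag-of-letters vector."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

class Head:
    """A linear task head mapping the shared representation to task outputs."""
    def __init__(self, num_outputs, dim=26):
        self.weights = [[0.0] * dim for _ in range(num_outputs)]

    def __call__(self, features):
        return [sum(w * f for w, f in zip(row, features)) for row in self.weights]

# Hypothetical tasks: 4-way topic classification, 9 NER tags, 1 outlier score.
heads = {"topic": Head(4), "ner": Head(9), "outlier": Head(1)}

def predict(text):
    features = encode(text)  # computed once, shared by all task heads
    return {task: head(features) for task, head in heads.items()}
```

The practical payoff of this layout is that the expensive encoder forward pass is paid once per input at serving time, and adding a new task means adding a head rather than deploying another full model.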
What are some alternatives?
Giveme5W1H - Extraction of the journalistic five W and one H questions (5W1H) from news articles: who did what, when, where, why, and how?
bertviz - BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.)
Questgen.ai - Question generation using state-of-the-art Natural Language Processing algorithms
haystack - LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
happy-transformer - Happy Transformer makes it easy to fine-tune and perform inference with NLP Transformer models.
BERT-NER - Pytorch-Named-Entity-Recognition-with-BERT
tldr-transformers - The "tl;dr" on a few notable transformer papers (pre-2022).
BERTweet - BERTweet: A pre-trained language model for English Tweets (EMNLP-2020)
lora - Using Low-rank adaptation to quickly fine-tune diffusion models.
transformers - 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Chinese-CLIP - Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
TEAM - Our EMNLP 2022 paper on MCQA