| | promptsource | datasets |
|---|---|---|
| Mentions | 11 | 15 |
| Stars | 2,505 | 18,443 |
| Growth | 2.2% | 1.0% |
| Activity | 4.6 | 9.5 |
| Last commit | 6 months ago | about 23 hours ago |
| Language | Python | Python |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
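The page does not give the exact formula behind the growth figure. As a rough sketch, assuming growth is the relative increase over the previous month's star count, the percentages in the table can be turned back into an approximate number of stars gained. The function name `monthly_star_gain` is made up for this illustration.

```python
def monthly_star_gain(stars_now: int, growth_pct: float) -> float:
    """Estimate stars gained in the last month.

    Assumption (not stated on the page): growth_pct is measured
    relative to the star count one month ago.
    """
    stars_prev = stars_now / (1 + growth_pct / 100)
    return stars_now - stars_prev


# Star counts and growth percentages taken from the comparison table.
for name, stars, growth in [("promptsource", 2505, 2.2), ("datasets", 18443, 1.0)]:
    print(f"{name}: roughly {monthly_star_gain(stars, growth):.0f} stars gained last month")
```

Under that assumption, the smaller project grows faster in relative terms while the larger one still adds more stars in absolute terms.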
promptsource
- How to Prompt Design? Share resources
- Any tips for hiring prompt engineers?
  Bigscience Promptsource
- PromptSource: Toolkit for creating, sharing and using natural language prompts
- Hugging Face Introduces “T0”, An Encoder-Decoder Model That Consumes Textual Inputs And Produces Target Responses
  Quick 5 Min Read | Paper | Github
- 16x smaller than GPT3 but better [video]
- [R] BigScience's first paper, T0: Multitask Prompted Training Enables Zero-Shot Task Generalization
  Code for https://arxiv.org/abs/2110.08207 found: https://github.com/bigscience-workshop/promptsource/
- "P3: Public Pool of Prompts" (BigScience's collaborative collection of >2k prompts for >170 datasets)
- BigScience's guide to using templating languages to develop prompts
- word2vec chatbot
  I'd use a prompted dataset then, as well as explore the T0 model framework.
- First model released by BigScience outperforms GPT-3 while being 16x smaller
  We fine-tuned the model on dozens of different NLP datasets and tasks in a prompted style. You can read all the prompts in the appendix or get them all here: https://github.com/bigscience-workshop/promptsource . Most NLP tasks are not particularly freeform, or they are naturally length-limited, like summarization (XSum is very short). As a consequence, the model mostly defaults to short responses. Your "trick" is not that unreasonable though! Many of the training prompts that want long responses ask for them explicitly.
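To make "prompted style" concrete, here is a minimal sketch of turning one dataset record into an input/target pair. PromptSource writes its templates in Jinja and separates input from target with `|||`; this sketch uses plain `str.format` as a stand-in so it runs with the standard library only. The template text, the field names `document` and `summary`, and the helper `apply_template` are all hypothetical and not taken from the PromptSource collection.

```python
from typing import Dict, Tuple

# Hypothetical summarization template. "|||" separates the model input from
# the target, mirroring the convention PromptSource templates use.
TEMPLATE = (
    "Summarize the following article in one sentence.\n\n"
    "{document}\n\nSummary: ||| {summary}"
)


def apply_template(template: str, example: Dict[str, str]) -> Tuple[str, str]:
    """Render one record into a (prompt, target) pair.

    The template is split before the fields are filled in, so a "|||"
    inside the data cannot be mistaken for the separator.
    """
    input_part, target_part = template.split("|||", 1)
    prompt = input_part.format(**example).strip()
    target = target_part.format(**example).strip()
    return prompt, target


# Made-up record for illustration only.
record = {
    "document": "The city council approved a new cycling lane on Main Street.",
    "summary": "Council approves Main Street cycling lane.",
}
prompt, target = apply_template(TEMPLATE, record)
print(prompt)
print("TARGET:", target)
```

A template that wants a longer answer would ask for it in the instruction text, which matches the comment above that long responses are requested explicitly.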
datasets
- 🐍🐍 23 issues to grow yourself as an exceptional open-source Python expert 🧑💻 🥇
- Mastering ROUGE Matrix: Your Guide to Large Language Model Evaluation for Summarization with Examples
- How to Train Large Models on Many GPUs?
  https://github.com/huggingface/datasets
  https://github.com/huggingface/transformers
- [D] Can we use Ray for distributed training on Vertex AI? Can someone provide me examples for the same? Also, which dataframe libraries have you used for training machine learning models on huge datasets (100 GB+), since pandas can't handle huge data?
  https://huggingface.co/docs/datasets backed with an Arrow file or buffer
- Need help with a data science project
- Is there a text evaluation metric that does not need reference text?
  I'm looking for an automatic evaluation metric that can score the first text higher (since it's more grammatically correct/better for other reasons). All the metrics for NLG I found require some reference text to match the generated text with, which I don't have.
- FauxPilot – an open-source GitHub Copilot server
  And then pass that my_code.json as the dataset name.
  [1] https://github.com/huggingface/datasets
- Hugging Face Introduces ‘Datasets’: A Lightweight Community Library For Natural Language Processing (NLP)
  Code for https://arxiv.org/abs/2109.02846 found: https://github.com/huggingface/datasets
  Quick Read | Paper | Github
- Datasets: A Community Library for Natural Language Processing
What are some alternatives?
transformers - 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
sentence-transformers - Multilingual Sentence & Image Embeddings with BERT
eai-prompt-gallery - Library of interesting prompt generations
datumaro - Dataset Management Framework, a Python library and a CLI tool to build, analyze and manage Computer Vision datasets.
natural-instructions - Expanding natural instructions
cypress-realworld-app - A payment application to demonstrate real-world usage of Cypress testing methods, patterns, and workflows.
spaCy - 💫 Industrial-strength Natural Language Processing (NLP) in Python
edex-ui - A cross-platform, customizable science fiction terminal emulator with advanced monitoring & touchscreen support.
rasa - 💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
first-contributions - 🚀✨ Help beginners to contribute to open source projects
frankmocap - A Strong and Easy-to-use Single View 3D Hand+Body Pose Estimator
evaluate - 🤗 Evaluate: A library for easily evaluating machine learning models and datasets.