transformers
gpt-3-experiments
Metric | transformers | gpt-3-experiments
---|---|---
Mentions | 174 | 6
Stars | 124,557 | 709
Growth | 2.7% | -
Activity | 10.0 | 0.0
Last commit | 5 days ago | almost 4 years ago
Language | Python | Python
License | Apache License 2.0 | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
transformers
-
Lossless Acceleration of LLM via Adaptive N-Gram Parallel Decoding
The HuggingFace transformers library already has support for a similar method called prompt lookup decoding that uses the existing context to generate an ngram model: https://github.com/huggingface/transformers/issues/27722
I don't think it would be that hard to switch it out for a pretrained ngram model.
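To make the idea in the comment above concrete, here is a minimal pure-Python sketch of lookup-style drafting: it takes the trailing n-gram of the token sequence, searches for an earlier occurrence in the same sequence, and proposes the tokens that followed it. This is a simplified illustration, not the transformers implementation; the name `propose_from_context` and its parameters are hypothetical.

```python
from typing import List, Sequence


def propose_from_context(
    tokens: Sequence[int], max_ngram: int = 3, num_draft: int = 5
) -> List[int]:
    """Propose draft tokens by matching the trailing n-gram earlier in `tokens`.

    Hypothetical helper for illustration only: tries the longest n-gram first
    and returns up to `num_draft` tokens that followed its most recent
    earlier match. Returns an empty list when nothing matches.
    """
    for n in range(min(max_ngram, len(tokens) - 1), 0, -1):
        suffix = list(tokens[-n:])
        # Scan right to left so the most recent earlier occurrence wins.
        # The upper bound excludes the trailing n-gram itself.
        for start in range(len(tokens) - n - 1, -1, -1):
            if list(tokens[start:start + n]) == suffix:
                return list(tokens[start + n:start + n + num_draft])
    return []


# The sequence ends in (1, 2, 3), which also appears at the start,
# so the tokens that followed it there are proposed as the draft.
print(propose_from_context([1, 2, 3, 4, 5, 1, 2, 3], num_draft=2))
```

In a speculative decoding loop, the model would then verify these draft tokens in a single forward pass and keep only the accepted prefix. Swapping in a pretrained n-gram model, as the comment suggests, would mean replacing the lookup step with a query to that model. In recent transformers versions the built-in variant is, as far as I know, enabled with the `prompt_lookup_num_tokens` argument to `generate`; check the documentation for your installed version.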
-
AI enthusiasm #6 - Finetune any LLM you want💡
Most of this tutorial is based on the Hugging Face course on Transformers and on Niels Rogge's Transformers tutorials: make sure to check out their work and give them a star on GitHub, if you please ❤️
-
Schedule-Free Learning – A New Way to Train
* Superconvergence + LR range finder + Fast AI's Ranger21 optimizer was the go-to optimizer for CNNs, and worked fabulously well, but on transformers, the learning rate range finder said 1e-3 was the best, whilst 1e-5 was actually better. However, the 1 cycle learning rate stuck. https://github.com/huggingface/transformers/issues/16013
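For readers unfamiliar with the 1 cycle learning rate mentioned above, the following is a rough sketch of one common form of it: the rate warms up to a peak and is then annealed far below its starting value. The cosine shape, the function name `one_cycle_lr`, and the default fractions are assumptions for illustration (the defaults loosely resemble those of PyTorch's `OneCycleLR`), and the 1e-5 peak is taken from the comment, not from any recommended setting.

```python
import math


def one_cycle_lr(
    step: int,
    total_steps: int,
    max_lr: float = 1e-5,      # peak rate; value borrowed from the comment
    warmup_frac: float = 0.3,  # assumed share of steps spent warming up
    start_div: float = 25.0,   # assumed: start_lr = max_lr / start_div
    final_div: float = 1e4,    # assumed: final_lr = start_lr / final_div
) -> float:
    """Cosine one-cycle schedule: warm up to max_lr, then anneal below start_lr."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    start_lr = max_lr / start_div
    final_lr = start_lr / final_div

    def cosine(a: float, b: float, pct: float) -> float:
        # Interpolate from a (pct=0) to b (pct=1) along half a cosine wave.
        return b + (a - b) * (1 + math.cos(math.pi * pct)) / 2

    if step < warmup_steps:
        return cosine(start_lr, max_lr, step / warmup_steps)
    pct = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return cosine(max_lr, final_lr, min(1.0, pct))


# Print the rate at a few points of a hypothetical 100-step run.
for s in (0, 15, 30, 65, 100):
    print(s, one_cycle_lr(s, total_steps=100))
```

The point of the comment is that the peak value matters: a range finder may suggest a peak such as 1e-3, while a much smaller peak can work better for transformers, even though the overall one-cycle shape is kept.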
-
Gemma doesn't suck anymore – 8 bug fixes
Thanks! :) I'm pushing them into transformers, pytorch-gemma and collabing with the Gemma team to resolve all the issues :)
The RoPE fix should already be in transformers 4.38.2: https://github.com/huggingface/transformers/pull/29285
My main PR for transformers which fixes most of the issues (some still left): https://github.com/huggingface/transformers/pull/29402
- HuggingFace Transformers: Qwen2
- HuggingFace Transformers Release v4.36: Mixtral, Llava/BakLlava, SeamlessM4T v2
- HuggingFace: Support for the Mixtral Moe
-
Paris-Based Startup and OpenAI Competitor Mistral AI Valued at $2B
If you want to tinker with the architecture Hugging Face has a FOSS implementation in transformers: https://github.com/huggingface/transformers/blob/main/src/tr...
If you want to reproduce the training pipeline, you couldn't even if you tried, because you don't have access to thousands of A100s.
-
Fail to reproduce the same evaluation metrics score during inference.
I am aware that using mixed precision reduces the stability of the weights and that some inconsistency is to be expected, but I didn't expect it to be this much. I have attached the graph of evaluation metrics. If someone can give me some insight into this issue, that would be great.
-
[D] What is a good way to maintain code readability and code quality while scaling up complexity in libraries like Hugging Face?
In transformers, they tried really hard to have a single function or method to deal with both self and cross attention mechanisms, masking, positional and relative encodings, interpolation etc. While it allows a user to use the same function/method for any model, it has led to severe parameter bloat. Just compare the original implementation of llama by FAIR with the implementation by HF to get an idea.
gpt-3-experiments
-
AI chatbots are not a replacement for search engines
The problem with ChatGPT as a replacement for Google is that it was not designed to produce accurate facts, and it shows. This model cut its teeth writing articles about the discovery of unicorns in the Andes[0] for goodness sake! It's a language model, and a very impressive one at that, but language is used to express falsehoods and fiction just as regularly as it is used to express truth.
This doesn't mean that it can't produce accurate facts, most of the time it does! But when it does produce nonsense, it does it in exactly the same tone of authority, so if you don't already know the answer you may well walk away believing an AI hallucination.
And the trouble is it doesn't really matter if everyone here thinks "well, I would follow up each request with research to verify the answer", because most people won't! This is like the Google answer extracts, which fairly frequently mislead by extracting out-of-context quotes, except that there's no way to get the original context and there may in fact be no original context! This makes follow-up research much more complicated than with Google and therefore unlikely to happen. If ChatGPT replaces Google, the amount of nonsense on the internet will get even worse, which is something that until 2022 I never thought was possible.
[0] https://github.com/minimaxir/gpt-3-experiments/blob/master/e...
- Artificial Intelligence writes
-
The Computers Are Getting Better at Writing
See also my experiments with GPT-3 on sane prompts, which have wildly varying quality even after generating them in bulk: https://github.com/minimaxir/gpt-3-experiments
Creative writing hasn't been one of the super-hyped use cases by OpenAI for the OpenAI API outside of AI Dungeon, surprisingly. For just random generation, the necessary curation can detract from the time-saving advantages. (As an aside, the API is also extremely expensive for long-form content, to the point that I'm not sure how the economics work for these startups even with charging monthly fees.)
I'm more bullish on small bespoke models for a given use case, which is what I spend my time researching.
-
Does GPT-2 Know Your Phone Number?
Thanks, didn't twig onto the fact that you linked a subtree of the whole repo. Weird that even with the nonzero temp the AskReddit prompt went a bit loopy.
> https://github.com/minimaxir/gpt-3-experiments/blob/master/e...
Oh my goodness that is absurd in the most delightful way. Thanks for sharing that.
What are some alternatives?
fairseq - Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
languagetool - Style and Grammar Checker for 25+ Languages
sentence-transformers - Multilingual Sentence & Image Embeddings with BERT
chatgpt-google-extension - A browser extension that enhances search engines with ChatGPT
llama - Inference code for Llama models
vim-LanguageTool - A vim plugin for the LanguageTool grammar checker
transformer-pytorch - Transformer: PyTorch Implementation of "Attention Is All You Need"
Gleemin - A Magic: the Gathering™ expert system
text-generation-webui - A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.
chatgpt-raycast - ChatGPT raycast extension
huggingface_hub - The official Python client for the Huggingface Hub.
THELEMA - My MSc thesis: a grammar induction system