gptqlora vs datablations
| | gptqlora | datablations |
|---|---|---|
| Mentions | 2 | 6 |
| Stars | 94 | 289 |
| Growth | - | 8.7% |
| Activity | 7.6 | 6.9 |
| Latest commit | 11 months ago | about 1 month ago |
| Language | Python | Jupyter Notebook |
| License | MIT License | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
gptqlora
- (2/2) May 2023: GPTQLoRA: Efficient Finetuning of Quantized LLMs with GPTQ (https://github.com/qwopqwop200/gptqlora/tree/main)
  The difference from QLoRA is that GPTQ is used for model quantization instead of NF4 (Normal Float4) + DQ (Double Quantization). The advantage is that you can expect better performance, because GPTQ quantizes more accurately than conventional bitsandbytes. The downside is that GPTQ is a one-shot quantization method, so it is less convenient than bitsandbytes and, unlike bitsandbytes, not universal. I'm still experimenting, but it seems to work; at the least, I hope it gives people using LoRA another option. https://github.com/qwopqwop200/gptqlora/tree/main
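For readers who want to try the idea, here is a minimal sketch of attaching LoRA adapters to a GPTQ-quantized base model with Hugging Face transformers and peft. It assumes recent versions of transformers, peft, and a GPTQ loading backend; the checkpoint name is a placeholder, and this illustrates the approach rather than reproducing the repo's actual training script.

```python
# Minimal sketch of the GPTQLoRA idea: train LoRA adapters on top of a
# GPTQ-quantized base model instead of an NF4 (bitsandbytes) one.
# Assumptions: recent transformers/peft, a GPTQ backend installed, and a
# placeholder GPTQ checkpoint from the Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "TheBloke/Llama-2-7B-GPTQ"  # placeholder: any GPTQ-quantized checkpoint

model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Freeze the quantized base weights and prepare the model for k-bit training.
model = prepare_model_for_kbit_training(model)

# LoRA adapters are kept in full precision on top of the frozen GPTQ weights.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # assumption: Llama-style module names
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA parameters are trainable
```

From here the model can go into a standard Trainer loop: gradients flow only through the adapter weights while the quantized base stays frozen, which is the same division of labor QLoRA uses with NF4.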
datablations
- Gemini is only 1x Chinchilla, so it is undertrained for production
  1x Chinchilla means it's not really undertrained, but that more could be squeezed out without excessive difficulty (see the back-of-the-envelope sketch after this list). https://arxiv.org/abs/2305.16264
- Can LLMs learn from a single example?
- Chinchilla's Death
  You might want to give a read to "Scaling Data-Constrained Language Models" [1]. They basically generalized the Chinchilla scaling law by investigating behavior in multi-epoch runs.
  [1] https://arxiv.org/abs/2305.16264
- RWKV Pile+ seems to be training on far more tokens than any LLM ever has
  I would imagine that there is a lot of overlap, yeah. That said, training on repeated data does seem to be effective at this scale.
- (2/2) May 2023: Scaling Data-Constrained Language Models (https://arxiv.org/abs/2305.16264)
- How to Keep Scaling Large Language Models when Data Runs Out? A New AI Research Trains 400 Models with up to 9B Parameters and 900B Tokens to Create an Extension of Chinchilla Scaling Laws for Repeated Data
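For intuition on the numbers in these threads, here is a minimal back-of-the-envelope sketch in Python of the two rules of thumb involved: the Chinchilla guideline of roughly 20 training tokens per parameter, and the finding of "Scaling Data-Constrained Language Models" (https://arxiv.org/abs/2305.16264) that extra epochs over the same data keep helping, but with exponentially diminishing value. The exponential form mirrors the paper's qualitative result; the decay constant below is a placeholder for illustration, not the paper's fitted value.

```python
# Back-of-the-envelope helpers for the scaling discussion above.
# Assumptions: the ~20 tokens/parameter Chinchilla rule of thumb, and an
# exponential-decay model of the value of repeated epochs in the spirit of
# "Scaling Data-Constrained Language Models" (the decay constant here is a
# placeholder, not the paper's fitted value).
import math

def chinchilla_optimal_tokens(n_params: float) -> float:
    """Approximate compute-optimal token budget: ~20 tokens per parameter."""
    return 20.0 * n_params

def effective_unique_tokens(unique: float, epochs: float, decay: float = 15.0) -> float:
    """Effective data after `epochs` passes over `unique` tokens: repetitions
    beyond the first pass contribute with exponentially diminishing returns."""
    repetitions = epochs - 1.0
    return unique + unique * decay * (1.0 - math.exp(-repetitions / decay))

print(f"70B params -> ~{chinchilla_optimal_tokens(70e9) / 1e12:.1f}T tokens (1x Chinchilla)")
for epochs in (1, 4, 16, 64):
    eff = effective_unique_tokens(100e9, epochs)
    print(f"{epochs:>3} epochs over 100B unique tokens -> ~{eff / 1e9:,.0f}B effective tokens")
```

The shape is the point: a few epochs of repeated data are nearly as good as fresh data, while past roughly sixteen epochs the additional passes contribute almost nothing, which is why "1x Chinchilla" models still have headroom.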
What are some alternatives?
tree-of-thoughts - Plug-and-play implementation of Tree of Thoughts: Deliberate Problem Solving with Large Language Models that elevates model reasoning by at least 70%
TinyLlama - The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
GirlfriendGPT - Girlfriend GPT is a Python project to build your own AI girlfriend using ChatGPT 4.0
airoboros - Customizable implementation of the self-instruct paper.
chathub - All-in-one chatbot client
chain-of-thought-hub - Benchmarking large language models' complex reasoning ability with chain-of-thought prompting
prompt-engineering - Tips and tricks for working with Large Language Models like OpenAI's GPT-4.
guidance - A guidance language for controlling large language models. [Moved to: https://github.com/guidance-ai/guidance]
SuperAGI - <⚡️> SuperAGI - A dev-first open source autonomous AI agent framework. Enabling developers to build, manage & run useful autonomous agents quickly and reliably.
gorilla - Gorilla: An API store for LLMs