alpaca-lora applied this successfully to fine-tune LLaMA, then exported the adapter and merged it with the original model, later quantizing back to 4-bit so that it could be loaded by alpaca.cpp.
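The merge-then-quantize step is conceptually simple: fold the low-rank update back into the base weights, then snap the result onto a 4-bit grid. A toy numpy sketch (the shapes, scale, and the symmetric per-tensor 4-bit scheme are illustrative assumptions, not alpaca.cpp's actual format):

```python
import numpy as np

rng = np.random.default_rng(0)

# Base weight W and a trained LoRA update of rank r (toy sizes)
d, r = 8, 2
W = rng.normal(size=(d, d)).astype(np.float32)
A = rng.normal(size=(r, d)).astype(np.float32)
B = rng.normal(size=(d, r)).astype(np.float32)
scale = 1.0  # stands in for alpha / r in real LoRA

# Merge: fold the adapter into the base weights
W_merged = W + scale * (B @ A)

# Naive symmetric 4-bit quantization (15 usable levels), per-tensor
qmax = 7  # int4 symmetric range used here: -7..7
s = np.abs(W_merged).max() / qmax
q = np.clip(np.round(W_merged / s), -qmax, qmax).astype(np.int8)
W_deq = q * s  # what the 4-bit loader would reconstruct

# Round-to-nearest keeps the error within half a quantization step
assert np.abs(W_merged - W_deq).max() <= s / 2 + 1e-6
```

Real schemes (e.g. the blockwise Q4 formats in llama.cpp) quantize in small blocks with per-block scales, which keeps the error far lower than a single per-tensor scale.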
Hugging Face supports training models in 8-bit through LLM.int8() plus their PEFT library, which helps cut the memory footprint, since you're only training an adapter or prefix, not the full model. That will still be larger than the 4-bit models, though.
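The reason adapter training is so much lighter is just arithmetic: a rank-r LoRA adapter on a d×d linear layer trains 2·d·r parameters instead of d², so optimizer state and gradients shrink by the same factor. A quick sketch (hidden size and rank are illustrative, roughly LLaMA-7B-ish):

```python
# Trainable parameters for one d x d linear layer: full fine-tune vs LoRA
d, r = 4096, 8           # hidden size and LoRA rank (illustrative values)
full = d * d             # full weight matrix: 16,777,216 params
lora = 2 * d * r         # A (r x d) plus B (d x r): 65,536 params
ratio = full // lora     # 256x fewer trainable parameters per layer
print(full, lora, ratio)
```

The frozen base weights still have to sit in memory, which is where the 8-bit (or 4-bit) loading comes in.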
I haven't tried it yet, but https://github.com/johnsmith0031/alpaca_lora_4bit reportedly works. I guess I should have tried the 7B first, but I like to do things the hard way.