alpaca-lora applied this successfully to fine-tune LLaMA, then exported the adapter and merged it with the original model, later quantizing back to 4-bit so that it could be loaded by alpaca.cpp.
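The merge-then-quantize step is conceptually simple: fold the low-rank update back into the base weights, then snap the result onto a 4-bit grid. A toy numpy sketch (the shapes, scale, and the symmetric per-tensor 4-bit scheme are illustrative assumptions, not alpaca.cpp's actual format):

```python
import numpy as np

rng = np.random.default_rng(0)

# Base weight W and a trained LoRA update of rank r (toy sizes)
d, r = 8, 2
W = rng.normal(size=(d, d)).astype(np.float32)
A = rng.normal(size=(r, d)).astype(np.float32)
B = rng.normal(size=(d, r)).astype(np.float32)
scale = 1.0  # stands in for alpha / r in real LoRA

# Merge: fold the adapter into the base weights
W_merged = W + scale * (B @ A)

# Naive symmetric 4-bit quantization (15 usable levels), per-tensor
qmax = 7  # int4 symmetric range used here: -7..7
s = np.abs(W_merged).max() / qmax
q = np.clip(np.round(W_merged / s), -qmax, qmax).astype(np.int8)
W_deq = q * s  # what the 4-bit loader would reconstruct

# Round-to-nearest keeps the error within half a quantization step
assert np.abs(W_merged - W_deq).max() <= s / 2 + 1e-6
```

Real schemes (e.g. the blockwise Q4 formats in llama.cpp) quantize in small blocks with per-block scales, which keeps the error far lower than a single per-tensor scale.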
Hugging Face supports training models in 8-bit through LLM.int8() plus their PEFT library, which helps cut the memory footprint, since you're only training an adapter or prefix, not the full model. That will still be larger than the 4-bit models, though.
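The reason adapter training is so much lighter is just arithmetic: a rank-r LoRA adapter on a d×d linear layer trains 2·d·r parameters instead of d², so optimizer state and gradients shrink by the same factor. A quick sketch (hidden size and rank are illustrative, roughly LLaMA-7B-ish):

```python
# Trainable parameters for one d x d linear layer: full fine-tune vs LoRA
d, r = 4096, 8           # hidden size and LoRA rank (illustrative values)
full = d * d             # full weight matrix: 16,777,216 params
lora = 2 * d * r         # A (r x d) plus B (d x r): 65,536 params
ratio = full // lora     # 256x fewer trainable parameters per layer
print(full, lora, ratio)
```

The frozen base weights still have to sit in memory, which is where the 8-bit (or 4-bit) loading comes in.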
I haven't tried it yet, but https://github.com/johnsmith0031/alpaca_lora_4bit reportedly works. I guess I should have tried the 7B first, but I like to do things the hard way.