How to do Llama 30B 4bit finetuning?

This page summarizes the projects mentioned and recommended in the original post on /r/LocalLLaMA

  • alpaca-lora

    Instruct-tune LLaMA on consumer hardware

  • alpaca-lora applied LoRA successfully to fine-tune LLaMA; the trained adapter was then exported and merged with the original model weights, and the merged model was quantized back to 4-bit so that it could be loaded by alpaca.cpp (a sketch of the merge step follows below).
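
    For context, a minimal sketch of that merge step using the Hugging Face peft API. This is an illustration, not the exact commands from the post; the checkpoint and adapter paths are hypothetical placeholders.

      # Sketch: fold a trained LoRA adapter back into the base weights so the
      # merged model can be exported and quantized (e.g., for alpaca.cpp).
      # Checkpoint and adapter paths below are illustrative assumptions.
      import torch
      from transformers import LlamaForCausalLM
      from peft import PeftModel

      base = LlamaForCausalLM.from_pretrained(
          "decapoda-research/llama-30b-hf",  # hypothetical base checkpoint
          torch_dtype=torch.float16,
      )
      model = PeftModel.from_pretrained(base, "./my-alpaca-lora-30b")  # hypothetical adapter dir

      merged = model.merge_and_unload()  # add the LoRA deltas into the base weights
      merged.save_pretrained("./llama-30b-merged")
      # The merged fp16 checkpoint can then be converted and quantized to 4-bit
      # with the llama.cpp / alpaca.cpp conversion scripts.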

  • peft

    🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

  • Hugging Face supports training models in 8-bit through LLM.int8() together with their PEFT library, which reduces memory requirements by training only an adapter or prefix rather than the full model. Memory use will still be higher than with the 4-bit models, though (a rough setup sketch follows below).
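
    As a rough illustration of that 8-bit + adapter setup, here is a minimal sketch using transformers and peft; the model name and LoRA hyperparameters are assumptions for illustration, not values from the post.

      # Sketch: load the base model in 8-bit (LLM.int8) and attach a small
      # LoRA adapter so only the adapter weights are trained.
      from transformers import LlamaForCausalLM
      from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

      model = LlamaForCausalLM.from_pretrained(
          "decapoda-research/llama-30b-hf",  # hypothetical checkpoint
          load_in_8bit=True,                 # LLM.int8 quantized weights
          device_map="auto",
      )
      model = prepare_model_for_int8_training(model)  # cast norms, enable input grads

      config = LoraConfig(
          r=8,
          lora_alpha=16,
          target_modules=["q_proj", "v_proj"],  # attention projections, as in alpaca-lora
          lora_dropout=0.05,
          bias="none",
          task_type="CAUSAL_LM",
      )
      model = get_peft_model(model, config)
      model.print_trainable_parameters()  # only the adapter parameters are trainable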

  • alpaca_lora_4bit

  • Haven't tried it yet (https://github.com/johnsmith0031/alpaca_lora_4bit), but reports say it's working. I guess I should have tried the 7B first, but I like to do things the hard way.
