Is it possible to train a Lora on a 6GB vram GPU?

This page summarizes the projects mentioned and recommended in the original post on /r/LocalLLaMA

  • doki-rnn

    A DDLC mod using a neural net that was trained to code Ren'Py script.

  • I want to fine-tune OpenLLaMA 3B and build something similar to this project, but on top of a Llama model (https://github.com/stephwag/doki-rnn). However, I don't have a very powerful GPU: it is a GTX 1660 with 6GB of VRAM. I can easily run 13B models in GGML format, but I can't make a LoRA for a 3B model. As a first test I tried to create a small LoRA trained on 10 letters in the Oobabooga WebUI. I tried to load the model in GPTQ and GGML formats, but only got errors. With the GGML format I get the error "'LlamaCppModel' object has no attribute 'decode'". With the GPTQ-for-LLaMa loader using monkey_patch I get "NotImplementedError". With the AutoGPTQ loader using monkey_patch I get "Target module QuantLinear() is not supported". As I understand it, to create a LoRA in Oobabooga you need to load the model in Transformers format, but I can't load it that way because of an Out Of Memory error, and if I load it in 4-bit or 8-bit I get "size mismatch for base_model". (See the QLoRA sketch after the project list below.)

  • qlora

    QLoRA: Efficient Finetuning of Quantized LLMs

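QLoRA is what makes this feasible on a 6GB card: the base model is loaded quantized to 4-bit NF4 and kept frozen, and only small LoRA adapter weights are trained, instead of trying to back-propagate through GPTQ or GGML checkpoints. Below is a minimal sketch of that setup with Hugging Face transformers, peft, and bitsandbytes; the checkpoint name, dataset file, and hyperparameters are illustrative assumptions, not taken from the original post.

```python
# Minimal QLoRA sketch: fine-tune OpenLLaMA 3B in 4-bit so it fits in ~6 GB of VRAM.
# Model name, dataset path, and hyperparameters are assumptions for illustration.
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "openlm-research/open_llama_3b"  # assumed HF checkpoint

# Load the base model quantized to 4-bit NF4, as QLoRA does,
# rather than a GPTQ/GGML checkpoint that the trainer cannot update.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Attach small trainable LoRA adapters; only these weights are updated.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Tiny illustrative dataset: a plain-text file of Ren'Py-style script lines (hypothetical path).
dataset = load_dataset("text", data_files={"train": "renpy_script.txt"})
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)
tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="lora-out",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=10,
        optim="paged_adamw_8bit",  # paged 8-bit optimizer states help on small cards
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("lora-out")  # saves only the LoRA adapter weights
```

With batch size 1, gradient accumulation, and the paged 8-bit optimizer, a 3B model loaded in 4-bit usually stays within roughly 4-5 GB of VRAM, which is why QLoRA on the Transformers-format checkpoint, rather than the GPTQ or GGML loaders, is the usual answer to the errors quoted above.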
NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives; a higher number means a more popular project.


Related posts