I want to fine-tune OpenLLaMA 3B and build something similar to this project, but on top of a Llama model (https://github.com/stephwag/doki-rnn). My GPU isn't very powerful, though: a GTX 1660 with 6 GB of VRAM. I can easily run 13B models in GGML format, but I can't create a LoRA even for a 3B model.

As a first test, I tried to train a small LoRA on 10 letters in the Oobabooga WebUI. I tried loading the model in GPTQ and GGML formats, but only got errors:

- GGML format: "'LlamaCppModel' object has no attribute 'decode'"
- GPTQ-for-LLaMa format with monkey_patch: "NotImplementedError"
- AutoGPTQ format with monkey_patch: "Target module QuantLinear() is not supported"

As I understand it, to create a LoRA in Oobabooga you need to load the model in Transformers format, but I can't load it that way because of an Out Of Memory error. If I load it in 4-bit or 8-bit, I get the error "size mismatch for base_model".