My Ubuntu desktop has 64 gigs of RAM and a 12 GB RTX 3060 card. I have the 4-bit 13B-parameter LLaMA running on it currently, following these instructions: https://github.com/oobabooga/text-generation-webui/wiki/LLaM... . They don't have 30B or 65B ready yet.
Might try other methods to do 30B, or switch to my M1 MacBook if that's useful (as mentioned here). No immediate need for it, just futzing with it currently. Rough VRAM math below.
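
For sizing, here's a quick back-of-envelope (my own assumptions, not from the repo): 4-bit quantized weights take about half a byte per parameter, plus some overhead for activations and context.

    // Back-of-envelope VRAM estimate for 4-bit quantized weights.
    // Numbers are assumptions, not measurements; real usage adds overhead.
    const gb = (params) => params * 4 / 8 / 1e9; // 4 bits per weight
    console.log(gb(13e9)); // ~6.5 GB  -> fits a 12 GB card with room for context
    console.log(gb(30e9)); // ~15 GB   -> doesn't fit in 12 GB of VRAM
    console.log(gb(65e9)); // ~32.5 GB

Which lines up with 13B working on the 3060 but 30B/65B needing something bigger (or CPU offloading).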
Sure. You can get the models via magnet link from here: https://github.com/shawwn/llama-dl/
To get it running, just follow these steps: https://github.com/ggerganov/llama.cpp/#usage
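
For reference, that usage section boils down to roughly this (paraphrased from the linked README at the time of writing; check it for the current commands):

    # build the repo
    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp
    make

    # convert the weights to ggml FP16, then quantize to 4-bit
    python3 convert-pth-to-ggml.py models/7B/ 1
    ./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2

    # run inference
    ./main -m ./models/7B/ggml-model-q4_0.bin -t 8 -n 128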
I'm pretty sure there's a mistake here: https://github.com/cocktailpeanut/dalai/blob/main/index.js#L... . There's a ${suffix} missing.
It causes the quantization process to always use the first part of the model when using a size larger than 7B. I don't even know what this stuff does, but I see the ggml-model-f16.bin files have ggml-model-f16.bin.X siblings in the folder, so I'm pretty sure this is a mistake.
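
Something like this is the shape of the problem (an illustrative sketch, not the actual dalai code; the directory and part count are made up):

    // convert-pth-to-ggml.py writes multi-part models as:
    //   ggml-model-f16.bin, ggml-model-f16.bin.1, ggml-model-f16.bin.2, ...
    const { execSync } = require("child_process");
    const dir = "./models/13B";         // hypothetical model dir
    for (const suffix of ["", ".1"]) {  // 13B ships as two parts
      // buggy: input path is missing ${suffix}, so part 0 is quantized every time:
      //   ./quantize ${dir}/ggml-model-f16.bin ${dir}/ggml-model-q4_0.bin${suffix} 2
      // fixed: carry the suffix on the input file as well:
      execSync(`./quantize ${dir}/ggml-model-f16.bin${suffix} ${dir}/ggml-model-q4_0.bin${suffix} 2`);
    }

With the 7B model there's only one part (an empty suffix), so the bug never shows up; with 13B and up every output part would silently be a quantization of part 0.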
See https://github.com/ggerganov/llama.cpp/issues/62 (the related repo was originally posted on 4chan, is all, but the code is on GitHub)