Why run LLMs locally?

This page summarizes the projects mentioned and recommended in the original post on /r/LocalLLaMA

  • koboldcpp

    A simple one-file way to run various GGML and GGUF models with KoboldAI's UI

  • KoboldCpp will let you run GGML models in CPU-only mode, but will use a GPU if it finds one, with essentially no configuration needed. I have a crappy Nvidia GTX 1660 Ti, which may be the worst video card ever made for AI, and I get about 3 tokens per second on 7b models, just under 2 tokens per second on 13b models, and about 1.5 seconds per token on 30b models. (A minimal sketch of calling a running KoboldCpp instance from Python follows this list.)

  • text-generation-webui

    A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

  • This is the best guide I'm aware of, though it is heavily focused on training with oobabooga's text-generation-webui as the GUI. The general concepts it describes apply fairly universally.

  • AlpacaDataCleaned

    Alpaca dataset from Stanford, cleaned and curated

  • This cleaned Alpaca dataset gives a good idea of how data is structured in the standard Alpaca JSON format (see the format sketch after this list). Personally, for making your own datasets, I'd use GPT-4 to format the raw data into that structure. You can do it by hand or with a LLaMA model, but I've found ChatGPT to be the most efficient way to get the highest-quality output; I'm going for quality over quantity.

  • LLaMA-LoRA-Tuner

    UI tool for fine-tuning and testing your own LoRA models based on LLaMA, GPT-J and more. One-click run on Google Colab. Includes a Gradio ChatGPT-like chat UI to demonstrate your language models.

  • The bad news is that, as far as I know, it does require a GPU. The good news is that I've gotten training done with a 7b model on both Google Colab and Kaggle with free accounts. Both have just enough VRAM to make it work, as long as you load the model in 8-bit (the equivalent of --load-in-8bit on the command line with oobabooga). The LoRA Tuner frontend even has a Colab notebook set up to simplify things further. The frontend does cap the LoRA rank and LoRA alpha values pretty low, but that cap is just set in the GUI (in one of the files in its UI directory, I think), so it's easy to hand-edit if you want higher values. (A short sketch of 8-bit loading plus a LoRA adapter follows this list.)
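
The following is a minimal sketch of talking to a locally running KoboldCpp instance, as mentioned in the KoboldCpp item above. It assumes KoboldCpp was launched with default settings, so its KoboldAI-compatible HTTP API is listening on localhost port 5001; the URL, port, and accepted fields can vary between versions, so treat this as illustrative rather than definitive.

    import requests

    # Assumes a KoboldCpp instance is already running locally with default
    # settings, exposing a KoboldAI-compatible API on port 5001 (assumption).
    KOBOLD_URL = "http://localhost:5001/api/v1/generate"

    payload = {
        "prompt": "Explain in one sentence why someone might run an LLM locally.",
        "max_length": 80,       # number of tokens to generate
        "temperature": 0.7,
    }

    resp = requests.post(KOBOLD_URL, json=payload, timeout=120)
    resp.raise_for_status()

    # The endpoint returns JSON shaped like {"results": [{"text": "..."}]}
    print(resp.json()["results"][0]["text"])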
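
For the AlpacaDataCleaned item above, here is a small sketch of what standard Alpaca-style JSON records look like and how you might write your own file. The field names (instruction, input, output) follow the Alpaca dataset; the two example records and the my_dataset.json filename are made up for illustration.

    import json

    # Each training example is a dict with "instruction", "input" and "output".
    # "input" is left empty when the instruction needs no extra context.
    examples = [
        {
            "instruction": "Summarize the following paragraph in one sentence.",
            "input": "Local LLMs can run entirely offline on consumer hardware...",
            "output": "Running models locally keeps data private and works offline.",
        },
        {
            "instruction": "Give three reasons to run a language model locally.",
            "input": "",
            "output": "1. Privacy. 2. No API costs. 3. No internet connection needed.",
        },
    ]

    with open("my_dataset.json", "w", encoding="utf-8") as f:
        json.dump(examples, f, indent=2, ensure_ascii=False)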
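
For the LLaMA-LoRA-Tuner item above, this is a rough sketch of the underlying idea: load a 7b model in 8-bit so it fits in free Colab/Kaggle VRAM, then attach a LoRA adapter. It uses the Hugging Face transformers and peft libraries directly rather than the Tuner's GUI; the model name and the rank/alpha values are illustrative placeholders, and the exact 8-bit loading arguments have shifted between library versions.

    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    model_name = "huggyllama/llama-7b"  # placeholder; any LLaMA-style 7b checkpoint

    # Load the base model in 8-bit so it fits in free-tier GPU memory
    # (the same idea as oobabooga's --load-in-8bit flag).
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        load_in_8bit=True,
        device_map="auto",
    )
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    # Prepare the quantized model for training and attach a LoRA adapter.
    model = prepare_model_for_kbit_training(model)
    lora_config = LoraConfig(
        r=8,                 # LoRA rank; frontends often cap this low
        lora_alpha=16,       # LoRA alpha, a scaling factor relative to the rank
        target_modules=["q_proj", "v_proj"],
        lora_dropout=0.05,
        bias="none",
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # only a small fraction of weights train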



Related posts

  • [P] Uptraining a pretrained model using company data?

    4 projects | /r/MachineLearning | 25 May 2023
  • (HELP) Token Issue on Generation

    1 project | /r/LocalLLaMA | 19 May 2023
  • Help with Random Characters and Words on Output

    1 project | /r/LocalLLaMA | 18 May 2023
  • Fine-tuning LLaMA for research without Meta license

    1 project | /r/LocalLLaMA | 15 May 2023
  • How can I train my custom dataset on top of Vicuna?

    6 projects | /r/LocalLLaMA | 19 Apr 2023