Why run LLMs locally?

This page summarizes the projects mentioned and recommended in the original post on /r/LocalLLaMA

  • koboldcpp

    A simple one-file way to run various GGML and GGUF models with KoboldAI's UI

  • KoboldCpp will let you run GGML models in CPU-only mode, but will use a GPU if it finds one, with essentially no configuration needed. I have a crappy Nvidia GTX 1660 Ti, which may be the worst video card ever made for AI, and I get about 3 tokens per second on 7b models, just under 2 tokens per second on 13b models, and about 1.5 seconds per token on 30b models. (A minimal sketch of calling a running KoboldCpp instance from Python follows this list.)

  • text-generation-webui

    A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

  • This is the best guide I'm aware of, though it is heavily focused on training with oobabooga's text-generation-webui as the GUI. The general concepts it describes apply fairly universally.

  • AlpacaDataCleaned

    Alpaca dataset from Stanford, cleaned and curated

  • This cleaned Alpaca dataset gives a good idea of how data is structured in the standard Alpaca JSON format (see the format sketch after this list). Personally, for making your own datasets, I'd use GPT-4 to format the raw data into that structure. You can do it by hand or with a LLaMA model, but I've found ChatGPT to be the most efficient way to get the highest-quality output; I'm going for quality over quantity.

  • LLaMA-LoRA-Tuner

    UI tool for fine-tuning and testing your own LoRA models based on LLaMA, GPT-J and more. One-click run on Google Colab. Includes a Gradio ChatGPT-like chat UI to demonstrate your language models.

  • The bad news is that, as far as I know, it does require a GPU. The good news is that I've gotten training done with a 7b model on both Google Colab and Kaggle with free accounts. Both have just enough VRAM to make it work, as long as you load the model in 8-bit (the equivalent of --load-in-8bit on the command line with oobabooga). The LoRA Tuner frontend even has a Colab notebook set up to simplify things further. The frontend does cap the LoRA rank and LoRA alpha values pretty low, but that cap is just set in the GUI (in one of the files in its UI directory, I think), so it's easy to hand-edit if you want higher values. (A short sketch of 8-bit loading plus a LoRA adapter follows this list.)
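
The following is a minimal sketch of talking to a locally running KoboldCpp instance, as mentioned in the KoboldCpp item above. It assumes KoboldCpp was launched with default settings, so its KoboldAI-compatible HTTP API is listening on localhost port 5001; the URL, port, and accepted fields can vary between versions, so treat this as illustrative rather than definitive.

    import requests

    # Assumes a KoboldCpp instance is already running locally with default
    # settings, exposing a KoboldAI-compatible API on port 5001 (assumption).
    KOBOLD_URL = "http://localhost:5001/api/v1/generate"

    payload = {
        "prompt": "Explain in one sentence why someone might run an LLM locally.",
        "max_length": 80,       # number of tokens to generate
        "temperature": 0.7,
    }

    resp = requests.post(KOBOLD_URL, json=payload, timeout=120)
    resp.raise_for_status()

    # The endpoint returns JSON shaped like {"results": [{"text": "..."}]}
    print(resp.json()["results"][0]["text"])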
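
For the AlpacaDataCleaned item above, here is a small sketch of what standard Alpaca-style JSON records look like and how you might write your own file. The field names (instruction, input, output) follow the Alpaca dataset; the two example records and the my_dataset.json filename are made up for illustration.

    import json

    # Each training example is a dict with "instruction", "input" and "output".
    # "input" is left empty when the instruction needs no extra context.
    examples = [
        {
            "instruction": "Summarize the following paragraph in one sentence.",
            "input": "Local LLMs can run entirely offline on consumer hardware...",
            "output": "Running models locally keeps data private and works offline.",
        },
        {
            "instruction": "Give three reasons to run a language model locally.",
            "input": "",
            "output": "1. Privacy. 2. No API costs. 3. No internet connection needed.",
        },
    ]

    with open("my_dataset.json", "w", encoding="utf-8") as f:
        json.dump(examples, f, indent=2, ensure_ascii=False)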
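
For the LLaMA-LoRA-Tuner item above, this is a rough sketch of the underlying idea: load a 7b model in 8-bit so it fits in free Colab/Kaggle VRAM, then attach a LoRA adapter. It uses the Hugging Face transformers and peft libraries directly rather than the Tuner's GUI; the model name and the rank/alpha values are illustrative placeholders, and the exact 8-bit loading arguments have shifted between library versions.

    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    model_name = "huggyllama/llama-7b"  # placeholder; any LLaMA-style 7b checkpoint

    # Load the base model in 8-bit so it fits in free-tier GPU memory
    # (the same idea as oobabooga's --load-in-8bit flag).
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        load_in_8bit=True,
        device_map="auto",
    )
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    # Prepare the quantized model for training and attach a LoRA adapter.
    model = prepare_model_for_kbit_training(model)
    lora_config = LoraConfig(
        r=8,                 # LoRA rank; frontends often cap this low
        lora_alpha=16,       # LoRA alpha, a scaling factor relative to the rank
        target_modules=["q_proj", "v_proj"],
        lora_dropout=0.05,
        bias="none",
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # only a small fraction of weights train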



Related posts

  • [P] Uptraining a pretrained model using company data?

    4 projects | /r/MachineLearning | 25 May 2023
  • (HELP) Token Issue on Generation

    1 project | /r/LocalLLaMA | 19 May 2023
  • Help with Random Characters and Words on Output

    1 project | /r/LocalLLaMA | 18 May 2023
  • Fine-tuning LLaMA for research without Meta license

    1 project | /r/LocalLLaMA | 15 May 2023
  • How can I train my custom dataset on top of Vicuna?

    6 projects | /r/LocalLLaMA | 19 Apr 2023