LLMs on your local Computer (Part 1)

This page summarizes the projects mentioned and recommended in the original post on dev.to

  • ggml

    Tensor library for machine learning

  • git clone https://github.com/ggerganov/ggml
    cd ggml
    mkdir build
    cd build
    cmake ..
    make -j4 gpt-j
    ../examples/gpt-j/download-ggml-model.sh 6B
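Once the build finishes and the download script has fetched the weights, the GPT-J example can be run from the build directory. A sketch, assuming the script's default download location (models/gpt-j-6B/ggml-model.bin) and the example binary landing under bin/ — both paths may vary with the ggml version and build setup:

```shell
# Run the gpt-j example built above.
# -m: model file, -p: prompt, -n: number of tokens to generate.
./bin/gpt-j \
  -m models/gpt-j-6B/ggml-model.bin \
  -p "The meaning of life is" \
  -n 64
```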

  • llama.cpp

    LLM inference in C/C++

  • git clone --depth=1 https://github.com/ggerganov/llama.cpp
    cd llama.cpp
    mkdir build
    cd build
    cmake ..
    cmake --build . --config Release
    wget -c --show-progress -O models/llama-2-13b.Q4_0.gguf "https://huggingface.co/TheBloke/Llama-2-13B-GGUF/resolve/main/llama-2-13b.Q4_0.gguf?download=true"
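With the model downloaded, inference can be started right away. A minimal sketch, assuming the main binary was placed under bin/ by the CMake build (older Makefile builds put it in the repository root instead):

```shell
# Generate text with the quantized 13B model.
# -m: GGUF model file, -p: prompt, -n: number of tokens to generate.
./bin/main \
  -m models/llama-2-13b.Q4_0.gguf \
  -p "Building a website can be done in 10 simple steps:" \
  -n 128
```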

  • ollama

    Discontinued: Get up and running with Llama 2, Mistral, Gemma, and other large language models. [Moved to: https://github.com/ollama/ollama] (by jmorganca)

  • ollama.ai

  • lit-gpt

    Discontinued: Hackable implementation of state-of-the-art open-source LLMs based on nanoGPT. Supports flash attention, 4-bit and 8-bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed. [Moved to: https://github.com/Lightning-AI/litgpt]

  • git clone --depth=1 https://github.com/Lightning-AI/lit-gpt
    cd lit-gpt
    pip install -r requirements.txt
    pip install bitsandbytes==0.41.0 huggingface_hub
    python scripts/download.py --repo_id stabilityai/stablelm-zephyr-3b --from_safetensors=True
    python scripts/convert_hf_checkpoint.py --checkpoint_dir checkpoints/stabilityai/stablelm-zephyr-3b --dtype float32
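After the checkpoint has been downloaded and converted, text can be generated with the bundled script. A sketch based on lit-gpt's generate/base.py from that era of the project; the exact flag names, and the bnb.nf4 value for 4-bit bitsandbytes quantization, are assumptions and may differ between releases:

```shell
# Generate text from the converted StableLM Zephyr 3B checkpoint.
python generate/base.py \
  --checkpoint_dir checkpoints/stabilityai/stablelm-zephyr-3b \
  --prompt "What do llamas eat?" \
  --quantize bnb.nf4   # optional: 4-bit quantization to cut memory use
```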

  • FastChat

    An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.


  • whisper.cpp

    Port of OpenAI's Whisper model in C/C++

  • The ggml library is one of the first libraries for local LLM inference. It is a pure C library that converts models to run on several devices, including desktops, laptops, and even mobile devices. It can also be seen as a tinkering ground where new optimizations are tried out before being incorporated into downstream projects. The library is at the heart of several other projects, powering LLM inference on desktops and even mobile phones. Subprojects for running specific LLMs or LLM families exist, such as whisper.cpp.
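As an illustration of such a subproject, whisper.cpp can be built and used for local speech-to-text in a few commands. A sketch, assuming the repository layout at the time of writing (the model download script, binary name, and bundled sample file may change between versions):

```shell
# Build whisper.cpp and transcribe the bundled sample audio locally.
git clone --depth=1 https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
make                                          # build the main binary
bash ./models/download-ggml-model.sh base.en  # fetch a small English model
./main -m models/ggml-base.en.bin -f samples/jfk.wav
```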

  • lit-llama

    Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.

  • A pure Python library for running several open-source LLMs locally, such as Mistral or StableLM. It has also branched into subprojects that support specific LLM model families, such as lit-llama. The project's goals are simplicity and optimized execution on a range of hardware. With 4-bit quantization, it needs less than 1 GB to run a 3B model, and CPU usage also remains low.
