Ask HN: What's the best self hosted/local alternative to GPT-4?

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • LocalAI

    The free, open-source OpenAI alternative. Self-hosted, community-driven, and local-first. A drop-in replacement for OpenAI that runs on consumer-grade hardware; no GPU required. Runs gguf, transformers, diffusers, and many more model architectures, and can generate text, audio, video, and images, with voice-cloning capabilities as well.
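
    Since LocalAI exposes an OpenAI-compatible HTTP API, existing OpenAI client code can usually be pointed at a local instance. A minimal sketch with the `openai` Python package (pre-1.0 style), assuming LocalAI is listening on its default port 8080 and that a model named `ggml-gpt4all-j` has been configured (the host, port, and model name are assumptions, not from the original post):

    ```python
    import openai

    # Point the client at the local LocalAI server instead of api.openai.com
    # (host, port, and model name are assumptions for illustration).
    openai.api_base = "http://localhost:8080/v1"
    openai.api_key = "not-needed-for-localai"

    resp = openai.ChatCompletion.create(
        model="ggml-gpt4all-j",
        messages=[{"role": "user", "content": "Summarize what LocalAI does."}],
    )
    print(resp.choices[0].message.content)
    ```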

  • In our experimentation, we've found that it really depends on what you're looking for. That is, you really need to break down evaluation by task. Local models don't yet have the power to just "do it all well" like GPT-4.

    There are open-source models that are fine-tuned for different tasks, and if you're able to pick a specific model for a specific use case, you'll get better results.

    ---

    For example, for chat there are models like `mpt-7b-chat` or `GPT4All-13B-snoozy` or `vicuna` that do okay for chat, but are not great at reasoning or code.

    Other models, like `mpt-7b-instruct`, are designed for direct instruction following but are worse at chat.

    Meanwhile, there are models designed for code completion, like the ones from Replit and Hugging Face (`starcoder`), that do decently for programming but not for other tasks.

    ---

    For a UI, the easiest way to get a feel for the quality of each of the models (or the chat models, at least) is probably https://gpt4all.io/.

    And as others have mentioned, for providing an API that's compatible with OpenAI, https://github.com/go-skynet/LocalAI seems to be the frontrunner at the moment.

    ---

    For the project I'm working on (in bio) we're currently struggling with this problem too since we want a nice UI, good performance, and the ability for people to keep their data local.

    So at least for the moment, there's no single drop-in replacement for all tasks. But things are changing every week and every day, and I believe that open-source and local can be competitive in the end.

  • qlora

    QLoRA: Efficient Finetuning of Quantized LLMs
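
    A minimal sketch of the QLoRA recipe using Hugging Face `transformers`, `peft`, and `bitsandbytes`: the base model is loaded with 4-bit NF4 quantization and frozen, and only small LoRA adapters are trained on top. The base model name and LoRA hyperparameters below are illustrative assumptions, not taken from the thread:

    ```python
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    base = "huggyllama/llama-7b"  # placeholder base model

    # 4-bit NF4 quantization with double quantization, as described in the QLoRA paper
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    tokenizer = AutoTokenizer.from_pretrained(base)
    model = AutoModelForCausalLM.from_pretrained(
        base, quantization_config=bnb_config, device_map="auto"
    )

    # Freeze the quantized weights and train only the small LoRA adapters
    model = prepare_model_for_kbit_training(model)
    lora = LoraConfig(
        r=64, lora_alpha=16, lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora)
    model.print_trainable_parameters()
    # ...then fine-tune with the usual transformers Trainer on instruction data.
    ```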

  • > It's certainly not on a level with ChatGPT, but what is?

    Guanaco-65B is, per https://arxiv.org/abs/2305.14314

    CPU Version: https://huggingface.co/TheBloke/guanaco-65B-GGML

    GPU Version: https://huggingface.co/TheBloke/guanaco-65B-HF

    4-bit GPU Version: https://huggingface.co/TheBloke/guanaco-65B-GPTQ
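
    For the GPU route, one way to fit the HF version in less memory is to load it in 4-bit at runtime via bitsandbytes. A rough sketch follows; the prompt format and generation settings are assumptions, and even at 4-bit a 65B model still needs roughly 35-40 GB of GPU memory, so the 33B Guanaco variants are the more realistic consumer-hardware choice:

    ```python
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "TheBloke/guanaco-65B-HF"

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        load_in_4bit=True,   # requires bitsandbytes and a recent transformers
        device_map="auto",
        torch_dtype=torch.float16,
    )

    # Guanaco-style prompt format (assumed from the model card)
    prompt = "### Human: What is QLoRA?\n### Assistant:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=200)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
    ```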

  • text-generation-webui

    A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

  • Guanaco is indeed very capable and, based on my tests, can replace GPT-3.5 in almost all scenarios.

    An easy way to self-host it is to use text-generation-webui[1] and the 33B 4-bit quantized GGML model from TheBloke[2].

    [1] https://github.com/oobabooga/text-generation-webui

    [2] https://huggingface.co/TheBloke/guanaco-33B-GGML

  • llama.cpp

    LLM inference in C/C++

  • If you want to go CPU-only, then llama.cpp is looking like a good project: https://github.com/ggerganov/llama.cpp
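
    llama.cpp itself is a C/C++ CLI, but to keep the examples in one language, here is a minimal CPU-only sketch using the `llama-cpp-python` bindings around it. The file name assumes a quantized GGML file such as one from the TheBloke/guanaco-33B-GGML repo mentioned above (an assumption, not from the thread):

    ```python
    from llama_cpp import Llama  # pip install llama-cpp-python

    # Path to a locally downloaded quantized GGML file (assumed file name)
    llm = Llama(
        model_path="./models/guanaco-33B.ggmlv3.q4_0.bin",
        n_ctx=2048,   # context window
        n_threads=8,  # CPU threads
    )

    out = llm(
        "### Human: Explain what a quantized model is.\n### Assistant:",
        max_tokens=256,
        stop=["### Human:"],
    )
    print(out["choices"][0]["text"])
    ```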

  • private-gpt

    Interact with your documents using the power of GPT, 100% privately, no data leaks

  • chat-ui

    Open source codebase powering the HuggingChat app

  • I lay out the whole LLM landscape in this article: https://medium.com/@damngoodtech/vital-gpt-and-llm-understan.... Even if you aren't a business it might help.

    And this spreadsheet shows a pretty comprehensive list of LLM models: https://anania.ai/chatgpt-alternatives/

    Currently the "best" ones seem to be Llama and Dolly. Dolly can be used commercially and Llama cannot, so Llama is best suited for personal use.

    I myself have been trying to get [the huggingface chat ui](https://github.com/huggingface/chat-ui) running on my own system, but it's finicky. Right now I'm focused on getting immediate income so I can't spend too much effort on it.

    Overall, no model gets close to the accuracy of GPT-3 or GPT-4 (though Llama does decently), but I can definitely imagine open source matching or even exceeding the capabilities of OpenAI's models in three years or so.

  • chatbot-ui

    AI chat for every model.

  • It depends on what you mean by "viable alternatives" and how much money you are prepared to spend on hardware to self-host. As others have mentioned, you can try llama.cpp and LocalAI, but for most ChatGPT-like applications you won't get results anywhere near as good. I've found that using the GPT-4 OpenAI API is somewhat more reliable than ChatGPT, either via the Playground or via a self-hosted chat interface like https://github.com/mckaywrigley/chatbot-ui

  • basaran

    (Discontinued) Basaran is an open-source alternative to the OpenAI text completion API. It provides a compatible streaming API for your Hugging Face Transformers-based text generation models.

  • Guanaco-65B[0], using Basaran[1] for your OpenAI-compatible API. You can use any ChatGPT front-end that lets you change the OpenAI endpoint URL.

    [0] A 4-bit (QLoRA) finetune of LLaMA-65B by Tim Dettmers

    [1] https://github.com/hyperonym/basaran

  • llm-foundry

    LLM training code for Databricks foundation models

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

