-
serge
A web interface for chatting with Alpaca through llama.cpp. Fully dockerized, with an easy to use API.
-
exllama
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
Serge made it really easy for me to get started, but it's all CPU-based.
MLC LLM looks like an easy option to use my AMD GPU.
Llama.cpp seems like it can use both CPU and GPU, but I haven't quite figured that out yet.
You can try Koboldcpp with CLblast from this repo: https://github.com/LostRuins/koboldcpp/releases It lets you offload several layers to the GPU, with a significant boost in prompt processing and inference speed.
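For reference, a koboldcpp launch with CLBlast offload looks roughly like this (the layer count and model filename are placeholders; tune --gpulayers to your VRAM):

```shell
# Sketch: run koboldcpp with CLBlast GPU offload.
# --useclblast takes an OpenCL platform ID and device ID (0 0 is usually the first GPU);
# --gpulayers sets how many transformer layers are offloaded to VRAM.
python koboldcpp.py --useclblast 0 0 --gpulayers 32 --model llama-13b.ggmlv3.q4_0.bin
```

With a 13B 4-bit model, more offloaded layers generally means faster generation, up to the point where VRAM runs out.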
As mentioned, exllama is the way to go. Once you install ROCm and the official ROCm PyTorch build, you're ready to go. A 16GB 6800 XT can run 13B 4-bit GPTQ models with full context. Spare_Side just posted a report with the same GPU.
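A rough sketch of that setup (the ROCm version tag, model path, and benchmark script invocation are assumptions; check the PyTorch and exllama repos for current instructions):

```shell
# Install the ROCm build of PyTorch (version tag is an example).
pip install torch --index-url https://download.pytorch.org/whl/rocm5.6

# Get exllama and its dependencies.
git clone https://github.com/turboderp/exllama
cd exllama
pip install -r requirements.txt

# Benchmark a 4-bit GPTQ model (path is a placeholder).
python test_benchmark_inference.py -d /path/to/llama-13b-4bit-gptq -p
```

This assumes the kernel-level ROCm stack is already installed and your user is in the `video`/`render` groups so PyTorch can see the GPU.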
Related posts
-
Meet Atom the GPT Assistant, an AI-powered Smart Home Assistant. It's like Google Assistant but with endless possibility of ChatGPT, it's like Siri but with extensibility of Open Source power.
-
LocalAI: OpenAI compatible API to run LLM models locally on consumer grade hardware!
-
chatgpt alternative
-
Show HN: LlamaGPT – Self-hosted, offline, private AI chatbot, powered by Llama 2
-
LeCun: Qualcomm working with Meta to run Llama-2 on mobile devices