Projects mentioned in this post:
- nvidia-patch: Removes the restriction on the maximum number of simultaneous NVENC video encoding sessions that Nvidia imposes on consumer-grade GPUs.
- text-generation-webui: A Gradio web UI for large language models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), and Llama models.
- turbopilot (discontinued): An open-source, large-language-model-based code completion engine that runs locally on CPU.
- simpleAI: An easy way to host your own AI API and expose alternative models while remaining compatible with "open" AI clients.
Not what you asked, but there are projects out there that run llama and its descendants on CPU only, using system RAM. See https://github.com/ggerganov/llama.cpp/discussions/643 for a discussion about running Vicuna-13b in particular. The 4-bit quantized model should fit comfortably in a 16 GB system.
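A quick back-of-envelope check of that 16 GB claim (the 0.5 bytes/weight figure follows from 4-bit quantization; the overhead factor is my assumption, not from the thread):

```python
def quantized_model_gib(n_params, bits_per_weight=4, overhead=1.2):
    """Rough RAM needed to hold the weights, in GiB.

    bits_per_weight: 4 for a 4-bit quantized model (0.5 bytes/weight).
    overhead: fudge factor for scales/zero-points and runtime buffers
              (an assumption, not a measured value).
    """
    return n_params * (bits_per_weight / 8) * overhead / 2**30

# Vicuna-13b has roughly 13 billion parameters.
size = quantized_model_gib(13e9)
print(f"~{size:.1f} GiB")  # ~7.3 GiB, well under 16 GB
```

So even with context buffers on top, a 4-bit 13B model leaves plenty of headroom on a 16 GB machine.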
I've been using oobabooga's text-generation-webui (the one-click installer) and it works mostly fine. You just need to make sure you have a model that works with it, such as one of the 4-bit, 128-group-size quantized models, and set the right arguments for the model in start-webui.bat.
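As a sketch, the launch line in start-webui.bat might look like the following (flag names are from the GPTQ-era webui and may differ in your version; the model name is a placeholder, check `python server.py --help` for what your install accepts):

```bat
rem Illustrative only: 4-bit GPTQ model with group size 128.
rem Replace the model name with the folder name under models\.
call python server.py --chat --wbits 4 --groupsize 128 --model your-4bit-128g-model
```

The key point is that the quantization flags must match how the model was quantized, or loading will fail.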
I don't know if this applies to your use case, but it would probably work if you're looking for an LLM to help with programming. I haven't really played around with it, but it may also work for general LLM tasks; it doesn't have a web UI, though.