-
Not what you asked, but there are projects out there that run LLaMA and its descendants on CPU only, using system RAM. See https://github.com/ggerganov/llama.cpp/discussions/643 for a discussion about running Vicuna-13b in particular. The 4-bit quantized model should run comfortably within a 16 GB system.
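If you would rather drive llama.cpp from Python than from its CLI, a minimal sketch using the llama-cpp-python bindings might look like this; the model filename, thread count, and context size are placeholder assumptions, not values from the linked discussion:

```python
# Minimal sketch: run a 4-bit quantized Vicuna-13b on CPU via llama-cpp-python.
# pip install llama-cpp-python
# The model path is a placeholder; point it at your own quantized file
# (GGML vs. GGUF depends on your llama.cpp / bindings version).
from llama_cpp import Llama

llm = Llama(
    model_path="./models/vicuna-13b-q4_0.bin",  # hypothetical filename
    n_ctx=2048,   # context window
    n_threads=8,  # CPU threads; tune to your machine
)

output = llm(
    "Q: Name three planets in the solar system. A:",
    max_tokens=64,
    stop=["Q:"],
)
print(output["choices"][0]["text"])
```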
-
nvidia-patch
This patch removes the restriction on the maximum number of simultaneous NVENC video encoding sessions that Nvidia imposes on consumer-grade GPUs.
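A rough way to see that cap (and to confirm the patch took effect) is to open more concurrent NVENC sessions than a consumer GPU normally allows and watch the extras fail. The sketch below assumes an ffmpeg build with h264_nvenc support on your PATH; the session count is an arbitrary choice:

```python
# Sketch: launch N concurrent NVENC encodes to probe the session limit.
# Assumes an ffmpeg build with h264_nvenc support is on PATH.
# On an unpatched consumer GPU, sessions beyond the cap should fail to open.
import subprocess

N_SESSIONS = 8  # pick a number above the consumer-GPU cap

procs = [
    subprocess.Popen(
        [
            "ffmpeg", "-loglevel", "error",
            "-f", "lavfi", "-i", "testsrc=duration=30:size=1280x720:rate=30",
            "-c:v", "h264_nvenc",
            "-f", "null", "-",
        ],
        stderr=subprocess.PIPE,
    )
    for _ in range(N_SESSIONS)
]

for i, p in enumerate(procs):
    _, err = p.communicate()
    status = "ok" if p.returncode == 0 else f"failed: {err.decode(errors='replace').strip()}"
    print(f"session {i}: {status}")
```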
-
-
text-generation-webui
A Gradio web UI for Large Language Models with support for multiple inference backends.
I've been using the oobabooga text-generation-webui (the one-click installer) and it works mostly fine. You just need to make sure you have a model that works with it, like one of the 4-bit, 128-group-size quantized models, and set the right arguments in start-webui.bat for the model; a sketch of what that line can look like follows below.
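For reference, here is a sketch of the relevant launch line in start-webui.bat for a GPTQ-era 4-bit, 128-group model; the flag names match the webui of that period and the model name is a placeholder, so check your version's documentation:

```bat
rem Sketch of the launch line in start-webui.bat for a 4-bit, 128-group GPTQ model.
rem Flag names are from the GPTQ-era webui and may differ in newer versions;
rem the model name is a placeholder.
call python server.py --auto-devices --chat --model vicuna-13b-4bit-128g --wbits 4 --groupsize 128
```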
-
turbopilot
Discontinued: TurboPilot is an open-source, large-language-model-based code completion engine that runs locally on CPU.
I don't know if this applies to your use case, but this would probably work if you are looking for an LLM to help with programming. I haven't really played around with it, but it may work for general LLM tasks too; it doesn't have a web UI, though (see the request sketch below).
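Here is a sketch of what talking to a locally running TurboPilot server could look like; the port and the fauxpilot-style endpoint path are assumptions based on the project's Copilot-compatible API, so check its README for your build:

```python
# Sketch: request a code completion from a locally running TurboPilot server.
# TurboPilot exposed a fauxpilot/Copilot-style HTTP API; the port (18080) and
# endpoint path here are assumptions -- verify them against the project's README.
import json
import urllib.request

payload = {
    "prompt": "def fibonacci(n):",
    "max_tokens": 64,
    "temperature": 0.2,
}

req = urllib.request.Request(
    "http://localhost:18080/v1/engines/codegen/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

# Expecting an OpenAI-style response shape; adjust if your version differs.
print(body["choices"][0]["text"])
```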
-
simpleAI
An easy way to host your own AI API and expose alternative models, while being compatible with "open" AI clients.
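Since simpleAI's whole point is compatibility with OpenAI clients, a minimal sketch using the pre-1.0 openai Python package might look like this; the base URL and the model name are placeholders for whatever your simpleAI instance actually serves:

```python
# Sketch: talk to a local simpleAI server through the OpenAI client library.
# Uses the pre-1.0 `openai` package interface; the base URL and model name
# are placeholders -- substitute whatever your simpleAI instance exposes.
import openai

openai.api_base = "http://127.0.0.1:8080"  # assumed local simpleAI endpoint
openai.api_key = "unused"                  # local server; key is ignored

response = openai.Completion.create(
    model="llama-7B",  # hypothetical model id registered with simpleAI
    prompt="Write a haiku about GPUs.",
    max_tokens=64,
)
print(response["choices"][0]["text"])
```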
-