[D] Would a Tesla M40 provide cheap inference acceleration for self-hosted LLMs?

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning

  1. llama.cpp

    LLM inference in C/C++

    Not what you asked, but there are projects out there that run LLaMA and its descendants on CPU only, using system RAM. See https://github.com/ggerganov/llama.cpp/discussions/643 for a discussion about running Vicuna-13B in particular. The 4-bit quantized model should run comfortably within a 16 GB system.
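
    For a concrete sense of what that looks like, here is a minimal sketch using the llama-cpp-python bindings (a Python wrapper around llama.cpp); the model filename below is a placeholder for whichever 4-bit quantized file you downloaded.

        from llama_cpp import Llama

        # Load a 4-bit quantized model entirely into system RAM (no GPU needed).
        # The filename is a placeholder; point this at your own quantized model.
        llm = Llama(model_path="./models/vicuna-13b-q4_0.bin", n_threads=8)

        # Run a single completion on the CPU.
        output = llm("Q: Is a Tesla M40 still useful for inference? A:", max_tokens=64)
        print(output["choices"][0]["text"])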

  2. nvidia-docker

    (Discontinued) Build and run Docker containers leveraging NVIDIA GPUs
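
    Its replacement, the NVIDIA Container Toolkit, exposes GPUs to containers through a device request. As a rough sketch using the Docker SDK for Python (assumes the toolkit is installed on the host; the image tag is illustrative):

        import docker

        client = docker.from_env()

        # Request all available GPUs for the container, the modern equivalent
        # of what the old nvidia-docker wrapper did.
        output = client.containers.run(
            "nvidia/cuda:12.2.0-base-ubuntu22.04",  # illustrative image tag
            "nvidia-smi",
            device_requests=[docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])],
            remove=True,
        )
        print(output.decode())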

  3. nvidia-patch

    This patch removes the restriction on the maximum number of simultaneous NVENC video encoding sessions that Nvidia imposes on consumer-grade GPUs.

  4. vGPU_LicenseBypass

    A simple script that works around Nvidia vGPU licensing with a scheduled task.

  5. text-generation-webui

    A Gradio web UI for Large Language Models with support for multiple inference backends.

    I've been using the oobabooga text-generation web UI (the one-click installer) and it works mostly fine. You just need to make sure you have a model that works with it, such as one of the 4-bit, group-size-128 quantized models, and set the right arguments for the model in start-webui.bat, as sketched below.
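
    For illustration, the launch line in start-webui.bat might look something like the following; the exact flag names have varied between versions of the web UI, and the model name is a placeholder:

        rem Illustrative only: flags for a 4-bit, group-size-128 GPTQ model.
        call python server.py --model vicuna-13b-4bit-128g --wbits 4 --groupsize 128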

  6. turbopilot

    (Discontinued) TurboPilot is an open-source, large-language-model-based code-completion engine that runs locally on CPU.

    I don't know if this applies to your use case, but it would probably work if you are looking for an LLM to help with programming. I haven't really played around with it, but it may also work for general LLM tasks; it doesn't have a web UI, though.
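
    Since it has no web UI, you would talk to it over HTTP instead. A hedged sketch follows; the port and endpoint path are assumptions based on its FauxPilot-compatible API, so check the project README for the exact URL.

        import requests

        # Port 18080 and this endpoint path are assumptions based on
        # turbopilot's FauxPilot-compatible API; verify against the README.
        resp = requests.post(
            "http://localhost:18080/v1/engines/codegen/completions",
            json={"prompt": "def fibonacci(n):", "max_tokens": 32},
        )
        print(resp.json()["choices"][0]["text"])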

  7. simpleAI

    An easy way to host your own AI API and expose alternative models, while being compatible with "open" AI clients.

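    Because it mimics the OpenAI API, existing client code can simply be pointed at it. A minimal sketch using the pre-1.0 openai package interface, assuming a simpleAI instance serving locally (the base URL and model id are placeholders, not project defaults):

        import openai

        # Point the standard OpenAI client at a local simpleAI server.
        openai.api_base = "http://localhost:8080"  # placeholder URL
        openai.api_key = "unused-but-required"

        completion = openai.Completion.create(
            model="my-local-llama",  # hypothetical model registered with simpleAI
            prompt="Explain GPU memory bandwidth in one sentence.",
            max_tokens=60,
        )
        print(completion.choices[0].text)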

NOTE: The number of mentions on this list reflects mentions in common posts plus user-suggested alternatives, so a higher number generally indicates a more popular project.


Related posts

  • Plex setup through Docker + Nvidia card, but hardware acceleration stops working after some time

    2 projects | /r/PleX | 3 Jun 2023
  • Seeking Guidance on Leveraging Local Models and Optimizing GPU Utilization in containerized packages

    1 project | /r/LocalLLaMA | 21 May 2023
  • Which GPU for HW transcoding in PMS: Intel Arc or Nvidia?

    1 project | /r/PleX | 20 Apr 2023
  • Help! Accelerated-GPU with Cuda and CuPy

    1 project | /r/wsl2 | 8 Apr 2023
  • Plex Transcode (VC1 (HW) 1080p H264 (HW) 1080p) on Pixel 7 Pro

    1 project | /r/PleX | 3 Apr 2023
