[D] Would a Tesla M40 provide cheap inference acceleration for self-hosted LLMs?

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning

  • llama.cpp

    LLM inference in C/C++

  • Not what you asked, but there are projects that run LLaMA and its descendants on CPU only, using system RAM. See https://github.com/ggerganov/llama.cpp/discussions/643 for a discussion of running Vicuna-13b in particular. The 4-bit quantized model should run comfortably on a system with 16GB of RAM.
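  • As a rough sanity check on the 16GB claim, here is some back-of-the-envelope arithmetic (illustrative only; real GGML/GGUF files add per-block scale factors and keep some layers in higher precision, so actual file sizes differ):

    ```python
    # Rough memory estimate for a 4-bit quantized ~13B-parameter model.
    # This is a sketch, not an exact file size.
    params = 13_000_000_000        # ~13B weights in Vicuna-13b
    bytes_per_weight = 0.5         # 4 bits = half a byte
    weights_gib = params * bytes_per_weight / 1024**3
    print(f"~{weights_gib:.1f} GiB for weights alone")  # ~6.1 GiB
    ```

    Roughly 6 GiB for the weights leaves headroom for the KV cache and the OS within 16GB, which matches the discussion linked above.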

  • nvidia-docker

    Discontinued. Build and run Docker containers leveraging NVIDIA GPUs.

  • nvidia-patch

    This patch removes the restriction on the maximum number of simultaneous NVENC video encoding sessions that Nvidia imposes on consumer-grade GPUs.

  • vGPU_LicenseBypass

    A simple script that works around Nvidia vGPU licensing with a scheduled task.

  • text-generation-webui

    A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

  • I’ve been using the oobabooga text-generation-webui (the one-click installer) and it works mostly fine. You just need a model that works with it, such as one of the 4-bit, group-size-128 quantized models, and to set the right arguments for the model in start-webui.bat.
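  • For example, the launch line in start-webui.bat might look something like this for a 4-bit GPTQ model (the model name here is hypothetical, and these flag names follow the project's GPTQ-era README, so check the current docs before relying on them):

    ```shell
    :: Hypothetical start-webui.bat launch line for a 4-bit,
    :: group-size-128 GPTQ model; verify flags against the
    :: text-generation-webui README for your version.
    python server.py --model vicuna-13b-GPTQ-4bit-128g --wbits 4 --groupsize 128
    ```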

  • turbopilot

    Discontinued. Turbopilot is an open-source, large-language-model-based code completion engine that runs locally on CPU.

  • I don't know if this applies to your use case, but this would probably work if you are looking for an LLM to help with programming. I haven't really played around with it, but it may work for general LLM tasks; it doesn't have a web UI, though.

  • simpleAI

    An easy way to host your own AI API and expose alternative models, while being compatible with "open" AI clients.
