Most efficient way to set up API serving of custom LLMs?

This page summarizes the projects mentioned and recommended in the original post on /r/LocalLLaMA

  • llama-cpp-python

    Python bindings for llama.cpp

    I already have a Discord bot set up to interface with OpenAI's API, which a small Discord server uses. I'm looking to give the bot access to custom models like Vicuna or any of the LLaMA variants that have come out (up to 30B, potentially even 65B). The most obvious solution would be setting up something like https://github.com/abetlen/llama-cpp-python on a cloud instance and serving it from FastAPI, as sketched below.
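A minimal sketch of that approach, wrapping llama-cpp-python in a single FastAPI endpoint. The model path, context size, and generation parameters below are placeholders, not recommendations:

```python
# Sketch: serving a local GGUF model over HTTP with FastAPI.
# Model path and parameters are placeholders.
from fastapi import FastAPI
from pydantic import BaseModel
from llama_cpp import Llama

app = FastAPI()

# Load the model once at startup; n_ctx is the context window size.
llm = Llama(model_path="./models/vicuna-13b.Q4_K_M.gguf", n_ctx=2048)

class CompletionRequest(BaseModel):
    prompt: str
    max_tokens: int = 256
    temperature: float = 0.7

@app.post("/complete")
def complete(req: CompletionRequest):
    # llama_cpp returns an OpenAI-style dict with a "choices" list.
    out = llm(req.prompt, max_tokens=req.max_tokens, temperature=req.temperature)
    return {"text": out["choices"][0]["text"]}
```

Assuming the file is saved as server.py, run it with `uvicorn server:app` and POST a JSON body with a `prompt` field. Note that llama-cpp-python also ships a built-in OpenAI-compatible server (`python -m llama_cpp.server --model <path>`), which may be simpler than hand-rolling a FastAPI app.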

  • text-generation-webui

    A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

    You can find more about the extension here: https://github.com/oobabooga/text-generation-webui/tree/main/extensions/openai
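Because the extension mimics the OpenAI API, a bot that already talks to OpenAI mostly just needs its base URL redirected. Here is a hedged sketch using the legacy `openai` Python client; the port and model name are assumptions, so check the extension's actual configuration:

```python
# Sketch: reusing an existing OpenAI client against the local
# text-generation-webui openai extension instead of api.openai.com.
import openai

openai.api_base = "http://localhost:5000/v1"  # assumed extension port; verify yours
openai.api_key = "sk-dummy"                   # typically not validated locally

response = openai.ChatCompletion.create(
    model="local-model",  # placeholder; the extension serves whatever model is loaded
    messages=[{"role": "user", "content": "Hello from the Discord bot!"}],
)
print(response["choices"][0]["message"]["content"])
```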

  • gpt-discord-bot

    Example Discord bot written in Python that uses the completions API to have conversations with the `text-davinci-003` model, and the moderations API to filter the messages.

    And here's a Discord bot that currently works with it, which you may be able to learn from: https://github.com/openai/gpt-discord-bot
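For reference, a compressed sketch of the pattern that bot demonstrates: run each incoming message through the moderations API, then answer with a completions call against `text-davinci-003`. The tokens are placeholders and the synchronous API calls are a simplification; the actual repo adds conversation threading and async I/O:

```python
# Sketch of the gpt-discord-bot pattern: moderate, then complete.
# Tokens are placeholders; calls are synchronous for brevity.
import discord
import openai

openai.api_key = "sk-..."  # placeholder

intents = discord.Intents.default()
intents.message_content = True  # needed to read message text
client = discord.Client(intents=intents)

@client.event
async def on_message(message: discord.Message):
    if message.author.bot:
        return  # ignore other bots (and ourselves)
    # Filter input with the moderations API before generating a reply.
    moderation = openai.Moderation.create(input=message.content)
    if moderation["results"][0]["flagged"]:
        return
    completion = openai.Completion.create(
        model="text-davinci-003",
        prompt=message.content,
        max_tokens=256,
    )
    await message.reply(completion["choices"][0]["text"])

client.run("DISCORD_BOT_TOKEN")  # placeholder
```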

