Running Multiple AI Models Sequentially for a Conversation on a Single GPU

This page summarizes the projects mentioned and recommended in the original post on /r/LocalLLaMA

  • exllama

    A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.

  • The next best thing, if you really need two different models at once, is to run one on the GPU and the other on the CPU: for example, one model with exllama on the GPU and the other with llama.cpp in CPU mode. A minimal sketch of this split is shown below.
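
    A sketch of that split, using llama-cpp-python for both models for simplicity (the post pairs exllama with llama.cpp; here llama.cpp with full GPU offload stands in for the GPU side, and the model paths are placeholders):

    ```python
    # Two quantized models share one machine: one fully offloaded to the
    # GPU, the other kept entirely on the CPU. They then take turns in a
    # single conversation transcript.
    from llama_cpp import Llama

    gpu_model = Llama(model_path="models/model-a.Q4_K_M.gguf",
                      n_gpu_layers=-1, verbose=False)  # all layers on GPU
    cpu_model = Llama(model_path="models/model-b.Q4_K_M.gguf",
                      n_gpu_layers=0, verbose=False)   # CPU only

    transcript = "A conversation between Alice and Bob.\n"
    speakers = [("Alice", gpu_model), ("Bob", cpu_model)]

    for turn in range(4):
        name, model = speakers[turn % 2]
        out = model(transcript + f"{name}:", max_tokens=64, stop=["\n"])
        transcript += f"{name}: {out['choices'][0]['text'].strip()}\n"

    print(transcript)
    ```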

  • text-generation-webui

    A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

  • Pre-prompt injection. Basically, you add context to the input before you pass it to the LLM. There is a character_bias extension for text-generation-webui that does this. A toy illustration follows below.
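
    A framework-free illustration of the idea (the function name and persona text here are invented; the character_bias extension does the equivalent inside the web UI's prompt pipeline):

    ```python
    # Pre-prompt injection: prepend context so the model sees it before
    # the user's actual input.
    def inject_pre_prompt(user_input: str, persona: str) -> str:
        return (f"[Context: {persona}]\n"
                f"User: {user_input}\n"
                f"Assistant:")

    prompt = inject_pre_prompt(
        "What's your favorite food?",
        "You are Alice, a cheerful medieval innkeeper.",
    )
    print(prompt)  # this string is what gets sent to the model
    ```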

  • KoboldAI-Client

    A browser-based front-end for AI-assisted writing with multiple local and remote AI models.

  • And finally, the folks from KoboldAI do some interesting things with pseudocode and soft prompts that might also be relevant.

  • character-card-spec-v2

    An updated specification for AI character cards: https://github.com/malfoyslastname/character-card-spec-v2. A minimal example card is sketched below.
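
    A minimal card in the v2 shape (only a few of the spec's fields are shown, the values are invented for illustration, and the full field list lives in the repository above):

    ```python
    # Character Card v2 moves the card's fields under a "data" key and
    # adds explicit "spec"/"spec_version" markers.
    import json

    card = {
        "spec": "chara_card_v2",
        "spec_version": "2.0",
        "data": {
            "name": "Alice",
            "description": "A cheerful medieval innkeeper.",
            "personality": "warm, talkative, curious",
            "scenario": "A traveler has just arrived at Alice's inn.",
            "first_mes": "Welcome, traveler! What can I get you?",
            "mes_example": "",
        },
    }

    print(json.dumps(card, indent=2))
    ```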
