How to run Pygmalion on 4.5GB of VRAM with full context size.

This page summarizes the projects mentioned and recommended in the original post on /r/PygmalionAI.

  • text-generation-webui

    A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

  • Get rid of everything you have and just use the installer I linked HERE. It'll get everything for you, up to date.

  • GPTQ-for-LLaMa

    4 bits quantization of LLaMa using GPTQ (by oobabooga)

  • I'm just throwing it out there, but I had to use oobabooga's fork of GPTQ-for-LLaMa and make sure I was on the cuda branch (https://github.com/oobabooga/GPTQ-for-LLaMa.git).

  • GPTQ-for-LLaMa

    4 bits quantization of LLMs using GPTQ (by YellowRoseCx)

  • I can load it with GPTQ on a 6700 XT using a fork (https://github.com/YellowRoseCx/GPTQ-for-LLaMa). You can update the post for AMD users if you want.

  • bitsandbytes-rocm

  • There are a lot of ROCm versions of bitsandbytes, for example this one (https://github.com/broncotc/bitsandbytes-rocm). The problem is compatibility with most of the requirements. Kobold does a better job than ooba in offering a more streamlined approach for AMD users.

  • bitsandbytes

    Accessible large language models via k-bit quantization for PyTorch.

  • Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

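One comment above mentions having to use oobabooga's fork of GPTQ-for-LLaMa and making sure the checkout was on the cuda branch. As a rough sketch of that step, the Python below builds a `git clone` command for that branch and then asks git which branch is checked out. The helper names (`clone_command`, `current_branch`, `clone_and_verify`) and the destination directory are hypothetical and chosen for illustration; the original comment does not say where the fork should be cloned, so adjust the path to your own setup.

```python
import subprocess

# Fork URL as quoted in the comment above.
FORK_URL = "https://github.com/oobabooga/GPTQ-for-LLaMa.git"


def clone_command(url, branch, dest):
    """Build a `git clone` command that checks out `branch` right away."""
    return ["git", "clone", "--branch", branch, url, dest]


def current_branch(repo_dir):
    """Return the name of the branch currently checked out in `repo_dir`."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "rev-parse", "--abbrev-ref", "HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def clone_and_verify(dest="GPTQ-for-LLaMa", branch="cuda"):
    """Clone the fork into `dest` (hypothetical path) and confirm the branch."""
    subprocess.run(clone_command(FORK_URL, branch, dest), check=True)
    actual = current_branch(dest)
    if actual != branch:
        raise RuntimeError(f"expected branch {branch!r}, found {actual!r}")
    return actual
```

Calling `clone_and_verify()` needs git on the PATH and network access. If you already have a checkout, `current_branch("path/to/checkout")` alone is enough to confirm which branch you are on.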
NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
