Now that ExLlama is out with reduced VRAM usage, are there any GPTQ models bigger than 7b which can fit onto an 8GB card?
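As a rough sanity check on what fits, the weight footprint of a 4-bit model can be estimated as parameters × bits/8 bytes, plus some headroom for activations and the KV cache. The sketch below uses a hypothetical fixed 1.5 GB overhead allowance (actual usage depends on context length, group size, and the loader), not a measured ExLlama figure:

```python
# Back-of-envelope VRAM estimate for 4-bit GPTQ models.
# The overhead allowance is an assumption for illustration,
# not a measured value; real usage varies with context length,
# quantization group size, and the loader.

def gptq_vram_gb(n_params_billion: float, bits: int = 4,
                 overhead_gb: float = 1.5) -> float:
    """Approximate VRAM in GiB: weights at bits/8 bytes per
    parameter, plus a fixed allowance for activations/KV cache."""
    weights_gb = n_params_billion * 1e9 * bits / 8 / 1024**3
    return weights_gb + overhead_gb

for size in (7, 13, 30):
    print(f"{size}B at 4-bit: ~{gptq_vram_gb(size):.1f} GB")
```

Under these assumptions a 13B model lands just under 8 GB, which is why it is the natural candidate for an 8GB card, while 30B is clearly out of reach.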

This page summarizes the projects mentioned and recommended in the original post on /r/LocalLLaMA

  • exllama

    A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. It is an optimized implementation of GPTQ-for-LLaMa that runs 4-bit quantized language models on GPU at high speed.

  • GPTQ-for-LLaMa

    4-bit quantization of LLaMA using GPTQ.

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

