Anyone actually running 30b/65b at reasonably high speed? What's your rig?

This page summarizes the projects mentioned and recommended in the original post on /r/LocalLLaMA

  • llama.cpp

    Port of Facebook's LLaMA model in C/C++ (by sw)

  • It might be worth trying out 2- or 3-bit quantization on llama.cpp. It's currently sitting in an unmerged PR, but it works. I doubt you'll be getting 5+ tokens/second, though. link
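To see why low-bit quantization matters for 30B/65B on consumer hardware, here is a minimal sketch of the weight-memory arithmetic: parameter count times bits per weight. It ignores the KV cache, activations, and the per-block scale factors real quantization formats store, so actual usage runs somewhat higher.

```python
def weight_size_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Rough size of quantized model weights in decimal GB.

    Ignores KV cache, activations, and quantization-scale overhead.
    """
    bytes_total = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# 30B at 4-bit -> 15.0 GB, at 3-bit -> 11.25 GB, at 2-bit -> 7.5 GB
# 65B at 4-bit -> 32.5 GB
for params in (30, 65):
    for bits in (2, 3, 4):
        print(f"{params}B @ {bits}-bit ~ {weight_size_gb(params, bits):.2f} GB")
```

This is why a 2/3-bit PR is interesting: it is the difference between a 30B model fitting in 24 GB of VRAM with headroom versus barely at all.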

  • GPTQ-for-LLaMa

    4 bits quantization of LLaMa using GPTQ (by oobabooga)

  • My GPTQ-for-LLaMa folder under repositories says it's pointed at https://github.com/oobabooga/GPTQ-for-LLaMa.git. I've run through the instructions and also applied the monkey patch to train and apply a 4-bit LoRA, which may come into play. No idea.
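Since several GPTQ-for-LLaMa forks exist, checking which one a local checkout actually tracks is a one-liner. A minimal sketch, assuming the text-generation-webui convention of cloning forks under `repositories/` (the path here is an example, not confirmed by the post):

```shell
# Print the remote a local GPTQ-for-LLaMa checkout tracks.
# The repositories/ path is an assumption; adjust to your install.
dir=repositories/GPTQ-for-LLaMa
if [ -d "$dir/.git" ]; then
  git -C "$dir" remote -v
else
  echo "no checkout at $dir"
fi
```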

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

