[D] Is it possible to run Meta's LLaMA 65B model on consumer-grade hardware?

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning

  • llama-int8

    Quantized inference code for LLaMA models
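Int8 quantization is the key idea behind running large checkpoints in less memory: weights are stored as 8-bit integers plus a scale factor and dequantized on the fly. A minimal sketch of symmetric per-tensor int8 quantization (an illustration of the general technique, not the llama-int8 repo's actual code, which uses per-channel schemes via bitsandbytes):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: w ≈ scale * q, q in [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

# One int8 weight takes 1 byte instead of 2 (fp16) or 4 (fp32),
# at the cost of a small rounding error bounded by scale / 2.
w = np.random.randn(512, 512).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
max_err = np.abs(w - w_hat).max()
```

Real implementations quantize per output channel (or per block) rather than per tensor, which keeps the rounding error small even when a few outlier weights inflate the scale.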

  • text-generation-webui

  • https://github.com/oobabooga/text-generation-webui/issues/147#issuecomment-1454798725

  • llama

    Inference code for Llama models

  • FlexGen

    Running large language models on a single GPU for throughput-oriented scenarios.

  • With FlexGen, I believe it should be possible to run on a typical high-end system; they have run a 175B-parameter model with it. See here: https://github.com/FMInference/FlexGen
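Whether 65B fits on consumer hardware is ultimately a memory question. A back-of-the-envelope estimate of weight storage alone (my own arithmetic, ignoring activations and the KV cache):

```python
# Weight memory for a 65B-parameter model at different precisions.
PARAMS = 65e9

def weight_gib(bytes_per_param):
    """GiB needed just to hold the weights at a given precision."""
    return PARAMS * bytes_per_param / 2**30

fp16 = weight_gib(2)    # ~121 GiB: far beyond any consumer GPU
int8 = weight_gib(1)    # ~61 GiB:  still multiple 24 GB cards
int4 = weight_gib(0.5)  # ~30 GiB:  plausible with offloading or two 24 GB GPUs
print(f"fp16 ~{fp16:.0f} GiB, int8 ~{int8:.0f} GiB, int4 ~{int4:.0f} GiB")
```

This is why the projects below cluster around two strategies: shrinking the weights (int8/int4 quantization) or spilling them to CPU RAM and disk (FlexGen-style offloading).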

  • wrapyfi-examples_llama

    Inference code for facebook LLaMA models with Wrapyfi support

  • transformers

    🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

  • text-generation-webui

    A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

  • See here for full details: https://github.com/oobabooga/text-generation-webui/issues/147

  • llama-cpu

    Fork of Facebook's LLaMA model to run on CPU

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts