If you have a lot of money (but not H100/A100 money), get 4090s, as they're currently the best bang for your buck on the CUDA side (according to George Hotz). If you're broke, get multiple second-hand 3090s: https://timdettmers.com/2023/01/30/which-gpu-for-deep-learni.... If you're unwilling to spend any money at all and just want to play around with Llama 70B, look into petals: https://github.com/bigscience-workshop/petals
The only info I can provide is the table I've seen at https://github.com/jmorganca/ollama, which states you need "32 GB to run the 13B models." I would assume you may need a GPU for this.
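A rough back-of-the-envelope check on those numbers: memory for the weights alone is parameter count times bits per weight. This little sketch (my own arithmetic, not from the ollama table) shows why a 13B model wants ~26 GB at fp16 but only ~6.5 GB of weights at 4-bit quantization; real runs need somewhat more for the KV cache, activations, and quantization scales.

```python
def weight_memory_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Rough memory (GB) just to hold the weights.

    Ignores KV cache, activations, and quantization metadata,
    so actual usage will be somewhat higher.
    """
    bytes_total = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# 13B at fp16: ~26 GB of weights -- consistent with "32 GB to run the 13B models"
print(round(weight_memory_gb(13, 16), 1))  # 26.0
# 13B at 4-bit: ~6.5 GB of weights, before runtime overhead
print(round(weight_memory_gb(13, 4), 1))   # 6.5
```

The gap between the 6.5 GB weight figure and the ~10-11 GB people report in practice is that runtime overhead (KV cache, buffers, per-block quantization scales).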
Related: could someone please point me in the right direction on how to run Wizard Vicuna Uncensored or Llama2 13B locally on Linux? I've been searching for a guide and haven't found what I need as a beginner. In the GitHub repo I referenced, the download is Mac-only at the moment. I have a MacBook Pro M1 I can use, though it's running Debian.
Thank you.
Was it from here: https://github.com/ggerganov/llama.cpp
Do you have a guide that you followed and could link it to me or was it just from prior knowledge?
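For the beginner question above, the usual CPU-only route at the time was llama.cpp. A hedged sketch of the workflow follows; the exact flags and model filename below are assumptions (the CLI has changed over time), and you still have to obtain quantized weights separately, which this does not cover:

```shell
# Clone and build llama.cpp (CPU-only; no GPU required)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

# Run inference against a quantized model file you've downloaded separately.
# The path below is a placeholder, not a file the repo ships with.
./main -m ./models/llama-2-13b.Q4_0.gguf -p "Hello," -n 128
```

This works fine on Debian, including on Apple Silicon machines running Linux, since the build is plain C/C++.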
This code runs Llama2, quantized and unquantized, in a roughly minimal way: https://github.com/srush/llama2.rs (though extracting the quantized 70B weights takes a lot of RAM). I'm running the 13B quantized model in ~10-11 GB of CPU memory.