Llama-CPU: Fork of Facebook's LLaMA model to run on CPU

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • llama

    Inference code for Llama models

  • Reading the patch: https://github.com/facebookresearch/llama/compare/main...mar...

Looks like this is just tweaking some defaults and commenting out some code that enables CUDA. It also switches to something called Gloo, which I'm not familiar with; it seems to be an alternate backend.
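
    As an illustration of what that kind of change involves (a generic sketch, not code taken from the patch): PyTorch's distributed setup for LLaMA normally initialises the NCCL backend, which requires CUDA, while Gloo is a collective-communication backend that also runs on CPU. Switching backends and keeping tensors on the CPU looks roughly like this:

        import os
        import torch
        import torch.distributed as dist

        def setup_cpu_inference(rank: int = 0, world_size: int = 1) -> torch.device:
            # Single-machine rendezvous settings (illustrative defaults).
            os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
            os.environ.setdefault("MASTER_PORT", "29500")

            # Gloo works on CPU; NCCL (the usual choice for multi-GPU runs) requires CUDA.
            dist.init_process_group("gloo", rank=rank, world_size=world_size)

            # Keep weights and activations on the CPU instead of calling .cuda(), and use
            # fp32 since half-precision matmuls are poorly supported on most CPUs.
            torch.set_default_dtype(torch.float32)
            return torch.device("cpu")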

  • llama-cpu

    Fork of Facebook's LLaMA model to run on CPU

  • llama-mps

    Experimental fork of Facebook's LLaMA model which runs it with GPU acceleration on Apple Silicon M1/M2

  • llama

    Inference code for LLaMA models (by gmorenz)

  • I don't know about this fork specifically, but in general yes absolutely.

    Even without enough RAM, you can stream model weights from disk and run at [model size ÷ disk read speed] seconds per token.

    I'm doing that on a small GPU with this code, but it should be easy to get it working with the CPU doing the compute instead (and at least with my disk/CPU, I'm not sure it would even be slower; disk reads would probably still be the bottleneck).

    https://github.com/gmorenz/llama/tree/ssd
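
    To put rough numbers on that estimate: a 7B-parameter model in fp16 is about 13-14 GB, so a disk sustaining ~2 GB/s of reads works out to roughly 6-7 seconds per token. A minimal sketch of the idea, loading one transformer block's weights from disk right before applying it (the per-layer checkpoint files and module list are assumptions for illustration, not gmorenz's actual code):

        import torch

        def forward_streaming(h, blocks, layer_files):
            """Forward pass that streams per-layer weights from disk.

            blocks: transformer-block modules allocated on the CPU; their
                    weights are overwritten on every call.
            layer_files: one saved state_dict per block, e.g. "layer_00.pt", ...
            """
            for block, path in zip(blocks, layer_files):
                # The disk read dominates: every generated token re-reads the whole
                # model, so time per token ≈ model size / disk read speed.
                state = torch.load(path, map_location="cpu")
                block.load_state_dict(state)
                h = block(h)
                del state  # weights can be dropped immediately; only activations stay resident
            return h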

  • KoboldAI-Client

  • I have been using similar LLMs to help draft fictional stories. The community fine-tuned models are geared towards SFW or NSFW story completion.

    See https://github.com/KoboldAI/KoboldAI-Client to read more about the currently popular models.

    https://koboldai.net/ is a way to run some of these models in the "cloud". There's no account required, and the prompts are run on other people's hardware, with priority weighting based on how much compute you have used or donated. There's an anonymous API key, and there's no expectation that the output can't be logged.

    The models that run on local hardware are very basic in the quality of their output. Here's an example of a 6B model's output when used to try to emulate ChatGPT: https://mobile.twitter.com/Knaikk/status/1629711223863345154 (the model was fine-tuned on story completion, so it's not meaningfully comparable).

  • transformers

    🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

  • I tried mark's OMP_NUM_THREADS suggestion (https://news.ycombinator.com/item?id=35018559) and did not see any obvious change that would make it parallel. Given that the Hugging Face patch (https://github.com/huggingface/transformers/pull/21955), once it gets in, should allow streaming from RAM to the GPU, it felt not worth the effort to keep working on the CPU version: even a ~30x speedup from using all the cores would still take around a minute to run the 7B model.
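
    For reference, the thread-count tuning being discussed comes down to environment variables and PyTorch settings; something along these lines is the usual way to try it (the values below are illustrative, not recommendations from the thread):

        import os

        # Must be set before PyTorch creates its thread pools to reliably take effect.
        os.environ["OMP_NUM_THREADS"] = "16"   # e.g. the number of physical cores

        import torch

        torch.set_num_threads(16)         # intra-op parallelism (matmuls, etc.)
        torch.set_num_interop_threads(2)  # parallelism across independent ops; set before first use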
