| | llama-mps | LLaMA_MPS |
|---|---|---|
| Mentions | 4 | 4 |
| Stars | 83 | 566 |
| Latest version | - | - |
| Activity | 3.8 | 10.0 |
| Last commit | 9 months ago | about 1 year ago |
| Language | Python | Python |
| License | GNU General Public License v3.0 only | GPL-3.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
llama-mps
-
llama.cpp now officially supports GPU acceleration.
There are currently at least three ways to run LLaMA on an M1 Mac with GPU acceleration:
- mlc-llm (pre-built, but only one model has been ported)
- tinygrad (very memory-efficient, but not that easy to integrate into other projects)
- llama-mps (the original LLaMA codebase plus LLaMA-Adapter support)
-
LLaMA-7B in Pure C++ with full Apple Silicon support
There is also a GPU-accelerated fork of the original repo:
https://github.com/remixer-dec/llama-mps
- Llama-CPU: Fork of Facebook's LLaMA model to run on CPU
-
[D] Tutorial: Run LLaMA on 8gb vram on windows (thanks to bitsandbytes 8bit quantization)
I tried to port the llama-cpu version to a GPU-accelerated MPS version for Macs. It runs, but the outputs are not as good as expected, and it often emits "-1" tokens. Any help and contributions toward fixing it are welcome!
LLaMA_MPS
-
A brief history of LLaMA models
Most places that recommend llama.cpp for Mac fail to mention https://github.com/jankais3r/LLaMA_MPS, which runs unquantized 7B and 13B models on the M1/M2 GPU directly. It's slightly slower (not by a lot) and uses significantly less energy. To me, not having to quantize is a huge win; I wish more people knew about it.
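The memory tradeoff behind running unquantized models can be sketched with back-of-the-envelope arithmetic (a rough estimate only; `weight_gib` is a hypothetical helper, and the figures ignore the KV cache and activation overhead):

```python
# Rough memory footprint of LLaMA weights at different precisions.
# Back-of-the-envelope only: ignores KV cache, activations, and runtime overhead.
def weight_gib(n_params_billion: float, bits_per_weight: int) -> float:
    """Weight storage in GiB for a model of the given size and precision."""
    return n_params_billion * 1e9 * bits_per_weight / 8 / 2**30

print(f"7B  fp16:  {weight_gib(7, 16):.1f} GiB")   # ~13 GiB
print(f"7B  4-bit: {weight_gib(7, 4):.1f} GiB")    # ~3.3 GiB
print(f"13B fp16:  {weight_gib(13, 16):.1f} GiB")  # ~24 GiB
```

This is why an unquantized 13B model still fits comfortably on a 32 GB M1/M2 machine with unified memory, making quantization optional rather than mandatory.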
-
Databricks Releases 15K Record Training Corpus for Instruction Tuning LLMs
I saw this: https://github.com/jankais3r/LLaMA_MPS
it runs slightly slower on the GPU than under llama.cpp, but uses much less power doing so
I would guess the slowness is due to the immaturity of the PyTorch MPS backend; the asitop graphs show a bunch of CPU activity alongside the GPU, so it might be inefficiently falling back to the CPU for some ops and swapping layers back and forth (I have no idea, just guessing)
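That guess can be probed: PyTorch exposes an availability check for the MPS backend, and an environment flag controls whether unsupported ops fall back to the CPU. A minimal sketch, assuming PyTorch 1.13+ (the `select_device` helper is hypothetical, not part of LLaMA_MPS):

```python
import os

# By default, PyTorch raises an error on ops the MPS backend doesn't implement;
# PYTORCH_ENABLE_MPS_FALLBACK=1 lets those ops run on the CPU instead, which is
# exactly the kind of slow CPU round-tripping guessed at above. The flag must be
# set before torch is imported.
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "0")

def select_device() -> str:
    """Pick the best available torch device, preferring MPS on Apple Silicon.

    Hypothetical helper for illustration; returns a device string for
    torch.device(...).
    """
    try:
        import torch
        if torch.backends.mps.is_available():
            return "mps"
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass  # torch not installed; nothing to accelerate
    return "cpu"

print(select_device())
```

Running with fallback disabled would surface an error on the first unsupported op, confirming or ruling out the fallback theory.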
-
Apple's effort on developing ChatGPT-like functions?
Not ChatGPT, but also nothing to sneeze at: https://github.com/jankais3r/LLaMA_MPS runs the 7B LLM on a 32 GB M1 Pro.
-
llama VS LLaMA_MPS - a user suggested alternative
2 projects | 10 Mar 2023
What are some alternatives?
llama - Inference code for Llama models
m1xxx - Unofficial native Mixxx builds for macOS (Apple Silicon/Intel) and Linux
text-generation-webui - A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.
mlc-llm - Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.
awesome-ml - Curated list of useful LLM / Analytics / Datascience resources
RedPajama-Data - The RedPajama-Data repository contains code for preparing large datasets for training large language models.
vanilla-llama - Plain pytorch implementation of LLaMA
tinygrad - You like pytorch? You like micrograd? You love tinygrad! ❤️
Multi-Modality-Arena - Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
llama-dl - High-speed download of LLaMA, Facebook's 65B parameter GPT model [UnavailableForLegalReasons - Repository access blocked]
llama-dfdx - LLaMa 7b with CUDA acceleration implemented in rust. Minimal GPU memory needed!