llama-mps vs bitsandbytes-win-prebuilt

| | llama-mps | bitsandbytes-win-prebuilt |
|---|---|---|
| Mentions | 4 | 4 |
| Stars | 83 | 76 |
| Growth | - | - |
| Activity | 3.8 | 10.0 |
| Last commit | 9 months ago | over 1 year ago |
| Language | Python | - |
| License | GNU General Public License v3.0 only | - |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed; recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is among the top 10% of the most actively developed projects that we are tracking.
llama-mps
llama.cpp now officially supports GPU acceleration

There are currently at least three ways to run LLaMA on an M1 Mac with GPU acceleration:
- mlc-llm (pre-built, but only one model has been ported so far)
- tinygrad (very memory-efficient, but not easy to integrate into other projects)
- llama-mps (the original LLaMA codebase plus LLaMA-Adapter support)
LLaMA-7B in Pure C++ with full Apple Silicon support

There is also a GPU-accelerated fork of the original repo:
https://github.com/remixer-dec/llama-mps
Llama-CPU: Fork of Facebook's LLaMA model to run on CPU
[D] Tutorial: Run LLaMA on 8gb vram on windows (thanks to bitsandbytes 8bit quantization)

I tried to port the llama-cpu version to a GPU-accelerated MPS version for Macs. It runs, but the outputs are not as good as expected and it often emits "-1" tokens. Any help and contributions on fixing it are welcome!
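As a starting point for debugging that symptom, here is a minimal, hypothetical sketch (not taken from the llama-mps codebase) showing standard PyTorch MPS device selection and a sampler that guards against non-finite logits. torch.multinomial has been reported to return invalid indices when fed NaN probabilities, so non-finite fp16 logits are one plausible source of the "-1" tokens.

```python
import torch

# Prefer Apple's Metal (MPS) backend when available, else fall back to CPU.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

def safe_sample(logits: torch.Tensor, temperature: float = 0.8) -> int:
    """Sample a token id while guarding against non-finite logits.

    Hypothetical guard: fp16 kernels on some MPS builds can emit NaN/inf
    values, and torch.multinomial over a NaN probability vector can return
    invalid indices. Sampling is done on CPU for portability.
    """
    logits = torch.nan_to_num(logits.float().cpu(), nan=-1e9, posinf=1e9, neginf=-1e9)
    probs = torch.softmax(logits / temperature, dim=-1)
    return int(torch.multinomial(probs, num_samples=1).item())

# Toy usage with LLaMA's 32000-token vocabulary.
logits = torch.randn(32000, device=device)
print(safe_sample(logits))
```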
bitsandbytes-win-prebuilt
bitsandbytes now for Windows (8-bit CUDA functions for PyTorch)

There used to be a compiled version from https://github.com/DeXtmL/bitsandbytes-win-prebuilt, but now I see there is a new version (from last week) at https://github.com/acpopescu/bitsandbytes/releases, which looks like it may become the start of Windows support in the official repo.
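For context, here is a minimal sketch of what the 8-bit kernels enable, assuming the Hugging Face transformers integration with bitsandbytes installed; the model id and prompt are placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model id; any decoder-only checkpoint works the same way.
model_id = "huggyllama/llama-7b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# load_in_8bit routes the linear layers through bitsandbytes' int8 kernels,
# roughly halving memory versus fp16 (which is how 7B fits in 8 GB of VRAM).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_8bit=True,
    device_map="auto",
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```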
[D] Tutorial: Run LLaMA on 8gb vram on windows (thanks to bitsandbytes 8bit quantization)

Put libbitsandbytes_cuda116.dll in C:\Users\xxx\miniconda3\envs\textgen\lib\site-packages\bitsandbytes\
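A quick way to confirm the DLL landed in the right package directory, whatever your environment path is, is a small check like this sketch:

```python
import importlib.util
import os

# Locate the installed bitsandbytes package without importing it
# (the import itself can fail if the DLL is missing).
spec = importlib.util.find_spec("bitsandbytes")
pkg_dir = os.path.dirname(spec.origin)
dll = os.path.join(pkg_dir, "libbitsandbytes_cuda116.dll")
print("package dir:", pkg_dir)
print("DLL present:", os.path.exists(dll))
```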
Running Pygmalion 6b with 8GB of VRAM

Download these 2 DLL files from here, then move them into "installer_files\env\lib\site-packages\bitsandbytes\" under your oobabooga root folder (where you extracted the one-click installer).
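The copy step can also be scripted; this is a hedged sketch where every path is a placeholder for your own install, and the two filenames assume the CPU and CUDA 11.6 builds shipped by the prebuilt repo.

```python
import shutil
from pathlib import Path

# Placeholder paths: adjust the oobabooga root and download location
# to match your own setup.
oobabooga_root = Path(r"C:\oobabooga")
dest = oobabooga_root / "installer_files" / "env" / "lib" / "site-packages" / "bitsandbytes"

# Assumed filenames for the two prebuilt DLLs.
for name in ("libbitsandbytes_cpu.dll", "libbitsandbytes_cuda116.dll"):
    shutil.copy2(Path.home() / "Downloads" / name, dest / name)
    print("copied", name, "->", dest)
```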
Has anyone gotten the models to load via 8-bit for windows?!?!?
What are some alternatives?
- llama - Inference code for Llama models
- text-generation-webui - A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.
- llama-cpu - Fork of Facebook's LLaMA model to run on CPU
- awesome-ml - Curated list of useful LLM / Analytics / Datascience resources
- bitsandbytes - 8-bit CUDA functions for PyTorch
- LLaMA_MPS - Run LLaMA inference on Apple Silicon GPUs.
- one-click-installers - Simplified installers for oobabooga/text-generation-webui.
- tinygrad - You like pytorch? You like micrograd? You love tinygrad! ❤️
- llama-dl - High-speed download of LLaMA, Facebook's 65B parameter GPT model [UnavailableForLegalReasons - Repository access blocked]