wrapyfi-examples_llama
Inference code for facebook LLaMA models with Wrapyfi support (by modular-ml)
llama-cpu
Fork of Facebooks LLaMa model to run on CPU (by markasoftware)
Metric | wrapyfi-examples_llama | llama-cpu
---|---|---
Mentions | 2 | 9
Stars | 128 | 775
Growth | 0.0% | -
Activity | 4.0 | 3.1
Last commit | about 1 year ago | about 1 year ago
Language | Python | Python
License | GNU General Public License v3.0 only | GNU General Public License v3.0 only
The number of mentions is the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
wrapyfi-examples_llama
Posts with mentions or reviews of wrapyfi-examples_llama.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2023-03-04.
- [D] Is it possible to run Meta's LLaMA 65B model on consumer-grade hardware?
- Wrapyfi for distributing LLaMA by Meta on different machines
The authors present an example of combining Wrapyfi (https://github.com/fabawi/wrapyfi), a Python wrapper for message-oriented and robotics middleware, with LLaMA (https://github.com/facebookresearch/llama), a series of large language models from Meta AI. They demonstrate how Wrapyfi can enable running LLaMA on multiple mid-range machines with high inference speed and low cost. They also provide links to their GitHub repository (https://github.com/modular-ml/wrapyfi-examples_llama) and paper (https://arxiv.org/abs/2302.09648) for more details. They state that this example can revolutionize natural language processing tasks such as text generation, summarization, question answering, and sentiment analysis by letting users run the models on their existing infrastructure instead of buying new hardware.
llama-cpu
Posts with mentions or reviews of llama-cpu.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2023-03-08.
- Why is ChatGPT 3.5 API 10x cheaper than GPT3?
You've probably heard, but LLaMA was just released, and its 13B parameter model outperforms GPT-3 on most metrics (because they trained it on a lot more data). Someone's already quantized it to 4 and 3 bits and it performs virtually the same. It also apparently performs well on CPUs (several words per second on a 7900X). Running something equivalent to GPT3.5 on a phone is not that far off.
- Fork of Facebook’s LLaMa model to run on CPU
- Llama-CPU: Fork of Facebooks LLaMa model to run on CPU
- [D] Tutorial: Run LLaMA on 8gb vram on windows (thanks to bitsandbytes 8bit quantization)
I tried to port the llama-cpu version to a GPU-accelerated MPS version for Macs. It runs, but the outputs are not as good as expected and it often gives "-1" tokens. Any help and contributions on fixing it are welcome!
- Facebook LLAMA is being openly distributed via torrents | Hacker News
You can run it with only a CPU and 32 gigs of RAM: https://github.com/markasoftware/llama-cpu
- [D] Is it possible to run Meta's LLaMA 65B model on consumer-grade hardware?
- Facebook LLAMA is being openly distributed via torrents
I was able to run 7B on a CPU, inferring several words per second: https://github.com/markasoftware/llama-cpu
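The memory figures quoted in the posts above (32 GB of RAM for CPU inference, 8 GB of VRAM with 8-bit quantization, and 4- and 3-bit quantized variants) can be sanity-checked with a rough weights-only estimate. The sketch below is an illustration, not something taken from either project: the parameter counts are nominal values inferred from the model names (7B, 13B, 65B), the precision each project actually uses is an assumption, and the estimate ignores activations, caches, and framework overhead.

```python
# Rough weights-only memory estimate for LLaMA-sized models.
# Assumptions (not from either project): nominal parameter counts taken
# from the model names, and no overhead for activations or caches.

NOMINAL_PARAMS = {"7B": 7e9, "13B": 13e9, "65B": 65e9}
BITS_PER_WEIGHT = {"fp32": 32, "fp16": 16, "int8": 8, "4-bit": 4, "3-bit": 3}


def weight_memory_gib(num_params: float, bits_per_weight: int) -> float:
    """Return the memory needed to store the weights alone, in GiB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


def footprint_table() -> dict:
    """Build {model: {precision: GiB}} for every model/precision pair."""
    return {
        model: {
            precision: round(weight_memory_gib(n_params, bits), 1)
            for precision, bits in BITS_PER_WEIGHT.items()
        }
        for model, n_params in NOMINAL_PARAMS.items()
    }


if __name__ == "__main__":
    for model, row in footprint_table().items():
        cells = ", ".join(f"{prec}: {gib} GiB" for prec, gib in row.items())
        print(f"{model}: {cells}")
```

Under these assumptions, a nominal 7B model stored at 32 bits per weight needs about 26 GiB for its weights, and about 6.5 GiB at 8 bits. Both are in line with the 32 GB RAM and 8 GB VRAM figures mentioned in the posts, although the posts do not say which precision or overhead those figures include.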
What are some alternatives?
When comparing wrapyfi-examples_llama and llama-cpu you can also consider the following projects:
llama - Inference code for Llama models
text-generation-webui - A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.
transformers - 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
text-g
GPTQ-for-LLaMa - 4 bits quantization of LLaMA using GPTQ
llama-int8 - Quantized inference code for LLaMA models
bitsandbytes-win-prebuilt
FlexGen - Running large language models on a single GPU for throughput-oriented scenarios.
KoboldAI-Client
DeepSpeed - DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.