llama vs tinygrad

| | llama | tinygrad |
|---|---|---|
| Mentions | 1 | 58 |
| Stars | 189 | 17,800 |
| Growth | - | - |
| Activity | 2.8 | 9.7 |
| Last Commit | about 1 year ago | 10 months ago |
| Language | Python | Python |
| License | GNU General Public License v3.0 only | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed; recent commits have higher weight than older ones. For example, an activity of 9.0 indicates that a project is among the top 10% of the most actively developed projects we track.
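The exact formula isn't published here; as a rough illustration, a recency-weighted activity score could be computed like this (a minimal sketch assuming exponential decay with a hypothetical 30-day half-life, not the site's actual computation):

```python
from datetime import datetime, timezone

def activity_score(commit_dates, half_life_days=30.0):
    """Recency-weighted commit count: each commit contributes
    2 ** (-age / half_life), so recent commits outweigh old ones.
    Hypothetical formula; the site's actual weighting is not published."""
    now = datetime.now(timezone.utc)
    score = 0.0
    for date in commit_dates:
        age_days = (now - date).total_seconds() / 86400.0
        score += 2.0 ** (-age_days / half_life_days)
    return score
```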
llama
- LLaMA-7B in Pure C++ with full Apple Silicon support
Repetition penalty is a matter of: once a token has been generated, multiply its logit by the penalty on subsequent steps. (If the logit is negative, divide instead of multiply.) See the sketch below.
https://github.com/shawwn/llama has an implementation (check the commit history).
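In code, that multiply-or-divide rule amounts to something like this (a minimal NumPy sketch; the function name and the penalty < 1 convention are assumptions, not necessarily shawwn/llama's exact implementation):

```python
import numpy as np

def apply_repetition_penalty(logits: np.ndarray, generated_ids, penalty=0.85):
    """Scale down the logits of tokens that have already been generated.
    With penalty < 1: multiply a positive logit (shrinks it toward 0),
    divide a negative logit (pushes it further below 0) -- either way the
    token becomes less likely. Hypothetical helper, not shawwn/llama's code."""
    logits = logits.copy()
    for tok in set(generated_ids):
        if logits[tok] > 0:
            logits[tok] *= penalty
        else:
            logits[tok] /= penalty
    return logits
```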
tinygrad
- tinygrad: extreme simplicity, easiest framework to add new accelerators to
- GGML – AI at the Edge
Might be a silly question, but is GGML a similar/competing library to George Hotz's tinygrad [0]?
[0] https://github.com/geohot/tinygrad
- Render neural network into CUDA/HIP code
At first glance I thought it might be like tinygrad, but it looks like it has many more ops than tinygrad, though most map to underlying hardware-provided ops?
I wonder how well tinygrad's approach will work out. Op fusion sounds easy: just walk the graph, pattern-match it, and lower to hardware-provided ops? (A rough sketch of that idea follows below.)
Anyway, if anyone wants to understand the philosophy behind tinygrad, this file is a great start: https://github.com/geohot/tinygrad/blob/master/docs/abstract...
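As an illustration of that walk-the-graph-and-pattern-match idea (all names here are hypothetical, not tinygrad's actual IR or API):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    op: str                       # e.g. "load", "mul", "add", "fma"
    srcs: list = field(default_factory=list)

def fuse(node: Node) -> Node:
    """Walk the graph bottom-up and rewrite add(mul(a, b), c) into a single
    fused multiply-add -- the kind of op hardware often provides directly."""
    node.srcs = [fuse(s) for s in node.srcs]
    if node.op == "add" and len(node.srcs) == 2 and node.srcs[0].op == "mul":
        mul, addend = node.srcs
        return Node("fma", [mul.srcs[0], mul.srcs[1], addend])
    return node

# add(mul(x, y), z) lowers to fma(x, y, z)
x, y, z = Node("load"), Node("load"), Node("load")
assert fuse(Node("add", [Node("mul", [x, y]), z])).op == "fma"
```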
- llama.cpp now officially supports GPU acceleration.
There are currently at least 3 ways to run llama on M1 with GPU acceleration:
- mlc-llm (pre-built, only 1 model has been ported)
- tinygrad (very memory efficient, not that easy to integrate into other projects)
- llama-mps (original llama codebase + llama adapter support)
- George Hotz building an AMD competitor to Nvidia.
- George Hotz ROCm adventures
Hopefully we will now see full support for AMD hardware on https://github.com/geohot/tinygrad. You can read more about it at https://tinygrad.org/
- The Coming of Local LLMs
tinygrad
https://github.com/geohot/tinygrad/tree/master/accel/ane
But I have not tested it on Linux since Asahi has not yet added support.
llama.cpp runs at 18ms per token (7B) and 200ms per token (65B) without quantization.
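Those per-token latencies convert to throughput with simple arithmetic:

```python
# Convert the quoted per-token latencies to tokens per second.
for model, ms_per_token in [("7B", 18), ("65B", 200)]:
    print(f"{model}: {1000 / ms_per_token:.1f} tokens/s")  # 55.6 and 5.0
```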
- Everything we know about Apple's Neural Engine
- Everything we know about the Apple Neural Engine (ANE)
- How 'Open' Is OpenAI, Really?
What are some alternatives?
llama-dl - High-speed download of LLaMA, Facebook's 65B parameter GPT model [UnavailableForLegalReasons - Repository access blocked]
Pytorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration
amx - Apple AMX Instruction Set
llama.cpp - LLM inference in C/C++
whisper.cpp - Port of OpenAI's Whisper model in C/C++
openpilot - openpilot is an open source driver assistance system. openpilot performs the functions of Automated Lane Centering and Adaptive Cruise Control for 250+ supported car makes and models.
text-generation-webui - A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.
llama - Inference code for Llama models
llama-mps - Experimental fork of Facebook's LLaMA model that runs with GPU acceleration on Apple Silicon M1/M2
tensorflow_macos - TensorFlow for macOS 11.0+ accelerated using Apple's ML Compute framework.
GPTQ-for-LLaMa - 4 bits quantization of LLaMA using GPTQ