| | llama2.rs | euclid |
|---|---|---|
| Mentions | 3 | 1 |
| Stars | 981 | 456 |
| Growth | - | 1.3% |
| Activity | 8.9 | 6.0 |
| Last commit | 6 months ago | 20 days ago |
| Language | Rust | Rust |
| License | MIT License | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
llama2.rs
- Ask HN: Cheapest hardware to run Llama 2 70B
  This code runs Llama2 quantized and unquantized in a roughly minimal way: https://github.com/srush/llama2.rs (though extracting the quantized 70B weights takes a lot of RAM). I'm running the 13B quantized model in ~10-11 GB of CPU memory.
- Candle: Torch Replacement in Rust
  Nowhere near as neat as Candle or ggml, but I just released a 4-bit Rust llama2 implementation with SIMD. It runs pretty fast.
  https://github.com/srush/llama2.rs/
- Llama2.rs: One-file Rust implementation of Llama2
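The memory figure quoted above (a 13B quantized model in ~10-11 GB of CPU memory) can be sanity-checked with rough arithmetic. This sketch assumes 4-bit weights with one fp16 scale per 128-weight group; these are illustrative numbers, not llama2.rs's exact storage format.

```rust
// Back-of-envelope memory estimate for a 4-bit-quantized 13B model.
// Assumptions (hypothetical, not taken from llama2.rs): 4 bits per weight,
// one fp16 scale per group of 128 weights.
fn main() {
    let params: f64 = 13e9;
    let weight_bytes = params * 4.0 / 8.0;       // 4 bits = half a byte per weight
    let group_size = 128.0;                      // assumed quantization group size
    let scale_bytes = params / group_size * 2.0; // one 2-byte fp16 scale per group
    let total_gb = (weight_bytes + scale_bytes) / 1e9;
    println!("~{:.1} GB for weights alone", total_gb);
    // The gap up to the quoted 10-11 GB would be the KV cache, activations,
    // and any tensors kept in full precision.
}
```

Weights alone come out to roughly 6.7 GB, which makes the quoted total plausible once runtime buffers are added.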
euclid
- Candle: Torch Replacement in Rust
  I don't do anything related to data science, but I feel like doing it in Rust would be nice. You get operator overloading, so you can have ergonomic matrix operations that are also typed. Processing data on the CPU is fast, and crates like https://github.com/EmbarkStudios/rust-gpu make it very ergonomic to leverage the GPU. I like this library for creating typed coordinate spaces for graphics programming (https://github.com/servo/euclid); I imagine something similar could be done to create refined types for matrices, so you can't multiply matrices of incompatible sizes.
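The idea in that comment, encoding matrix dimensions in the type the way euclid encodes coordinate spaces, can be sketched with const generics. This is a minimal illustration (the `Matrix` type and its `mul` method are made up for this example, not part of euclid): multiplying incompatibly-sized matrices simply fails to compile.

```rust
// Minimal sketch of dimension-checked matrices via const generics.
// An R x C matrix can only be multiplied by a C x K matrix; anything
// else is rejected by the type checker, not at runtime.
struct Matrix<const R: usize, const C: usize> {
    data: [[f64; C]; R],
}

impl<const R: usize, const C: usize> Matrix<R, C> {
    fn new(data: [[f64; C]; R]) -> Self {
        Self { data }
    }

    // (R x C) * (C x K) -> (R x K); the shared dimension C is enforced
    // because it appears in both argument types.
    fn mul<const K: usize>(&self, rhs: &Matrix<C, K>) -> Matrix<R, K> {
        let mut out = [[0.0; K]; R];
        for i in 0..R {
            for j in 0..K {
                for c in 0..C {
                    out[i][j] += self.data[i][c] * rhs.data[c][j];
                }
            }
        }
        Matrix::new(out)
    }
}

fn main() {
    let a: Matrix<2, 3> = Matrix::new([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]);
    let b: Matrix<3, 1> = Matrix::new([[1.0], [0.0], [1.0]]);
    let c = a.mul(&b); // fine: 2x3 times 3x1 gives 2x1
    assert_eq!(c.data, [[4.0], [10.0]]);
    // let bad = b.mul(&a); // 3x1 times 2x3: does not compile
}
```

This mirrors euclid's phantom-unit trick: the extra type parameters cost nothing at runtime but turn shape mismatches into compile errors.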
What are some alternatives?
burn - Burn is a new comprehensive dynamic Deep Learning Framework built using Rust with extreme flexibility, compute efficiency and portability as its primary goals. [Moved to: https://github.com/Tracel-AI/burn]
candle - Minimalist ML framework for Rust
rust - Empowering everyone to build reliable and efficient software.
exllama - A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
llama.cpp - LLM inference in C/C++
petals - 🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
syntaxdot - Neural syntax annotator, supporting sequence labeling, lemmatization, and dependency parsing.
tch-rs - Rust bindings for the C++ api of PyTorch.
dfdx - Deep learning in Rust, with shape checked tensors and neural networks