Llama-cpp-turboquant Alternatives
Similar projects and alternatives to llama-cpp-turboquant
NOTE:
The number of mentions on this list indicates mentions on common posts plus user suggested alternatives.
Hence, a higher number means a better llama-cpp-turboquant alternative or higher similarity.
llama-cpp-turboquant reviews and mentions
Posts with mentions or reviews of llama-cpp-turboquant.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2026-04-18.
- Building a Systemic Autonomy Agent: OpenClaw + Gemma 4 & TurboQuant on Raspberry Pi 4B
# Install dependencies
sudo apt install -y git cmake build-essential libopenblas-dev libssl-dev

# Clone the TurboQuant-specific branch
git clone https://github.com/TheTom/llama-cpp-turboquant
cd llama-cpp-turboquant
git checkout feature/turboquant-kv-cache

# Configure for Pi 4 (Cortex-A72 / NEON acceleration)
cmake -B build \
  -DCMAKE_BUILD_TYPE=Release \
  -DGGML_CPU=ON \
  -DGGML_NATIVE=ON \
  -DGGML_OPENBLAS=ON \
  -DGGML_NEON=ON \
  -DLLAMA_OPENSSL=ON

# Compile (This takes ~15 mins)
cmake --build build --config Release -j4
- TurboQuant model weight compression support added to Llamacpp
- Show HN: TurboQuant for vector search – 2-4 bit compression
I built TurboQuant+ (https://github.com/TheTom/llama-cpp-turboquant), the llama.cpp implementation of this paper with extensions: asymmetric K/V compression, boundary layer protection, sparse V dequant, and, as of this week, weight compression (TQ4_1S) that shrinks models 28-42% on disk with minimal quality loss. 5k+ stars, 50+ community testers across Metal, CUDA, and AMD HIP.
Cool to see the same WHT + Lloyd-Max math applied to vector search. The data-oblivious codebook property is exactly what makes it work for online KV cache compression too. No calibration, no training, just quantize and go.
If anyone is running local LLMs and wants to try it: https://github.com/TheTom/turboquant_plus/blob/main/docs/get...
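The "WHT + Lloyd-Max" idea mentioned in the post above can be illustrated with a small sketch. This is not the project's code: the function names (`fwht`, `make_signs`, `quantize`, `dequantize`) are hypothetical, the 2-bit width is chosen only for brevity, and the per-vector norm scaling is an assumption made for this example. The codebook values are the standard Lloyd-Max levels for a unit-variance Gaussian, which is why no calibration data is needed: after a random sign flip and a Walsh-Hadamard rotation, the coordinates of most vectors look roughly Gaussian, so one fixed codebook can be reused for all of them.

```python
import numpy as np

# Lloyd-Max optimal 2-bit (4-level) scalar quantizer for a standard normal
# variable. These constants are fixed in advance (data-oblivious).
LEVELS = np.array([-1.5104, -0.4528, 0.4528, 1.5104])
THRESHOLDS = np.array([-0.9816, 0.0, 0.9816])


def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform (its own inverse).

    The length of x must be a power of two.
    """
    x = np.asarray(x, dtype=np.float64).copy()
    n = x.shape[0]
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h] = a + b
            x[i + h:i + 2 * h] = a - b
        h *= 2
    return x / np.sqrt(n)


def make_signs(n, seed=0):
    """Random +/-1 diagonal, shared by the encoder and the decoder."""
    rng = np.random.default_rng(seed)
    return rng.choice(np.array([-1.0, 1.0]), size=n)


def quantize(v, signs):
    """Return (norm, codes): one float plus one 2-bit code per coordinate."""
    v = np.asarray(v, dtype=np.float64)
    n = v.shape[0]
    norm = float(np.linalg.norm(v))
    if norm == 0.0:
        return 0.0, np.zeros(n, dtype=np.uint8)
    rotated = fwht(v * signs)              # rotation preserves the norm
    scaled = rotated * np.sqrt(n) / norm   # roughly unit variance per coordinate
    codes = np.searchsorted(THRESHOLDS, scaled).astype(np.uint8)
    return norm, codes


def dequantize(norm, codes, signs):
    """Invert quantize(): look up levels, undo the scaling and the rotation."""
    n = codes.shape[0]
    rotated = LEVELS[codes] * norm / np.sqrt(n)
    return fwht(rotated) * signs


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    v = rng.standard_normal(256)
    signs = make_signs(256)
    norm, codes = quantize(v, signs)
    v_hat = dequantize(norm, codes, signs)
    print("relative error:", np.linalg.norm(v - v_hat) / np.linalg.norm(v))
```

The encoder never looks at a dataset: the only per-vector state is the norm and the codes, which is what makes this style of quantizer usable online, for example on KV cache entries as they are produced.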
- Qwen3 512k context via TurboQuant on Mac mini
Stats
Basic llama-cpp-turboquant repo stats
4
1,755
10.0
about 16 hours ago
TheTom/llama-cpp-turboquant is an open source project licensed under MIT License which is an OSI approved license.
The primary programming language of llama-cpp-turboquant is C++.