Top 13 C++ ggml Projects
-
> The benchmark prompt was:
> Write a compact Python function that parses a unified diff and returns the changed file paths. Then explain two edge cases.
> Each benchmark generated about 128 tokens.
Generating 128 tokens is probably not enough for reliable benchmark results. MTP speedup depends on how often the predicted tokens are accepted. In my experience, the very early output has a higher acceptance rate, so short tests can overstate the speedup.
Also, llama.cpp includes a tool specifically for benchmarking:
https://github.com/ggml-org/llama.cpp/blob/master/tools/llam...
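The point about short runs can be illustrated with a toy model. The sketch below is not taken from llama.cpp or from the benchmark being discussed: the acceptance curve (0.9 early, decaying to 0.6 after 200 tokens) and the draft length of 3 are made-up numbers chosen only to show the effect, and the function names are hypothetical. It uses the standard expectation for speculative decoding, where a step with `draft_len` drafted tokens and per-token acceptance probability `p` yields `1 + p + ... + p^draft_len` tokens on average, and it ignores the cost of producing the drafts.

```python
def acceptance_rate(position, early=0.9, late=0.6, decay_tokens=200):
    """Hypothetical acceptance probability at a given output position.

    Assumption for illustration only: linear decay from `early` to `late`
    over the first `decay_tokens` tokens, then flat.
    """
    frac = min(position / decay_tokens, 1.0)
    return early + (late - early) * frac


def expected_tokens_per_step(p, draft_len):
    """Expected tokens emitted per verification step.

    Each drafted token is accepted independently with probability p, the
    first rejection ends the run, and the main model always contributes
    one token: 1 + p + p^2 + ... + p^draft_len.
    """
    return sum(p ** k for k in range(draft_len + 1))


def simulated_tokens_per_step(n_tokens, draft_len=3):
    """Average tokens per verification step over a run of about n_tokens.

    Used here as a rough proxy for speedup; draft overhead is ignored.
    """
    produced = 0.0
    steps = 0
    while produced < n_tokens:
        p = acceptance_rate(produced)
        produced += expected_tokens_per_step(p, draft_len)
        steps += 1
    return produced / steps


if __name__ == "__main__":
    for n in (128, 512, 2048):
        print(n, round(simulated_tokens_per_step(n), 2))
```

Under these assumptions the 128-token run reports a noticeably higher tokens-per-step figure than the longer runs, because most of its tokens fall in the high-acceptance early region. Real acceptance curves depend on the model, the drafter, and the prompt, so the only reliable check is to measure longer generations.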
-
https://github.com/LostRuins/koboldcpp Download models at HuggingFace and run them locally. No logins, no spying, no hidden data harvesting.
-
Project mention: Show HN: Gosd: High-performance Stable Diffusion inference in pure Go(no CGO) | news.ycombinator.com | 2026-05-07
https://github.com/leejet/stable-diffusion.cpp for the full list of compatible models.
On my local setup with a Radeon 7900 XTX, a full HD image can be generated in about 10-30 seconds.
-
beellama.cpp
DFlash & TurboQuant in llama.cpp with up to 3x faster generation and 7.5x more KV cache in same VRAM
Project mention: KVarN: Native vLLM KV-cache quantization back end by Huawei | news.ycombinator.com | 2026-06-04
-
booster
Booster - open accelerator for LLM models. Better inference and debugging for AI hackers (by gotzmann)
-
LangCommand
LangCommand is a local inference command-line tool that transforms natural language descriptions into shell commands.
-
CrispASR
C++ ggml runtime hub for multilingual ASR models: Cohere Transcribe, Parakeet TDT, Voxtral, Canary 1B v2, etc, plus universal forced alignment via NeMo Forced Aligner-style CTC, and others. Fork of whisper.cpp.
Project mention: Microsoft VibeVoice: Open-Source Frontier Voice AI | news.ycombinator.com | 2026-04-28
I've been using Nemotron ASR with my own ported inference, and I'm happy with it:
https://huggingface.co/nvidia/nemotron-speech-streaming-en-0...
https://github.com/m1el/nemotron-asr.cpp
C++ ggml related posts
-
8GB to 70B: A Real Hardware Guide for Local LLMs
-
Doubling Qwen3.6-27B on One RTX 3090: ollama → llama.cpp + MTP, Lever by Lever (35.7 → 80.2 tok/s)
-
New `llama.cpp` Updates, AI Agents for Any LLM, and Quantized Vector Index for Local Inference
-
Run Gemma-4 12B on WSL2 with llama.cpp
-
Gemma 4 QAT models: Optimizing compression for mobile and laptop efficiency
-
How fast is LlamaStash? Overhead, throughput, and a fair comparison with Ollama and LM Studio
-
A 10 year old Xeon is all you need (for 26B-A4B MTP Drafters without GPU)
-
A note from our sponsor - SaaSHub
www.saashub.com | 13 Jun 2026
Index
What are some of the best open-source ggml projects in C++? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | llama.cpp | 115,929 |
| 2 | koboldcpp | 10,754 |
| 3 | stable-diffusion.cpp | 6,245 |
| 4 | rwkv.cpp | 1,562 |
| 5 | bark.cpp | 859 |
| 6 | minigpt4.cpp | 570 |
| 7 | clip.cpp | 560 |
| 8 | beellama.cpp | 363 |
| 9 | vit.cpp | 313 |
| 10 | booster | 168 |
| 11 | LangCommand | 119 |
| 12 | CrispASR | 67 |
| 13 | nemotron-asr.cpp | 18 |