C++ ggml

Open-source C++ projects categorized as ggml

Top 13 C++ ggml Projects

  1. llama.cpp

    LLM inference in C/C++

    Project mention: How to Setup a Local Coding Agent on macOS | news.ycombinator.com | 2026-06-12

    > The benchmark prompt was:

    > Write a compact Python function that parses a unified diff and returns the changed file paths. Then explain two edge cases.

    > Each benchmark generated about 128 tokens.

    Generating 128 tokens is probably not enough for reliable benchmark results. MTP speedup depends on how often the predicted tokens are accepted. In my experience, acceptance is higher early in the output, so short tests can report inflated speedups.

    Also, llama.cpp includes a dedicated benchmarking tool:

    https://github.com/ggml-org/llama.cpp/blob/master/tools/llam...

  3. koboldcpp

    Run GGUF models easily with a KoboldAI UI. One File. Zero Install.

    Project mention: Best Free AI Chatbots Without Login (over TOR and Anonymous) | dev.to | 2025-10-07

    https://github.com/LostRuins/koboldcpp

    Download models from Hugging Face and run them locally. No logins, no spying, no hidden data harvesting.

  4. stable-diffusion.cpp

    Diffusion model (SD, Flux, Wan, Qwen Image, Z-Image, ...) inference in pure C/C++

    Project mention: Show HN: Gosd: High-performance Stable Diffusion inference in pure Go (no CGO) | news.ycombinator.com | 2026-05-07

    See https://github.com/leejet/stable-diffusion.cpp for the full list of compatible models.

    On my local setup with a Radeon RX 7900 XTX, a full HD image can be generated in about 10 to 30 seconds.

  5. rwkv.cpp

    INT4/INT5/INT8 and FP16 inference on CPU for the RWKV language model

  6. bark.cpp

    Suno AI's Bark model in C/C++ for fast text-to-speech generation

  7. minigpt4.cpp

    Port of MiniGPT4 in C++ (4-bit, 5-bit, 6-bit, 8-bit, and 16-bit CPU inference with GGML)

  8. clip.cpp

    CLIP inference in plain C/C++ with no extra dependencies

  9. beellama.cpp

    DFlash & TurboQuant in llama.cpp with up to 3x faster generation and 7.5x more KV cache in the same VRAM

    Project mention: KVarN: Native vLLM KV-cache quantization back end by Huawei | news.ycombinator.com | 2026-06-04

  10. vit.cpp

    Vision Transformer (ViT) inference in plain C/C++ with ggml

  11. booster

    Booster: open accelerator for LLMs. Better inference and debugging for AI hackers (by gotzmann)

  12. LangCommand

    LangCommand is a local inference command-line tool that transforms natural language descriptions into shell commands.

  13. CrispASR

    C++ ggml runtime hub for multilingual ASR models (Cohere Transcribe, Parakeet TDT, Voxtral, Canary 1B v2, etc.), plus universal forced alignment via NeMo Forced Aligner-style CTC, and others. Fork of whisper.cpp.

    Project mention: Microsoft VibeVoice: Open-Source Frontier Voice AI | news.ycombinator.com | 2026-04-28

  14. nemotron-asr.cpp

    Nemotron ASR ported to GGML

    Project mention: Voxtral Transcribe 2 | news.ycombinator.com | 2026-02-04

    I've been using Nemotron ASR with my own ported inference, and I'm happy with it:

    https://huggingface.co/nvidia/nemotron-speech-streaming-en-0...

    https://github.com/m1el/nemotron-asr.cpp

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
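The llama.cpp comment above argues that MTP speedup hinges on how often drafted tokens are accepted, and that a 128-token run can overstate it. A minimal sketch of that reasoning, assuming the common simplification that each of k drafted tokens is accepted independently with probability alpha: one verification pass of the target model then yields (1 - alpha^(k+1)) / (1 - alpha) tokens on average. The function names are hypothetical (not part of llama.cpp), and the model ignores the cost of producing the drafts.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Expected tokens produced per target-model verification pass when k draft
// tokens are proposed and each is accepted independently with probability alpha.
// Counts the accepted prefix plus the one token the target model always emits.
// Independence is a simplifying assumption: real acceptance varies over the output.
double expected_tokens_per_pass(double alpha, int k) {
    assert(alpha >= 0.0 && alpha <= 1.0 && k >= 0);
    if (alpha == 1.0) return k + 1.0;  // every draft accepted
    return (1.0 - std::pow(alpha, k + 1)) / (1.0 - alpha);
}

// Acceptance rate over the first n draft decisions in a log (true = accepted).
// Comparing a short prefix with the whole log shows how a short run can mislead.
double acceptance_rate(const std::vector<bool>& accepted, std::size_t n) {
    if (n > accepted.size()) n = accepted.size();
    if (n == 0) return 0.0;
    std::size_t hits = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (accepted[i]) ++hits;
    }
    return static_cast<double>(hits) / static_cast<double>(n);
}
```

For example, alpha = 0.5 with k = 3 gives 1.875 tokens per pass. If a hypothetical log accepts the first four drafts but only 5 of 8 overall, measuring just the start suggests alpha = 1.0 while the longer window gives 0.625.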
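rwkv.cpp advertises INT4/INT5/INT8 inference. The sketch below illustrates the general idea behind block-wise 4-bit weight quantization in ggml-style formats: each block of 32 weights shares one scale, and every weight is stored as a small integer code. It is a simplified symmetric variant written for illustration (codes are left unpacked and the scale is a float); it is not ggml's exact Q4_0 rounding or memory layout, and all names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kBlock = 32;  // weights per block (assumed block size)

struct Block4 {
    float scale;                          // one scale per block
    std::array<std::uint8_t, kBlock> q;   // 4-bit codes 0..15, unpacked for readability
};

// Symmetric 4-bit block quantization: x is approximated by scale * (q - 8).
std::vector<Block4> quantize_q4(const std::vector<float>& x) {
    assert(x.size() % kBlock == 0);
    std::vector<Block4> out(x.size() / kBlock);
    for (std::size_t b = 0; b < out.size(); ++b) {
        const float* v = &x[b * kBlock];
        float amax = 0.0f;  // largest magnitude in the block sets the scale
        for (std::size_t i = 0; i < kBlock; ++i) amax = std::fmax(amax, std::fabs(v[i]));
        const float scale = amax / 7.0f;  // map [-amax, amax] onto codes 1..15
        const float inv = scale != 0.0f ? 1.0f / scale : 0.0f;
        out[b].scale = scale;
        for (std::size_t i = 0; i < kBlock; ++i) {
            long r = std::lround(v[i] * inv) + 8;  // shift signed value into unsigned range
            if (r < 0) r = 0;
            if (r > 15) r = 15;
            out[b].q[i] = static_cast<std::uint8_t>(r);
        }
    }
    return out;
}

// Reconstruct approximate floats from the quantized blocks.
std::vector<float> dequantize_q4(const std::vector<Block4>& blocks) {
    std::vector<float> x;
    x.reserve(blocks.size() * kBlock);
    for (const auto& blk : blocks) {
        for (std::uint8_t code : blk.q) x.push_back(blk.scale * (static_cast<int>(code) - 8));
    }
    return x;
}
```

Each reconstructed weight is off by at most half a scale step, so smaller blocks (more scales) buy accuracy with extra storage. Packing two 4-bit codes per byte plus a 16-bit scale works out to 4.5 bits per weight for a 32-weight block.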
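minigpt4.cpp lists 4-, 5-, 6-, 8-, and 16-bit modes. A back-of-the-envelope estimate of weight memory is parameters × bits per weight / 8; the helper below is a hypothetical sketch of that arithmetic and deliberately ignores activations, the KV cache, and runtime buffers.

```cpp
#include <cassert>
#include <cstdint>

// Approximate bytes for model weights at an effective bits-per-weight value.
// Ignores activations, the KV cache, and runtime buffers.
double weight_bytes(std::uint64_t n_params, double bits_per_weight) {
    assert(bits_per_weight > 0.0);
    return static_cast<double>(n_params) * bits_per_weight / 8.0;
}

// Convert bytes to GiB (2^30 bytes).
double to_gib(double bytes) {
    return bytes / (1024.0 * 1024.0 * 1024.0);
}
```

For a hypothetical 7-billion-parameter model, 16 bits per weight comes to 14 GB (about 13 GiB) and 4 bits to 3.5 GB. Passing an effective value such as 4.5 accounts for a format that adds one 16-bit scale per 32 four-bit weights.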
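beellama.cpp's claim of more KV cache in the same VRAM comes down to bytes per cached element. A common estimate for the K and V caches is 2 × layers × KV heads × head dimension × context length × bytes per element. The sketch below applies that formula; the model shape in the usage note is hypothetical, and the code does not reproduce how beellama.cpp or TurboQuant actually compress the cache.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model shape; real values come from the model's metadata.
struct KvShape {
    std::uint64_t n_layers;
    std::uint64_t n_kv_heads;  // fewer than query heads when grouped-query attention is used
    std::uint64_t head_dim;
};

// Bytes for the K and V caches over n_ctx positions at bits_per_elem bits per element.
double kv_cache_bytes(const KvShape& s, std::uint64_t n_ctx, double bits_per_elem) {
    assert(bits_per_elem > 0.0);
    // Factor 2: one tensor for keys and one for values in every layer.
    const double elems = 2.0 * s.n_layers * s.n_kv_heads * s.head_dim * n_ctx;
    return elems * bits_per_elem / 8.0;
}

// Number of whole context positions whose K/V entries fit in budget_bytes.
std::uint64_t max_context(const KvShape& s, double budget_bytes, double bits_per_elem) {
    const double per_position = kv_cache_bytes(s, 1, bits_per_elem);
    return static_cast<std::uint64_t>(budget_bytes / per_position);
}
```

With a hypothetical shape of 32 layers, 8 KV heads, and head dimension 128, a 4,096-token FP16 cache takes 512 MiB; at 4 bits per element the same budget holds 16,384 positions.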
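clip.cpp runs CLIP, which maps images and text into a shared embedding space; zero-shot classification then picks the caption whose embedding has the highest cosine similarity with the image embedding. The sketch below shows only that comparison step. It assumes the embeddings were already produced by the image and text encoders, does not use clip.cpp's API, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Cosine similarity between two embeddings of equal length.
double cosine_similarity(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    double dot = 0.0, norm_a = 0.0, norm_b = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += static_cast<double>(a[i]) * b[i];
        norm_a += static_cast<double>(a[i]) * a[i];
        norm_b += static_cast<double>(b[i]) * b[i];
    }
    if (norm_a == 0.0 || norm_b == 0.0) return 0.0;  // undefined for zero vectors; treat as unrelated
    return dot / (std::sqrt(norm_a) * std::sqrt(norm_b));
}

// Zero-shot choice: index of the caption embedding closest to the image embedding.
std::size_t best_caption(const std::vector<float>& image,
                         const std::vector<std::vector<float>>& captions) {
    assert(!captions.empty());
    std::size_t best = 0;
    double best_sim = cosine_similarity(image, captions[0]);
    for (std::size_t i = 1; i < captions.size(); ++i) {
        const double sim = cosine_similarity(image, captions[i]);
        if (sim > best_sim) {
            best_sim = sim;
            best = i;
        }
    }
    return best;
}
```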
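vit.cpp runs Vision Transformers, which begin by cutting the image into non-overlapping P × P patches and flattening each one before a learned linear projection turns it into a token. The sketch below shows just that patching step for a channels-last float image; the memory layout and function name are assumptions for illustration, not vit.cpp's code.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Split an H x W x C image (row-major, channels last) into non-overlapping
// P x P patches in raster order, flattening each patch to P * P * C values.
// In a ViT, each flattened patch is then linearly projected into a token.
std::vector<std::vector<float>> to_patches(const std::vector<float>& img,
                                           std::size_t H, std::size_t W,
                                           std::size_t C, std::size_t P) {
    assert(img.size() == H * W * C);
    assert(P > 0 && H % P == 0 && W % P == 0);
    std::vector<std::vector<float>> patches;
    patches.reserve((H / P) * (W / P));
    for (std::size_t py = 0; py < H / P; ++py) {
        for (std::size_t px = 0; px < W / P; ++px) {
            std::vector<float> patch;
            patch.reserve(P * P * C);
            for (std::size_t y = 0; y < P; ++y) {
                for (std::size_t x = 0; x < P; ++x) {
                    const std::size_t row = py * P + y;
                    const std::size_t col = px * P + x;
                    for (std::size_t c = 0; c < C; ++c) {
                        patch.push_back(img[(row * W + col) * C + c]);
                    }
                }
            }
            patches.push_back(std::move(patch));
        }
    }
    return patches;
}
```

A 224 × 224 RGB image with P = 16 yields 14 × 14 = 196 patches of 16 × 16 × 3 = 768 values each.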
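CrispASR mentions CTC-based forced alignment. Forced aligners typically run a Viterbi-style search over per-frame CTC scores against a known transcript; a simpler way to see how CTC frame outputs map to labels is greedy decoding: take the best label per frame, merge consecutive repeats, and drop the blank symbol. This sketch is not CrispASR's aligner, and the blank id and names are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Greedy CTC decoding: pick the highest-scoring label in each frame,
// merge consecutive duplicates, then drop the blank label.
// blank_id is an assumption; models define their own blank index.
std::vector<int> ctc_greedy_decode(const std::vector<std::vector<float>>& frames, int blank_id) {
    std::vector<int> labels;
    int prev = -1;  // no previous frame yet
    for (const auto& scores : frames) {
        assert(!scores.empty());
        int best = 0;
        for (std::size_t k = 1; k < scores.size(); ++k) {
            if (scores[k] > scores[static_cast<std::size_t>(best)]) best = static_cast<int>(k);
        }
        // A label is emitted only when it differs from the previous frame's label,
        // so a blank between two identical labels keeps both.
        if (best != prev && best != blank_id) labels.push_back(best);
        prev = best;
    }
    return labels;
}
```

With blank id 0, frames whose best labels are 1 1 0 1 2 2 0 decode to 1 1 2: the blank separates the two runs of label 1, while the repeated 2s merge.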

C++ ggml related posts

  • 8GB to 70B: A Real Hardware Guide for Local LLMs

    1 project | dev.to | 12 Jun 2026
  • Doubling Qwen3.6-27B on One RTX 3090: ollama → llama.cpp + MTP, Lever by Lever (35.7 → 80.2 tok/s)

    1 project | dev.to | 9 Jun 2026
  • New `llama.cpp` Updates, AI Agents for Any LLM, and Quantized Vector Index for Local Inference

    3 projects | dev.to | 8 Jun 2026
  • Run Gemma-4 12B on WSL2 with llama.cpp

    1 project | dev.to | 5 Jun 2026
  • Gemma 4 QAT models: Optimizing compression for mobile and laptop efficiency

    1 project | news.ycombinator.com | 5 Jun 2026
  • How fast is LlamaStash? Overhead, throughput, and a fair comparison with Ollama and LM Studio

    3 projects | dev.to | 2 Jun 2026
  • A 10 year old Xeon is all you need (for 26B-A4B MTP Drafters without GPU)

    3 projects | news.ycombinator.com | 1 Jun 2026

Index

What are some of the best open-source ggml projects in C++? This list will help you:

Rank  Project               GitHub stars
1     llama.cpp             115,929
2     koboldcpp             10,754
3     stable-diffusion.cpp  6,245
4     rwkv.cpp              1,562
5     bark.cpp              859
6     minigpt4.cpp          570
7     clip.cpp              560
8     beellama.cpp          363
9     vit.cpp               313
10    booster               168
11    LangCommand           119
12    CrispASR              67
13    nemotron-asr.cpp      18
