llama.cpp

LLM inference in C/C++ [Moved to: https://github.com/ggml-org/llama.cpp] (by ggerganov)

Llama.cpp Alternatives

Similar projects and alternatives to llama.cpp

  1. llama.cpp

    LLM inference in C/C++

  2. SaaSHub

    SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives

  3. ollama

    Get up and running with Kimi-K2.6, GLM-5.1, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma and other models.

  4. prompts.chat

    f.k.a. Awesome ChatGPT Prompts. Share, discover, and collect prompts from the community. Free and open source; self-host for your organization with complete privacy.

  5. Auto-GPT

    108 llama.cpp VS Auto-GPT

    Discontinued. An experimental open-source attempt to make GPT-4 fully autonomous. [Moved to: https://github.com/Significant-Gravitas/Auto-GPT] (by Torantulino)

  6. Prompt-Engineering-Guide

    🐙 Guides, papers, lessons, notebooks and resources for prompt engineering, context engineering, RAG, and AI Agents.

  7. LocalAI

    LocalAI is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.

  8. vllm

    91 llama.cpp VS vllm

    A high-throughput and memory-efficient inference and serving engine for LLMs

  9. langchain

    92 llama.cpp VS langchain

    The agent engineering platform.

  10. llama_index

    LlamaIndex is the leading document agent and OCR platform

  11. onnxruntime

    ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

  12. llama-cpp-python

    Python bindings for llama.cpp

  13. sentence-transformers

    State-of-the-Art Embeddings, Retrieval, and Reranking

  14. LoRA

    51 llama.cpp VS LoRA

    Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"

  15. mlx

    50 llama.cpp VS mlx

    MLX: An array framework for Apple silicon

  16. aichat

    39 llama.cpp VS aichat

    All-in-one LLM CLI tool featuring Shell Assistant, Chat-REPL, RAG, AI Tools & Agents, with access to OpenAI, Claude, Gemini, Ollama, Groq, and more.

  17. n8n-docs

    Documentation for n8n, a fair-code licensed automation tool with a free community edition and powerful enterprise options. Build AI functionality into your workflows.

  18. hyperlearn

    9 llama.cpp VS hyperlearn

    2-2000x faster ML algos, 50% less memory usage, works on all hardware - new and old.

  19. ProxyAI

    1 llama.cpp VS ProxyAI

    The leading open-source AI copilot for JetBrains. Connect to any model in any environment, and customize your coding experience in any way you like.

  20. LLamaSharp

    A C#/.NET library to run LLM (🦙LLaMA/LLaVA) on your local device efficiently.

  21. stable-diffusion.cpp

    Diffusion model(SD,Flux,Wan,Qwen Image,Z-Image,...) inference in pure C/C++

NOTE: The number shown for each project counts the posts that mention it together with llama.cpp, plus user-suggested alternatives. A higher number therefore suggests a closer or more frequently recommended alternative to llama.cpp.

llama.cpp reviews and mentions

Posts with mentions or reviews of llama.cpp. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2026-05-23.
  • Run Gemma-4 12B on WSL2 with llama.cpp
    1 project | dev.to | 5 Jun 2026
    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp
    cmake -B build -DGGML_CUDA=ON -DLLAMA_OPENSSL=ON
    cmake --build build --config Release
    # no GPU
    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp
    cmake -B build
    cmake --build build --config Release
  • I ran Claude Code on a local LLM for 4 hours - 7M tokens, $0 (would have cost $94)
    1 project | dev.to | 25 May 2026
    Instead, I routed it through a local llama.cpp instance running Qwen3.6-27B-MTP on my AMD GPU. The total cost: $0.
  • Local LLM Setup Guide (v27)
    1 project | dev.to | 25 May 2026
    # 1. Basic installation
    git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp && make
    # 2. Download the model
    mkdir models && cd models
    wget https://huggingface.co/QuantFactory/Llama3-8B-4bit/resolve/main/Llama3-8B-4bit.gguf
    # 3. Server
  • Local LLM Setup Guide (v23)
    1 project | dev.to | 24 May 2026
    # Preparation before installation
    sudo apt update
    sudo apt install build-essential git -y
    # Download and compile llama.cpp
    git clone https://github.com/ggerganov/llama.cpp.git
    cd llama.cpp
    # Compile
    make clean
    make
    # Install required libraries (if needed)
    pip install torch numpy
    # Download the model (example: LLaMA-2 7B)
    mkdir -p models
    wget https://huggingface.co/llamav2-7b/resolve/main/llama-2-7b.gguf -O models/llama-2-7b.gguf
  • Local LLM Setup Guide (v21)
    1 project | dev.to | 24 May 2026
    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp
    make
  • Local LLM Setup Guide (v18)
    1 project | dev.to | 24 May 2026
    # 1. Install dependencies
    sudo apt update
    sudo apt install git cmake build-essential
    # 2. Clone and build llama.cpp
    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp
    make
    # 3. Download model
    ollama pull llama3:8b
    # 4. Start server
    ./server -m ~/llm-models/llama3-8b-Q4_K_M.gguf \
      --host 0.0.0.0 \
      --port 1234 \
      --threads 8 \
      --ctx-size 8192 \
      --n-gpu-layers 100
    # 5. Test API
    curl http://localhost:1234/completion \
      -H "Content-Type: application/json" \
      -d '{"prompt": "Hello world", "max_tokens": 10}'
  • Gemma 4 dense by default: why your local agent doesn't want the MoE
    5 projects | dev.to | 23 May 2026
    Compilation is cleaner on dense. llama.cpp, MLX, and vLLM all support both, but the dense path has had more attention. Fewer corner cases in expert routing, GQA, and KV layout interactions. If you've ever had a custom kernel mis-handle expert dispatch, you know.
  • I Built a Privacy-First Alternative to Microsoft Recall - Using All 3 Gemma 4 Modalities
    2 projects | dev.to | 23 May 2026
  • Model Showdown Round 2: Adding Gemma, Kimi, and 579 GB of Stubborn Optimism
    1 project | dev.to | 7 May 2026
    sudo apt install -y cmake build-essential nvidia-cuda-toolkit
    cd ~ && git clone https://github.com/ggerganov/llama.cpp.git
    cd llama.cpp
    cmake -B build -DGGML_CUDA=ON
    cmake --build build --config Release -j$(nproc)
  • How to Stop Drowning in Open Model Releases and Actually Run One Locally
    1 project | dev.to | 1 May 2026
    # For GGUF models (most common quantized format)
    # llama.cpp is still the gold standard for CPU + GPU inference
    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp
    # Build with CUDA support (adjust for your GPU)
    cmake -B build -DGGML_CUDA=ON
    cmake --build build --config Release -j $(nproc)
    # Quick sanity check: run the server
    ./build/bin/llama-server \
      -m /path/to/your/model.gguf \
      --host 0.0.0.0 \
      --port 8080 \
      -ngl 99  # offload all layers to GPU
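
Several of the excerpts above start the server with the same few flags (-m, --host, --port, -ngl). As a minimal sketch, the helper below prints such a command line without running it, so it can be reviewed first. The function name llama_server_cmd and its default values are hypothetical and not part of llama.cpp; the binary path and flags are taken from the CMake-based excerpt above, and the loopback default for the host is an assumption.

```shell
# Hypothetical helper (not part of llama.cpp): print a llama-server command
# line built from the flags quoted in the excerpts above, without running it.
# Arguments: model path, host, port, number of layers to offload to the GPU.
llama_server_cmd() {
  lsc_model=$1
  lsc_host=${2:-127.0.0.1}   # assumed default: listen on loopback only
  lsc_port=${3:-8080}        # port used in the CMake-based excerpt
  lsc_ngl=${4:-99}           # value the excerpt uses to offload all layers
  printf './build/bin/llama-server -m %s --host %s --port %s -ngl %s\n' \
    "$lsc_model" "$lsc_host" "$lsc_port" "$lsc_ngl"
}

# Example: review the command before running it.
llama_server_cmd /path/to/your/model.gguf
```

Printing rather than executing keeps the sketch independent of any particular build; once the printed line looks right, it can be pasted into a shell from the llama.cpp checkout.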

Stats

Basic llama.cpp repo stats
41
75,885
-
over 1 year ago
