transformers VS llama.cpp

Compare transformers and llama.cpp to see how they differ.

transformers

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal domains, for both inference and training. (by huggingface)
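
As a quick taste of the library, a minimal sketch of its pipeline API; the gpt2 model choice here is purely illustrative:

    from transformers import pipeline

    # Smallest possible text-generation example with the pipeline API.
    generator = pipeline("text-generation", model="gpt2")
    print(generator("Hello, world", max_new_tokens=20)[0]["generated_text"])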

llama.cpp

LLM inference in C/C++ (by ggml-org)
             transformers        llama.cpp
Mentions     220                 921
Stars        148,940             85,794
Growth       1.0%                2.7%
Last commit  3 days ago          2 days ago
Activity     10.0                10.0
Language     Python              C++
License      Apache License 2.0  MIT License
Mentions - the total number of mentions we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars a project has on GitHub. Growth - month-over-month growth in stars.
Activity - a relative number indicating how actively a project is being developed; recent commits carry more weight than older ones.
For example, an activity of 9.0 indicates that a project is among the top 10% of the most actively developed projects we track.

transformers

Posts with mentions or reviews of transformers. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2025-08-05.
  • Transformers 4.55 New OpenAI GPT OSS
    1 project | news.ycombinator.com | 5 Aug 2025
  • OpenAI Harmony
    3 projects | news.ycombinator.com | 5 Aug 2025
    The new transformers release describes the model: https://github.com/huggingface/transformers/releases/tag/v4....

    > GPT OSS is a hugely anticipated open-weights release by OpenAI, designed for powerful reasoning, agentic tasks, and versatile developer use cases. It comprises two models: a big one with 117B parameters (gpt-oss-120b), and a smaller one with 21B parameters (gpt-oss-20b). Both are mixture-of-experts (MoEs) and use a 4-bit quantization scheme (MXFP4), enabling fast inference (thanks to fewer active parameters, see details below) while keeping resource usage low. The large model fits on a single H100 GPU, while the small one runs within 16GB of memory and is perfect for consumer hardware and on-device applications.
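
    For readers who want to try this, a hedged sketch of loading the smaller model with transformers (assumes transformers >= 4.55 and roughly the 16GB of memory mentioned in the quote):

    from transformers import pipeline

    # device_map="auto" spreads the model across available hardware.
    pipe = pipeline("text-generation", model="openai/gpt-oss-20b",
                    torch_dtype="auto", device_map="auto")
    messages = [{"role": "user", "content": "Explain mixture-of-experts briefly."}]
    print(pipe(messages, max_new_tokens=128)[0]["generated_text"])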

  • How to Install Devstral Small 1.1 Locally?
    2 projects | dev.to | 12 Jul 2025
    pip install torch
    pip install git+https://github.com/huggingface/transformers
    pip install git+https://github.com/huggingface/accelerate
    pip install huggingface_hub
    pip install --upgrade vllm
    pip install --upgrade mistral_common
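
    Once those are installed, a minimal sketch of loading the model through vLLM's Python API; the Hub id "mistralai/Devstral-Small-2507" and the mistral tokenizer mode are assumptions based on the post title, not taken from it:

    from vllm import LLM, SamplingParams

    # Hypothetical model id for Devstral Small 1.1; adjust to the actual repo.
    llm = LLM(model="mistralai/Devstral-Small-2507", tokenizer_mode="mistral")
    out = llm.chat([{"role": "user", "content": "Write hello world in Python."}],
                   SamplingParams(max_tokens=64))
    print(out[0].outputs[0].text)
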
  • How to Install DeepSeek Nano-VLLM Locally?
    2 projects | dev.to | 24 Jun 2025
    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
    pip install git+https://github.com/huggingface/transformers
    pip install git+https://github.com/huggingface/accelerate
    pip install huggingface_hub
  • Medical RAG Research with txtai
    4 projects | dev.to | 23 Jun 2025
    Substitute your own embeddings database to change the knowledge base. txtai supports running local LLMs via transformers or llama.cpp, and it also supports a wide variety of hosted LLMs via LiteLLM. For example, setting the second RAG pipeline parameter below to gpt-4o, along with the appropriate environment variables holding access keys, switches to a hosted LLM. See this documentation page for more on this.
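
    As a rough illustration of that backend switching, a hedged sketch using txtai's LLM pipeline; the model strings are illustrative, and txtai picks the backend from them (a Hugging Face id runs via transformers, a .gguf path via llama.cpp, and names like gpt-4o route through LiteLLM):

    from txtai.pipeline import LLM

    # Illustrative model choices; swap in your own.
    llm = LLM("TinyLlama/TinyLlama-1.1B-Chat-v1.0")   # transformers backend
    # llm = LLM("/models/mistral-7b.Q4_K_M.gguf")     # llama.cpp backend
    # llm = LLM("gpt-4o")                             # hosted via LiteLLM
    print(llm("Summarize what RAG does in one sentence."))
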
  • What Are Vision-Language Models (VLMs) and How Do They Work?
    3 projects | dev.to | 17 Jun 2025
  • I have reimplemented Stable Diffusion 3.5 from scratch in pure PyTorch
    8 projects | news.ycombinator.com | 14 Jun 2025
    Reference implementations are unmaintained and buggy.

    For example https://github.com/huggingface/transformers/issues/27961 OpenAI's tokenizer for CLIP is buggy, it's a reference implementation, it isn't the one they used for training, and the problems with it go unsolved and get copied endlessly by other projects.

    What about Flux? They don't say their reference implementation was used for training (it wasn't), and it has bugs that break cudagraphs or similar, though those aren't that impactful. On the other hand, it uses CLIP, and CLIP's reference tokenizer is buggy, so this is buggy too...

  • HuggingFace transformers will focus on PyTorch, deprecating TensorFlow and Flax
    1 project | news.ycombinator.com | 13 Jun 2025
  • None of the top 10 projects in GitHub is actually a software project 🤯
    6 projects | dev.to | 10 May 2025
    AutoGPT is a new addition from the AI community. Along with TensorFlow, it represents AI in the software category, which is becoming relevant (2 out of 8). We can expect new AI projects such as Transformers or Ollama (currently ranked 34th and 36th, respectively) to join the top 25 in the future.
  • How to Install Foundation-Sec 8B by Cisco: The Ultimate Cybersecurity AI Model
    1 project | dev.to | 6 May 2025
    pip install torch
    pip install git+https://github.com/huggingface/transformers
    pip install git+https://github.com/huggingface/accelerate
    pip install huggingface_hub
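
    As a hedged follow-up, loading the model with transformers might look like the sketch below; the Hub id "fdtn-ai/Foundation-Sec-8B" is my assumption of where Cisco published it, so adjust if the repo name differs.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "fdtn-ai/Foundation-Sec-8B"  # assumed Hub id
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto",
                                                 device_map="auto")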

llama.cpp

Posts with mentions or reviews of llama.cpp. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2025-08-21.
  • DeepSeek-v3.1 Release
    5 projects | news.ycombinator.com | 21 Aug 2025
    I maintain a cross-platform llama.cpp client - you're right to point out that, generally, we expect nuking logits to take care of it.

    There is a substantial performance cost to nuking; the open source internals discussion may have glossed over that for clarity (see github.com/llama.cpp/... below). Because the cost is so high, the default in the API* is to not artificially lower other logits, and to only do that if the first inference attempt yields a token that is invalid in the compiled grammar.

    Similarly, I was hoping to be on target w/r/t what strict mode is in an API; I am sort of describing the "outer loop" of sampling.

    * blissfully, you do not have to implement it manually anymore - it is a parameter in the sampling params member of the inference params

    * "the grammar constraints applied on the full vocabulary can be very taxing. To improve performance, the grammar can be applied only to the sampled token..and nd only if the token doesn't fit the grammar, the grammar constraints are applied to the full vocabulary and the token is resampled." https://github.com/ggml-org/llama.cpp/blob/54a241f505d515d62...

  • Guide: Running GPT-OSS with Llama.cpp
    1 project | news.ycombinator.com | 21 Aug 2025
  • Ollama and gguf
    10 projects | news.ycombinator.com | 11 Aug 2025
    ik_llama.cpp is another fork of llama.cpp. I followed the development of GLM4.5 support in both projects.

    The ik_llama.cpp developers had a working implementation earlier than llama.cpp, but their GGUFs were not compatible with the mainline.

    After the changes in llama.cpp were merged into master, ik_llama.cpp reworked their implementation and ported it to align with upstream: https://github.com/ggml-org/llama.cpp/pull/14939#issuecommen...

    >Many thanks to @sammcj, @CISC, and everyone who contributed! The code has been successfully ported and merged into ik_llama.

    This is how it should be done.

  • How to Install & Run GPT-OSS 20b and 120b GGUF Locally?
    1 project | dev.to | 11 Aug 2025
    apt-get update
    apt-get install -y pciutils build-essential cmake curl libcurl4-openssl-dev git
    git clone https://github.com/ggml-org/llama.cpp
    cmake llama.cpp -B llama.cpp/build -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON -DLLAMA_CURL=ON
    cmake --build llama.cpp/build --config Release -j --clean-first --target llama-cli llama-server
    cp llama.cpp/build/bin/llama-* llama.cpp/
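
    Once built, llama-server exposes an OpenAI-compatible endpoint. A hedged sketch of querying it from Python, assuming the server was started with something like ./llama.cpp/llama-server -m model.gguf --port 8080 (model filename illustrative):

    import json, urllib.request

    req = urllib.request.Request(
        "http://localhost:8080/v1/chat/completions",
        data=json.dumps({"messages": [{"role": "user", "content": "Hello!"}],
                         "max_tokens": 64}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
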
  • Mistral Integration Improved in Llama.cpp
    4 projects | news.ycombinator.com | 11 Aug 2025
    llama.cpp still doesn't support gpt-oss tool calling. https://github.com/ggml-org/llama.cpp/pull/15158 (among other similar PRs)

    But I also couldn't get vllm, transformers serve, or ollama (400 response on /v1/chat/completions) working with gpt-oss today. OpenAI's cookbooks aren't really copy-paste instructions. They were probably tested on a single platform with preinstalled Python packages that they forgot to mention :))

  • How Attention Sinks Keep Language Models Stable
    3 projects | news.ycombinator.com | 8 Aug 2025
    There was skepticism last time this was posted https://news.ycombinator.com/item?id=37740932

    Implementation for gpt-oss this week showed 2-3x improvements https://github.com/ggml-org/llama.cpp/pull/15157 https://www.reddit.com/r/LocalLLaMA/comments/1mkowrw/llamacp...
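
    For intuition, a toy sketch of the attention-sink idea from the StreamingLLM line of work: when the KV cache fills, keep a few initial "sink" tokens plus the most recent window instead of evicting strictly oldest-first. This is an illustration, not llama.cpp's implementation.

    def evict_kv(cache, max_len, n_sink=4):
        # cache: list of per-token KV entries, oldest first.
        if len(cache) <= max_len:
            return cache
        # Retain the first n_sink entries (attention sinks) and the tail.
        return cache[:n_sink] + cache[-(max_len - n_sink):]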

  • OpenAI Open Models
    15 projects | news.ycombinator.com | 5 Aug 2025
    Holy smokes, there's already llama.cpp support:

    https://github.com/ggml-org/llama.cpp/pull/15091

  • Llama.cpp: Add GPT-OSS
    1 project | news.ycombinator.com | 5 Aug 2025
  • My 2.5 year old laptop can write Space Invaders in JavaScript now (GLM-4.5 Air)
    11 projects | news.ycombinator.com | 29 Jul 2025
    MLX does have good software support. Targeting both iOS and mac is a big win in itself.

    I wonder what's possible, and what the software situation is today, with the PC NPUs. AMD's XDNA has been around for a while, and XDNA2 jumps from 10 to 40 TOPS. The "AMDXDNA" driver was merged in 6.14 last winter: where are we now?

    But I'm not seeing any evidence that there's mainstream support in any of the main frameworks. https://github.com/ggml-org/llama.cpp/issues/1499 https://github.com/ollama/ollama/issues/5186

    Good news: AMD has an initial implementation for llama.cpp. I don't particularly know what it means, but the first gen supports W4ABF16 quantization, and newer chips support W8A16. https://github.com/ggml-org/llama.cpp/issues/14377 . I'm not sure what it's good for, but there is a Linux "xdna-driver": https://github.com/amd/xdna-driver

    Would also be interesting to know how this compares to, say, the huge iGPU on Strix Halo. I don't know whether these NPUs similarly work with very large models.

    There's a lot of other folks also starting on their NPU journeys. ARM's Ethos, and Rockchip's RKNN recently shipped Linux kernel drivers, but it feels like that's just a start? https://www.phoronix.com/news/Arm-Ethos-NPU-Accel-Driver https://www.phoronix.com/news/Rockchip-NPU-Driver-RKNN-2025

  • AMD teams contributing to the llama.cpp codebase
    1 project | news.ycombinator.com | 28 Jul 2025

What are some alternatives?

When comparing transformers and llama.cpp you can also consider the following projects:

sentence-transformers - State-of-the-Art Text Embeddings

ollama - Get up and running with OpenAI gpt-oss, DeepSeek-R1, Gemma 3 and other models.

llama - Inference code for Llama models

mlc-llm - Universal LLM Deployment Engine with ML Compilation

text-generation-webui - LLM UI with advanced features, easy setup, and multiple backend support.

