llama.cpp
gpt4all
| | llama.cpp | gpt4all |
|---|---|---|
| Mentions | 1,032 | 150 |
| Stars | 115,929 | 77,364 |
| Growth | 7.4% | 0.0% |
| Activity | 10.0 | 9.5 |
| Last commit | 3 days ago | about 1 year ago |
| Language | C++ | C++ |
| License | MIT License | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
llama.cpp
-
How to Setup a Local Coding Agent on macOS
> The benchmark prompt was:
> Write a compact Python function that parses a unified diff and returns the changed file paths. Then explain two edge cases.
> Each benchmark generated about 128 tokens.
Generating 128 tokens is probably not enough for good benchmark results. MTP speedup depends on how often the predicted tokens are accepted. In my experience, the very early output has a higher acceptance rate, so short testing can give false positive speedups.
Also llama.cpp includes a tool specifically for benchmarking:
https://github.com/ggml-org/llama.cpp/blob/master/tools/llam...
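To make the acceptance-rate point concrete, here is a minimal sketch of how the expected number of tokens per verification step changes with the acceptance rate. It assumes a simplified model that is not taken from the post: each drafted token is accepted independently with the same probability, and drafting stops at the first rejection. The function name and the acceptance rates 0.9 and 0.6 are hypothetical, chosen only for illustration.

```python
def expected_tokens_per_step(accept_prob: float, draft_len: int) -> float:
    """Expected tokens emitted per verification step.

    Assumption (simplified model): every drafted token is accepted
    independently with probability accept_prob, and the first rejection
    discards the rest of the draft. The verifier always contributes one
    token itself, so the expectation is sum(p**i for i in 0..draft_len).
    """
    if not 0.0 <= accept_prob <= 1.0:
        raise ValueError("accept_prob must be in [0, 1]")
    if draft_len < 0:
        raise ValueError("draft_len must be non-negative")
    return sum(accept_prob ** i for i in range(draft_len + 1))


# Hypothetical rates: high acceptance early in the output, lower later on.
early = expected_tokens_per_step(0.9, 3)
late = expected_tokens_per_step(0.6, 3)
print(f"early: {early:.3f} tokens/step, late: {late:.3f} tokens/step")
```

Under this model, a benchmark that only covers the high-acceptance early stretch would report noticeably more tokens per step than a long generation would sustain.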
-
Doubling Qwen3.6-27B on One RTX 3090: ollama → llama.cpp + MTP, Lever by Lever (35.7 → 80.2 tok/s)
In my build, MTP came from mainline llama.cpp, not ik_llama. ik_llama got me to ~47 (engine + quant), but I couldn't get MTP running there — my build rejected the -mtp flags and ignored the model's nextn tensors. Mainline llama.cpp added MTP fairly recently (PR #22673, merged 2026-05-16), and that's where it worked for me. (There may well be an ik_llama path I missed — this is just what got it going on my box.)
- New `llama.cpp` Updates, AI Agents for Any LLM, and Quantized Vector Index for Local Inference
- Gemma 4 QAT models: Optimizing compression for mobile and laptop efficiency
-
Introducing LlamaStash: a zero-overhead, terminal-native llama.cpp launcher
That script grew up. Today I'm releasing LlamaStash, the first public release of a fast, cross-platform, terminal-native launcher for llama.cpp with zero overhead.
-
How fast is LlamaStash? Overhead, throughput, and a fair comparison with Ollama and LM Studio
LlamaStash spawns the unmodified upstream llama-server. So three different questions follow from that, and there is a benchmark suite for each.
-
A 10 year old Xeon is all you need (for 26B-A4B MTP Drafters without GPU)
llama.cpp includes a benchmarking tool called llama-bench https://github.com/ggml-org/llama.cpp/blob/master/tools/llam...
ik_llama includes llama-sweep-bench https://github.com/ikawrakow/ik_llama.cpp/blob/main/examples...
When comparing hardware, the output of these tools is very helpful to let others put it into context. The post says the output is "reading speed" but knowing the prefill and token generation speeds would be a lot more helpful.
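A short sketch of why the two speeds are worth reporting separately. The class, the function names, and the timing numbers below are hypothetical and are not taken from the post or from either tool's output.

```python
from dataclasses import dataclass


@dataclass
class RunTiming:
    """Token counts and wall-clock times for one run (hypothetical values)."""
    prompt_tokens: int
    prompt_seconds: float
    generated_tokens: int
    generation_seconds: float


def prefill_speed(run: RunTiming) -> float:
    """Prompt-processing throughput in tokens per second."""
    return run.prompt_tokens / run.prompt_seconds


def generation_speed(run: RunTiming) -> float:
    """Token-generation throughput in tokens per second."""
    return run.generated_tokens / run.generation_seconds


def blended_speed(run: RunTiming) -> float:
    """A single combined figure; it depends on the prompt/output mix."""
    total_tokens = run.prompt_tokens + run.generated_tokens
    total_seconds = run.prompt_seconds + run.generation_seconds
    return total_tokens / total_seconds


run = RunTiming(prompt_tokens=2048, prompt_seconds=8.0,
                generated_tokens=256, generation_seconds=32.0)
print(prefill_speed(run), generation_speed(run), blended_speed(run))
```

With these made-up numbers the blended figure sits far from both the prefill and the generation speed, so a single number says little about how another workload would behave on the same hardware.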
-
Racket v9.2 is now available
lol the same way we implement all of the reduced precision fp8, fp4 types today: by storing them in the corresponding uint:
https://github.com/ggml-org/llama.cpp/discussions/15095
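As an illustration of storing a reduced-precision float in an unsigned integer, here is a minimal sketch that decodes one byte as an 8-bit float. It assumes the E4M3 layout (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, no infinities, a single NaN pattern per sign); that layout is an assumption chosen for the example, not something stated in the comment, and the function name is hypothetical.

```python
import math


def decode_fp8_e4m3(byte: int) -> float:
    """Decode one uint8 as an FP8 value, assuming the E4M3 layout."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value in 0..255")
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0x0F
    mantissa = byte & 0x07
    if exponent == 0x0F and mantissa == 0x07:
        return math.nan  # the only NaN encoding in this layout
    if exponent == 0:
        # Subnormal: no implicit leading 1, fixed exponent of -6.
        return sign * (mantissa / 8.0) * 2.0 ** -6
    # Normal: implicit leading 1, exponent bias of 7.
    return sign * (1.0 + mantissa / 8.0) * 2.0 ** (exponent - 7)


# The storage is plain bytes; the float meaning is applied only on decode.
raw = bytes([0x38, 0xC0, 0x7E, 0x01])
print([decode_fp8_e4m3(b) for b in raw])
```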
- Run Gemma-4 E2B-it with llama.cpp on Raspberry Pi4
-
Gemma 4 dense by default: why your local agent doesn't want the MoE
    # Build llama.cpp with Metal backend
    git clone https://github.com/ggml-org/llama.cpp
    cd llama.cpp && cmake -B build -DGGML_METAL=ON && cmake --build build -j

    # Community-quantized GGUFs (Google ships safetensors; unsloth ships GGUF)
    huggingface-cli download unsloth/gemma-4-31B-it-GGUF \
      gemma-4-31B-it-Q4_K_M.gguf --local-dir .
    huggingface-cli download unsloth/gemma-4-26B-A4B-it-GGUF \
      gemma-4-26B-A4B-it-Q4_K_M.gguf --local-dir .

    # Benchmark: 200 generations of 512 tokens, log per-call timing
    ./build/bin/llama-bench -m gemma-4-31B-it-Q4_K_M.gguf -n 512 -r 200 -o json > dense.json
    ./build/bin/llama-bench -m gemma-4-26B-A4B-it-Q4_K_M.gguf -n 512 -r 200 -o json > moe.json
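One possible way to compare the two result files afterwards, as a rough sketch. It assumes each JSON file holds a list of records and that every record carries an `n_gen` count and an `avg_ts` (average tokens per second) field; those field names are an assumption here and should be checked against the actual output of your llama-bench build. The helper names and the sample numbers are hypothetical.

```python
import json
from statistics import mean


def mean_generation_speed(records, speed_key="avg_ts"):
    """Average tokens/s over the records that generated tokens.

    Assumption: each record is a dict with an "n_gen" count and a
    throughput value stored under speed_key.
    """
    speeds = [r[speed_key] for r in records if r.get("n_gen", 0) > 0]
    if not speeds:
        raise ValueError("no generation records found")
    return mean(speeds)


def speed_ratio(dense_records, moe_records):
    """How many times faster the MoE run generated than the dense run."""
    return mean_generation_speed(moe_records) / mean_generation_speed(dense_records)


# Made-up sample data standing in for dense.json and moe.json.
dense = json.loads('[{"n_prompt": 512, "n_gen": 0, "avg_ts": 400.0},'
                   ' {"n_prompt": 0, "n_gen": 512, "avg_ts": 10.0}]')
moe = json.loads('[{"n_prompt": 512, "n_gen": 0, "avg_ts": 900.0},'
                 ' {"n_prompt": 0, "n_gen": 512, "avg_ts": 40.0}]')
print(speed_ratio(dense, moe))
```

To use it on real files, replace the sample data with `json.load(open("dense.json"))` and `json.load(open("moe.json"))`.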
gpt4all
-
GPT4All Has a Free API: Run Private LLMs Locally with Python Bindings
GPT4All GitHub — 72K+ stars
-
AI: Introduction to Ollama for local LLM launch
GPT4All: also a solution with UI, simple, has fewer features than ollama/llama.cpp
-
Command Line LLM Text Completions
The model must exist in GPT4All's model path. On Arch this is ~/.local/share/nomic.ai/GPT4All/. An entry for this model must exist in models.json. You can use the metadata provided by nomic or specify your own in the following format if your model is not listed. The GPT4All wiki provides fine guidance on configuring custom models.
-
Running Ollama on Docker: A Quick Guide
Hi, it's me again! Over the past few days, I've been testing multiple ways to work with LLMs locally, and so far, Ollama was the best tool (ignoring UI and other QoL aspects) for setting up a fast environment to test code and features. I've tried GPT4ALL and other tools before, but they seem overly bloated when the goal is simply to set up a running model to connect with a LangChain API (on Windows with WSL).
-
6 Easy Ways to Run LLM Locally + Alpha
https://github.com/nomic-ai/gpt4all Supported OS: Windows, Linux, macOS
-
Top 8 OpenSource Tools for AI Startups
Generative AI is hot, and GPT4All is an exciting open-source option. It allows you to run your own language model without needing proprietary APIs, enabling a private and customizable experience.
-
Forget ChatGPT: why researchers now run small AIs on their laptops
GPT4All for an even easier GUI
https://github.com/nomic-ai/gpt4all
-
The 6 Best LLM Tools To Run Models Locally
GPT4ALL is built upon privacy, security, and no internet-required principles. Users can install it on Mac, Windows, and Ubuntu. Compared to Jan or LM Studio, GPT4ALL has more monthly downloads, GitHub Stars, and active users.
-
Llama 3.1 web search integrated into GPT4All Beta
From the moment Llama 3.1 was released, GPT4All developers have been working hard to make a beta version of tool calling available. We're happy to announce that the beta is now ready. The first tool is web search implemented through brave.com just as in the Llama 3.1 paper.
A wiki has been made to walk users through the setup here: https://github.com/nomic-ai/gpt4all/wiki/Web-Search-Beta-Rel...
Join us on discord to give feedback and get help with the new Llama 3.1 Beta for GPT4All: https://discord.com/invite/4M2QFmTt2k
-
Show HN: Site2pdf
Thanks for taking the time to respond. I was thinking of something local, especially in light of:
Google's Gemini AI caught scanning Google Drive PDF files without permission https://news.ycombinator.com/item?id=40965892
Looks like GPT4All[1] and AnythingLLM[2] are worth exploring. There's also the closed-source macOS app RecurseChat[3,4] which appeared on HN a few months ago[5].
[1] https://github.com/nomic-ai/gpt4all
[2] https://github.com/Mintplex-Labs/anything-llm
[3] https://recurse.chat
[4] https://recurse.chat/blog/posts/local-docs
[5] https://news.ycombinator.com/item?id=39532367
What are some alternatives?
koboldcpp - Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
anything-llm - Stop renting your intelligence. Own it with AnythingLLM. Everything you need for a powerful local-first agent experience
unsloth - Unsloth Studio is a web UI for training and running open models like Gemma 4, Qwen3.6, DeepSeek, gpt-oss locally.
ollama - Get up and running with Kimi-K2.6, GLM-5.1, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma and other models.
mlc-llm - Universal LLM Deployment Engine with ML Compilation
textgen - Open-source desktop app for local LLMs. Text, vision, tool-calling, OpenAI/Anthropic-compatible API. 100% private.