metal-cpp vs llama.cpp

| | metal-cpp | llama.cpp |
|---|---|---|
| Mentions | 16 | 908 |
| Stars | 317 | 82,588 |
| Growth | 3.2% | 1.7% |
| Activity | 3.3 | 10.0 |
| Latest commit | 7 months ago | 5 days ago |
| Language | C++ | C++ |
| License | Apache License 2.0 | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity - a relative number indicating how actively a project is being developed; recent commits are weighted more heavily than older ones.
For example, an activity of 9.0 indicates that a project is among the top 10% of the most actively developed projects we track.
metal-cpp
-
Nitro: A fast, lightweight 3MB inference server with OpenAI-Compatible API
My understanding is that the proliferation of “XYZ-cpp” AI frameworks is due to the C++ support in Apple’s GPU library Metal, and the popularity of Apple silicon for inference (for which there are a few technical reasons): https://developer.apple.com/metal/cpp/
-
Show HN: C-ocoa, Write iOS/macOS apps in any language, with a generated C API
This is basically also what the "official" C++ API for Metal does (https://developer.apple.com/metal/cpp/): it's an automatically generated bindings wrapper that calls into ObjC runtime functions.
I also dabbled a bit with this idea by parsing clang AST-dumps of macOS system headers:
https://github.com/floooh/objc-ast-experiments
Unfortunately this is very brittle, and it also broke on ARM CPUs; I guess the shim code needs some ABI adjustments (famously, objc_msgSend has multiple "ABI shapes": https://www.mikeash.com/pyblog/objc_msgsends-new-prototype.h...).
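To make the technique concrete, here is a minimal hand-written version of the same trick - not actual metal-cpp or C-ocoa source, just a sketch of what such generated bindings boil down to (assumes macOS, built with `clang++ demo.cpp -framework Foundation`):

```cpp
// Minimal sketch of an "ObjC binding" in plain C++: each generated
// wrapper is essentially a typed cast of objc_msgSend plus a selector.
#include <objc/runtime.h>
#include <objc/message.h>
#include <cstdio>

int main() {
    // Equivalent of: [[NSProcessInfo processInfo] processorCount]
    Class cls       = objc_getClass("NSProcessInfo");
    SEL procInfoSel = sel_registerName("processInfo");
    SEL countSel    = sel_registerName("processorCount");

    // objc_msgSend must be cast to the right function-pointer type for
    // each call site -- this is exactly where the "ABI shapes" bite.
    auto getProcInfo = reinterpret_cast<id (*)(Class, SEL)>(objc_msgSend);
    auto getCount    = reinterpret_cast<unsigned long (*)(id, SEL)>(objc_msgSend);

    id info = getProcInfo(cls, procInfoSel);
    printf("processorCount = %lu\n", getCount(info, countSel));
    return 0;
}
```

Every generated binding reduces to one of these casts, which is why getting the cast wrong on a different ABI breaks things so quietly.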
-
What's the best way to learn Metal?
There's an official C++ interface: https://developer.apple.com/metal/cpp/
- What are some alternatives to OpenGL for Mac
- Opinions on graphics APIs?
-
A brief interview with Tcl creator John Ousterhout
It doesn't matter whether the project is driven by Microsoft or not; the cat (of automatically generated language bindings) is out of the bag. E.g. Zig is using the same approach without being an official MS project: https://github.com/marlersoft/zigwin32, and Apple has an automatically generated C++ API for Metal (https://developer.apple.com/metal/cpp/).
In the future, the question won't be "what language do I need to learn to code on this platform", but instead "are there language bindings for my favourite language".
- Cross platform low level graphics API suitable for game development?
-
GCC now includes Modula-2 and Rust. Do they work on OpenBSD?
this? https://developer.apple.com/metal/cpp/
Doesn't it just use objc/runtime.h, so if anything is missing you can add your own API calls?
-
A learning path for Vulkan that focuses on concepts?
Metal has C++ bindings (which cover the full app lifecycle, so you don't have to touch Objective-C/Swift at all), but they're based on the Objective-C memory model. There are some helper structs mimicking shared pointers, but you'll still need to understand the basics of how an autorelease pool is used to avoid memory leaks and/or bad-access crashes.
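For a rough picture of what that looks like in practice, here's a minimal metal-cpp sketch (assuming the metal-cpp headers are on the include path and you link with `-framework Foundation -framework Metal`):

```cpp
// Minimal sketch of metal-cpp's ObjC-style lifetime rules. The
// *_PRIVATE_IMPLEMENTATION macros must be defined in exactly one
// translation unit before including the headers.
#define NS_PRIVATE_IMPLEMENTATION
#define MTL_PRIVATE_IMPLEMENTATION
#include <Foundation/Foundation.hpp>
#include <Metal/Metal.hpp>

int main() {
    // The pool bounds the lifetime of autoreleased objects created below.
    NS::AutoreleasePool* pool = NS::AutoreleasePool::alloc()->init();

    MTL::Device* device = MTL::CreateSystemDefaultDevice();
    // ... create command queues, buffers, and pipelines here ...

    device->release(); // Create* returns a +1 reference we own
    pool->release();   // drains everything autoreleased in this scope
    return 0;
}
```

The shared-pointer-like helpers mentioned above can automate the `release()` calls, but the autorelease pool is still what bounds the lifetime of autoreleased temporaries.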
-
CTO of Azure declares C++ "deprecated"
On https://developer.apple.com/metal/cpp/ check out the Foundation folder and all those nice Object::sendMessage() calls.
llama.cpp
-
Smollm3: Smol, multilingual, long-context reasoner LLM
Hey Simon, VB from Hugging Face here, the person who added the model to MLX and llama.cpp (with Son). The PR hasn't landed in llama.cpp yet, hence it doesn't work out of the box with llama.cpp installed via brew (and it similarly doesn't work with Ollama, since they need to bump their llama.cpp runtime).
The easiest fix would be to build llama.cpp from source: https://github.com/ggml-org/llama.cpp
If you want to avoid that, I added SmolLM3 to MLX-LM as well:
You can run it via `mlx_lm.chat --model "mlx-community/SmolLM3-3B-bf16"`
(requires the latest mlx-lm to be installed)
Here's the MLX-LM PR if you're interested: https://github.com/ml-explore/mlx-lm/pull/272
Similarly, the llama.cpp PR is here: https://github.com/ggml-org/llama.cpp/pull/14581
Let me know if you face any issues!
- Hunyuan-A13B model support has been merged into llama.cpp
-
Is GitHub Releases Down?
I just noticed this today: try to download an asset from any releases page and it will show a 500 error page.
Here are examples from the Ollama and llama.cpp releases pages:
https://github.com/ollama/ollama/releases/tag/v0.9.1
https://github.com/ggml-org/llama.cpp/releases/tag/b5688
You will get a 500 error page from GitHub.
Does this happen with anyone else?
-
AMD's AI Future Is Rack Scale 'Helios'
The average consumer uses llama.cpp. So here is your list of kernels: https://github.com/ggml-org/llama.cpp/tree/master/ggml/src/g...
And here is pretty damning evidence that you're full of shit:
https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/g...
The ggml-hip backend references the ggml-cuda kernels. The "software is the same" and yet AMD is still behind.
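For context on what "references the ggml-cuda kernels" means mechanically: the usual trick in a HIP backend is to map CUDA runtime names onto their HIP equivalents and recompile the shared sources. A simplified sketch of that pattern (illustrative names and build flag, not the actual ggml source):

```cpp
// Sketch of the "one kernel source, two vendors" pattern: alias the
// CUDA runtime API to HIP and rebuild the same code for AMD GPUs.
#if defined(GGML_USE_HIP)       // hypothetical build flag for this sketch
#include <hip/hip_runtime.h>
#define cudaError_t  hipError_t
#define cudaSuccess  hipSuccess
#define cudaMalloc   hipMalloc
#define cudaFree     hipFree
#else
#include <cuda_runtime.h>
#endif

#include <cstdio>

int main() {
    // Shared code calls the CUDA-spelled API and compiles unchanged
    // against whichever vendor runtime the build selected.
    float* d = nullptr;
    if (cudaMalloc(reinterpret_cast<void**>(&d), 1024 * sizeof(float)) == cudaSuccess) {
        puts("allocated 4 KiB on the selected GPU runtime");
        cudaFree(d);
    }
    return 0;
}
```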
- Ask HN: What's the coolest AI project you've seen?
-
OpenAI dropped the price of o3 by 80%
> for near negligible drop in quality
Hmm, that's evidently and anecdotally wrong:
https://github.com/ggml-org/llama.cpp/discussions/4110
-
What is currently the best LLM model for consumer grade hardware? Is it phi-4?
The number of quantization bits is a trade-off between size and quality. Ideally you should be aiming for a 6-bit or 5-bit model. I've seen some models be unstable at 4-bit (where they will either repeat words or start generating random words).
Anything below 4 bits is usually not worth it unless you want to experiment with running a 70B+ model -- though I don't have any experience doing that, so I don't know how well the increased parameter count balances the heavier quantization.
See https://github.com/ggml-org/llama.cpp/pull/1684 and https://gist.github.com/Artefact2/b5f810600771265fc1e3944228... for comparisons between quantization levels.
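To put rough numbers on the size side of that trade-off, here's a back-of-the-envelope calculation (weights only; real GGUF quant formats add per-block scales, so actual files run somewhat larger):

```cpp
// Rough weight-only size of a 7B-parameter model at common bit widths.
#include <cstdio>

int main() {
    const double params = 7e9; // e.g. a 7B-parameter model
    for (double bits : {8.0, 6.0, 5.0, 4.0, 3.0}) {
        const double gib = params * bits / 8.0 / (1024.0 * 1024.0 * 1024.0);
        printf("%.0f-bit: ~%.1f GiB of weights\n", bits, gib);
    }
    return 0;
}
```

For a 7B model this works out to roughly 6.5 GiB at 8-bit down to about 2.4 GiB at 3-bit, which is why the 5-bit/6-bit middle ground is attractive.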
-
Controlling Chrome with an AnythingLLM MCP Agent
AnythingLLM is becoming my tool of choice for connecting to my local llama.cpp server and recently added MCP support.
-
Devstral
I would recommend just trying it out (as long as you have the disk space for a few models). llama.cpp[0] is pretty easy to download and build, and has good support for M-series MacBook Airs. I usually just use LMStudio[1] though - it's got a nice, easy-to-use interface that looks like the ChatGPT or Claude webpage, and you can search for and download models from within the program. LMStudio would be the easiest way to get started and is probably all you need. I use it a lot on my M2 MacBook Air and it's really handy.
[0] - https://github.com/ggml-org/llama.cpp
[1] - https://lmstudio.ai/
-
Ollama's llama.cpp licensing issue goes unanswered for over a year
FWIW, llama.cpp links to and fetches models from ollama (https://github.com/ggml-org/llama.cpp/blob/master/tools/run/...).
This issue seems to be the typical case of someone taking offence on someone else's behalf: it implies there's no "recognition of source material" when there's actually quite a bit of symbiosis between the projects.
What are some alternatives?
MoltenVK - MoltenVK is a Vulkan Portability implementation. It layers a subset of the high-performance, industry-standard Vulkan graphics and compute API over Apple's Metal graphics framework, enabling Vulkan applications to run on macOS, iOS and tvOS.
ollama - Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3, Mistral Small 3.1 and other large language models.
metal-rs - Deprecated Rust bindings for Metal
gpt4all - GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
clspv - Clspv is a compiler from OpenCL C to Vulkan compute shaders
text-generation-webui - LLM UI with advanced features, easy setup, and multiple backend support.