llama.cpp

Port of Facebook's LLaMA model in C/C++ (by SlyEcho)

Llama.cpp Alternatives

Similar projects and alternatives to llama.cpp

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a better llama.cpp alternative or greater similarity.

llama.cpp reviews and mentions

Posts with mentions or reviews of llama.cpp. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-06-12.
  • Llama.cpp: Full CUDA GPU Acceleration
    14 projects | news.ycombinator.com | 12 Jun 2023
    llama.cpp can be run with a speedup for AMD GPUs when compiled with `LLAMA_CLBLAST=1`, and there is also a HIPified fork [1] being worked on by a community contributor. The other week I was poking at how hard it would be to get an AMD card running w/ acceleration on Linux and was pleasantly surprised; it wasn't too bad (see the build sketch after this post): https://mostlyobvious.org/?link=/Reference%2FSoftware%2FGene...

    That being said, it's important to note that ROCm is Linux only. Not only that, but ROCm's GPU support has actually been decreasing over the past few years. The current list: https://rocm.docs.amd.com/en/latest/release/gpu_os_support.h... Previously (2022): https://docs.amd.com/bundle/Hardware_and_Software_Reference_...

    The ELI5 is that a few years back, AMD split their graphics (RDNA) and compute (CDNA) architectures, which Nvidia does too. But notably, and this is something Nvidia definitely doesn't do (a key to their success, IMO), AMD also decided they would simply not support any CUDA-parity compute features on Windows or on their non-"compute" cards. In practice, this means community/open-source developers will never have AMD hardware to tinker with, port to, or develop on, while with Nvidia you can start on a GTX/RTX card in your laptop and run the same code all the way up to an H100 or DGX.

    llama.cpp is a super-high-profile project with almost 200 contributors now, but AFAIK no contributors from AMD. If AMD doesn't have the manpower, IMO they should simply be sending free hardware to top open-source project/library developers (and on the software side, their #1 priority should be making sure every single current GPU they sell is at least "enabled", if not "supported", in ROCm, on both Linux and Windows).

    [1] https://github.com/SlyEcho/llama.cpp/tree/hipblas
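As a minimal sketch of the two builds the comment refers to, assuming the Makefile options exposed by mid-2023 llama.cpp (`LLAMA_CLBLAST` for the OpenCL/CLBlast path, and `LLAMA_HIPBLAS` on the HIPified fork); the model path and layer count below are illustrative, and your checkout's README is the authority on which flags it actually supports:

```sh
# OpenCL acceleration via CLBlast (works on AMD GPUs, mainline llama.cpp):
make clean
make LLAMA_CLBLAST=1

# HIP/ROCm acceleration on the community fork mentioned above [1]
# (Linux only, and only on a ROCm-supported GPU):
git clone --branch hipblas https://github.com/SlyEcho/llama.cpp.git
cd llama.cpp
make LLAMA_HIPBLAS=1

# At run time, offload some layers to the GPU, e.g. (example model path):
./main -m models/7B/ggml-model-q4_0.bin -p "Hello" -ngl 32
```

The `-ngl` (number of GPU layers) flag controls how much of the model is offloaded; how many layers fit depends on the card's VRAM.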

Stats

Basic llama.cpp repo stats
Mentions: 1
Stars: 4
Activity: 9.4
Last commit: 9 months ago

SlyEcho/llama.cpp is an open source project licensed under MIT License which is an OSI approved license.

The primary programming language of llama.cpp is C.

