SaaSHub helps you find the best software and product alternatives Learn more →
Llama-cpp-python Alternatives
Similar projects and alternatives to llama-cpp-python
-
-
SaaSHub
SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives
-
textgen
Open-source desktop app for local LLMs. Text, vision, tool-calling, OpenAI/Anthropic-compatible API. 100% private.
-
ollama
Get up and running with Kimi-K2.6, GLM-5.1, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma and other models.
-
-
-
LocalAI
LocalAI is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.
-
-
FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
-
-
khoj
Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous AI (gpt, claude, gemini, llama, qwen, mistral). Get started - free.
-
refact
Discontinued AI Agent that handles engineering tasks end-to-end: integrates with developers’ tools, plans, executes, and iterates until it achieves a successful result.
-
-
-
TensorRT-LLM
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.
-
basaran
Discontinued Basaran is an open-source alternative to the OpenAI text completion API. It provides a compatible streaming API for your Hugging Face Transformers-based text generation models.
-
intel-extension-for-pytorch
Discontinued A Python package for extending the official PyTorch that can easily obtain performance on Intel platform
-
-
-
llama-cpp-python discussion
llama-cpp-python reviews and mentions
-
What Surprised Me About Building a Python RAG Pipeline with Open-Source LLMs
I’ll walk through the stack I landed on, with code you can actually run. For context, I used sentence-transformers for embeddings and llama.cpp (via llama-cpp-python) for the LLM. I chose these because they’re popular, actively maintained, and don’t require a GPU (though you’ll want one if your docs are big).
-
Medical RAG Research with txtai
Substitute your own embeddings database to change the knowledge base. txtai supports running local LLMs via transformers or llama.cpp. It also supports a wide variety of LLMs via LiteLLM. For example, setting the 2nd RAG pipeline parameter below to gpt-4o along with the appropriate environment variables with access keys switches to a hosted LLM. See this documentation page for more on this.
-
Failed to load shared library 'llama.dll': Could not find (llama-cpp-python)
If you're working with LLMs and trying out llama-cpp-python, you might run into some frustrating issues on Windows — especially when installing or importing the package.
-
Apple reveals M3 Ultra, taking Apple Silicon to a new extreme
Ah, I didn’t realize they’d upped the memory bandwidth to DDR5-6000 (vs 4800), thanks for the correction!
The memory bandwidth does not double, I believe. See this random issue for a graph that has single/dual socket measurements, there is essentially no difference: https://github.com/abetlen/llama-cpp-python/issues/1098
Perhaps this is incorrect now, but I also know with 2x 4090s you don’t get higher tokens per second than 1x 4090 with llama.cpp, just more memory capacity.
- Knowledge graphs using Ollama and Embeddings to answer and visualizing queries
- Python Bindings for Llama.cpp
-
Ollama v0.1.33 with Llama 3, Phi 3, and Qwen 110B
There's a Python binding for llama.cpp which is actively maintained and has worked well for me: https://github.com/abetlen/llama-cpp-python
- FLaNK AI for 11 March 2024
-
OpenAI: Memory and New Controls for ChatGPT
I'll share the core bit that took a while to figure out the right format, my main script is a hot mess using embeddings with SentenceTransformer, so I won't share that yet. E.g: last night I did a PR for llama-cpp-python that shows how Phi might be used with JSON only for the author to write almost exactly the same code at pretty much the same time. https://github.com/abetlen/llama-cpp-python/pull/1184
-
TinyLlama LLM: A Step-by-Step Guide to Implementing the 1.1B Model on Google Colab
Python Bindings for llama.cpp
-
A note from our sponsor - SaaSHub
www.saashub.com | 13 Jun 2026
Stats
abetlen/llama-cpp-python is an open source project licensed under MIT License which is an OSI approved license.
The primary programming language of llama-cpp-python is Python.
Popular Comparisons
- llama-cpp-python VS llama.cpp
- llama-cpp-python VS mlc-llm
- llama-cpp-python VS ollama
- llama-cpp-python VS intel-extension-for-pytorch
- llama-cpp-python VS lmdeploy
- llama-cpp-python VS textgen
- llama-cpp-python VS text-generation-inference
- llama-cpp-python VS LocalAI
- llama-cpp-python VS FastChat
- llama-cpp-python VS ctransformers