LibreChat vs llama.cpp

| | LibreChat | llama.cpp |
|---|---|---|
| Mentions | 29 | 921 |
| Stars | 29,523 | 85,794 |
| Growth | 3.1% | 2.7% |
| Activity | 9.9 | 10.0 |
| Latest commit | 4 days ago | 2 days ago |
| Language | TypeScript | C++ |
| License | MIT License | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
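The site's actual formula isn't published, but as a rough illustration of what "recent commits have higher weight than older ones" can mean, here is one plausible shape using exponential decay (illustrative only; the half-life and the decay model are assumptions):

```python
import math, time

# Illustrative only: a recency-weighted activity score in which each commit
# counts for less the older it is. Not the site's actual formula.
def activity_raw(commit_timestamps: list[float], half_life_days: float = 30.0) -> float:
    now = time.time()
    decay = math.log(2) / (half_life_days * 86400)  # per-second decay rate
    return sum(math.exp(-decay * (now - t)) for t in commit_timestamps)

# A commit from today contributes ~1.0; one from 30 days ago contributes ~0.5.
```

A raw score like this would then be ranked across all tracked projects to produce the 0-10 scale described above.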
LibreChat
- A Mind Meld for the Modern Enterprise: Breaking Down Knowledge Silos with MCP
Note this slight issue if you try this at home
- LibreChat: Enhanced ChatGPT Clone
- Hosting HuggingFace Models with KoboldCpp and RunPod
RunPod is a popular service that lets you rent GPU hours on demand. It may not be the best option, and it offers other plans as well, but those are out of the scope of this article. This article covers how to host models in the cloud, from choosing the models you want to sending requests to your own pod with LibreChat.
- Ask HN: Browser-UI with multi-LLM for the team?
I'm looking to provide our 4-person team access to various LLMs, including OpenAI, Anthropic, Meta's Llama, and some others.
I want a browser-based UI (like ChatGPT) where we can upload files and get Markdown responses, with cost controls to avoid unexpectedly high costs at the end of the month.
My current plan is to purchase API keys from different LLM providers and use LiteLLM as a proxy to set up cost limits, OR to use OpenRouter.ai (potentially still using LiteLLM).
LibreChat currently doesn't support "passing through" uploads, but this feature is expected to be implemented this week: https://github.com/danny-avila/LibreChat/discussions/3760#discussioncomment-10445600
My Questions:
Is my approach using LiteLLM or OpenRouter with a UI a viable solution? I believe this could be more cost-effective than purchasing monthly chat accounts for each of the four users.
Are there other UIs similar to ChatGPT's and Anthropic's interfaces, in case the LibreChat feature is too unstable?
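Either proxy route keeps the UI side simple: LibreChat (or any OpenAI-compatible client) just points at the proxy's base URL, and the proxy routes requests to the underlying provider and enforces budgets. A minimal sketch of that wiring, assuming a LiteLLM proxy listening locally (the URL, key, and model alias below are placeholders, not real credentials):

```python
# Hypothetical example: any OpenAI-compatible client can target a LiteLLM
# or OpenRouter endpoint by overriding the base URL. Values are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4000/v1",  # LiteLLM proxy (or OpenRouter's API URL)
    api_key="sk-proxy-key",               # key issued by the proxy, not the provider
)
resp = client.chat.completions.create(
    model="claude-3-5-sonnet",            # alias the proxy maps to a provider model
    messages=[{"role": "user", "content": "Reply in Markdown: summarize this file."}],
)
print(resp.choices[0].message.content)
```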
- LM Studio 0.3.0
A better question would be about something like Jan or LibreChat. Ollama is a CLI/API/backend for easily downloading and running models.
https://github.com/janhq/jan
https://github.com/danny-avila/LibreChat
Jan's probably the closest thing to an open-source LLM chat interface that is relatively easy to get started with.
I personally prefer LibreChat (which supports integration with image generation), but it does have to spin up some Docker stuff, and that can make it a bit more complicated.
- Claude 3.5 Sonnet
Pretty much all of the features you mention are already in LibreChat (MIT License). If you don't mind self-hosting, it has branching, conversation search, switching models mid-chat, "presets" (saved system prompts), and a whole lot more. I've had it deployed in my gov agency for months now, and I've had amazing feedback. https://github.com/danny-avila/LibreChat
- Show HN: A better UI for ChatGPT, Claude with text search, saved chats and more
- Integrate multiple AI APIs into a single platform
- text-generation-webui VS LibreChat - a user-suggested alternative
2 projects | 29 Feb 2024
Better Azure OpenAI and OpenAI support, as well as API based AI services
- open-webui VS LibreChat - a user-suggested alternative
2 projects | 29 Feb 2024
Better Azure OpenAI and OpenAI support, as well as API based AI services
llama.cpp
- DeepSeek-v3.1 Release
I maintain a cross-platform llama.cpp client - you're right to point out that, generally, we expect nuking logits to take care of it.
There is a substantial performance cost to nuking; the open source internals discussion may have glossed over that for clarity (see github.com/llama.cpp/... below). The cost is high enough that the default in the API* is to not artificially lower other logits, and to only do that if the first inference attempt yields a token that is invalid under the compiled grammar.
Similarly, I was hoping to be on target w/r/t what strict mode is in an API, and am sort of describing the "outer loop" of sampling.
* blissfully, you do not have to implement it manually anymore - it is a parameter in the sampling params member of the inference params
* "the grammar constraints applied on the full vocabulary can be very taxing. To improve performance, the grammar can be applied only to the sampled token..and nd only if the token doesn't fit the grammar, the grammar constraints are applied to the full vocabulary and the token is resampled." https://github.com/ggml-org/llama.cpp/blob/54a241f505d515d62...
- Guide: Running GPT-OSS with Llama.cpp
- Ollama and gguf
ik_llama.cpp is another fork of llama.cpp. I followed the development of GLM4.5 support in both projects.
The ik_llama.cpp developers had a working implementation earlier than llama.cpp, but their GGUFs were not compatible with the mainline.
After the changes in llama.cpp were merged into master, ik_llama.cpp reworked their implementation and ported it to align with upstream: https://github.com/ggml-org/llama.cpp/pull/14939#issuecommen...
>Many thanks to @sammcj, @CISC, and everyone who contributed! The code has been successfully ported and merged into ik_llama.
This is how it should be done.
- How to Install & Run GPT-OSS 20b and 120b GGUF Locally?
# Install build dependencies
apt-get update
apt-get install -y pciutils build-essential cmake curl libcurl4-openssl-dev git
# Fetch the sources
git clone https://github.com/ggml-org/llama.cpp
# Configure a static build with CUDA and libcurl support
cmake llama.cpp -B llama.cpp/build -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON -DLLAMA_CURL=ON
# Build only the CLI and server binaries, then copy them out of the build tree
cmake --build llama.cpp/build --config Release -j --clean-first --target llama-cli llama-server
cp llama.cpp/build/bin/llama-* llama.cpp/
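Once built, `llama-cli` handles one-off prompts and `llama-server` serves a GGUF model over an OpenAI-compatible HTTP API, e.g. `./llama.cpp/llama-server -m gpt-oss-20b.gguf --port 8080` (the model filename here is a placeholder; substitute the GGUF you downloaded).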
- Mistral Integration Improved in Llama.cpp
llama.cpp still doesn't support gpt-oss tool calling. https://github.com/ggml-org/llama.cpp/pull/15158 (among other similar PRs)
But I also couldn't get vllm, transformers serve, or ollama (400 response on /v1/chat/completions) working today with gpt-oss. OpenAI's cookbooks aren't really copy-paste instructions. They probably tested on a single platform with preinstalled Python packages which they forgot to mention :))
- How Attention Sinks Keep Language Models Stable
There was skepticism last time this was posted https://news.ycombinator.com/item?id=37740932
Implementation for gpt-oss this week showed 2-3x improvements https://github.com/ggml-org/llama.cpp/pull/15157 https://www.reddit.com/r/LocalLLaMA/comments/1mkowrw/llamacp...
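The core idea is small: when the KV cache is trimmed to a sliding window, the first few tokens are kept anyway so the softmax always has a stable place to park attention mass. A toy sketch of which positions survive trimming (the window and sink counts are illustrative, not the llama.cpp implementation):

```python
# Toy sketch of the attention-sink idea from the StreamingLLM line of work:
# keep the first `sinks` tokens plus a sliding window of recent tokens.
def kept_positions(seq_len: int, window: int = 6, sinks: int = 2) -> list[int]:
    recent = list(range(max(sinks, seq_len - window), seq_len))
    return list(range(min(sinks, seq_len))) + recent

print(kept_positions(12))  # [0, 1, 6, 7, 8, 9, 10, 11]
```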
- OpenAI Open Models
Holy smokes, there's already llama.cpp support:
https://github.com/ggml-org/llama.cpp/pull/15091
- Llama.cpp: Add GPT-OSS
- My 2.5 year old laptop can write Space Invaders in JavaScript now (GLM-4.5 Air)
MLX does have good software support. Targeting both iOS and Mac is a big win in itself.
I wonder what's possible and what the software situation is today with the PC NPUs. AMD's XDNA has been around for a while, and XDNA2 jumps from 10 to 40 TOPS. The "amdxdna" driver was merged in Linux 6.14 last winter: where are we now?
But I'm not seeing any evidence that there's popular support in any of the main frameworks. https://github.com/ggml-org/llama.cpp/issues/1499 https://github.com/ollama/ollama/issues/5186
Good news: AMD has an initial implementation for llama.cpp. I don't particularly know what it means, but the first gen supports W4ABF16 quantization, and newer chips support W8A16. https://github.com/ggml-org/llama.cpp/issues/14377 . I'm not sure what it's good for, but there is a Linux "xdna-driver": https://github.com/amd/xdna-driver
It would also be interesting to know how this compares to, say, the huge iGPU on Strix Halo. I don't know whether these NPUs similarly work with very large models.
There are a lot of other folks also starting on their NPU journeys. ARM's Ethos and Rockchip's RKNN recently shipped Linux kernel drivers, but it feels like that's just a start. https://www.phoronix.com/news/Arm-Ethos-NPU-Accel-Driver https://www.phoronix.com/news/Rockchip-NPU-Driver-RKNN-2025
- AMD teams contributing to the llama.cpp codebase
What are some alternatives?
open-webui - User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
ollama - Get up and running with OpenAI gpt-oss, DeepSeek-R1, Gemma 3 and other models.
ollama-webui - ChatGPT-Style WebUI for LLMs (Formerly Ollama WebUI) [Moved to: https://github.com/open-webui/open-webui]
mlc-llm - Universal LLM Deployment Engine with ML Compilation
chat-with-gpt - An open-source ChatGPT app with a voice
text-generation-webui - LLM UI with advanced features, easy setup, and multiple backend support.