Could I get a suggestion for a simple HTTP API with no GUI for llama.cpp?

This page summarizes the projects mentioned and recommended in the original post on /r/LocalLLaMA.

  1. llama.cpp-dotnet

    Minimal C# bindings for llama.cpp + .NET core library with API host/client.

  2. llama-cpp-python

    Python bindings for llama.cpp.

  3. go-llama.cpp

    Go bindings for llama.cpp.

    Go: go-skynet/go-llama.cpp

  4. llama-node

    Discontinued. "Believe in AI democratization." llama for Node.js, backed by llama-rs, llama.cpp and rwkv.cpp; works locally on your laptop CPU and supports llama/alpaca/gpt4all/vicuna/rwkv models.

    Node.js: hlhr202/llama-node

  5. llama_cpp.rb

    llama_cpp.rb provides Ruby bindings for llama.cpp.

    Ruby: yoshoku/llama_cpp.rb

  6. LLamaSharp

    A C#/.NET library to run LLMs (🦙LLaMA/LLaVA) on your local device efficiently.

    C#/.NET: SciSharp/LLamaSharp

  7. FastChat

    An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

    I used the FastChat API to load two quantized Vicuna-13B models locally so I could repeatedly query them for modern translations of paragraphs from the complete works of Jonathan Swift. Then I fine-tuned LLaMA-7B with LoRA (via PEFT) to convert modern English into Swift's style. Works great: https://huggingface.co/pcalhoun/LLaMA-7b-JonathanSwift

  8. LocalAI

    🤖 The free, open-source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. Features: text, audio, video and image generation, voice cloning, distributed and P2P inference.
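
For the question in the title, a GUI-less HTTP API can be quite small. The following is a minimal sketch, assuming Python with only the standard library. The `/completion` route, the `{"prompt", "max_tokens"}` JSON shape, and the helper names (`echo_generate`, `serve`, `post_completion`) are hypothetical illustrations, not the API of any project listed above. The optional `llama_cpp_generate` backend uses llama-cpp-python's `Llama(model_path=...)` object, whose call returns an OpenAI-style completion dict. It is imported lazily so the sketch runs without a model file.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable

# A backend maps (prompt, max_tokens) -> generated text.
Generate = Callable[[str, int], str]


def echo_generate(prompt: str, max_tokens: int) -> str:
    """Hypothetical stub backend: echoes the prompt, truncated to
    max_tokens characters, so the sketch runs without a model file."""
    return prompt[:max_tokens]


def llama_cpp_generate(model_path: str) -> Generate:
    """Backend built on llama-cpp-python (pip install llama-cpp-python)."""
    from llama_cpp import Llama  # lazy import: the stub path needs no model

    llm = Llama(model_path=model_path)

    def generate(prompt: str, max_tokens: int) -> str:
        # Calling Llama returns an OpenAI-style completion dict.
        out = llm(prompt, max_tokens=max_tokens)
        return out["choices"][0]["text"]

    return generate


def make_handler(generate: Generate) -> type:
    class CompletionHandler(BaseHTTPRequestHandler):
        def do_POST(self) -> None:
            if self.path != "/completion":  # hypothetical route name
                self.send_error(404)
                return
            try:
                length = int(self.headers.get("Content-Length", 0))
                req = json.loads(self.rfile.read(length))
                prompt = str(req["prompt"])
                max_tokens = int(req.get("max_tokens", 128))
            except (ValueError, KeyError, TypeError):
                self.send_error(400, "expected a JSON object with a 'prompt' field")
                return
            body = json.dumps({"text": generate(prompt, max_tokens)}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format: str, *args) -> None:
            pass  # silence per-request logging

    return CompletionHandler


def serve(generate: Generate, host: str = "127.0.0.1",
          port: int = 8080) -> ThreadingHTTPServer:
    """Start the API on a background thread; port=0 picks a free port."""
    server = ThreadingHTTPServer((host, port), make_handler(generate))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# Opener with no proxies, so local calls ignore any configured HTTP proxy.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def post_completion(base_url: str, prompt: str, max_tokens: int = 128) -> str:
    """Tiny client: POST a prompt and return the generated text."""
    data = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode()
    req = urllib.request.Request(
        base_url + "/completion", data=data,
        headers={"Content-Type": "application/json"},
    )
    with _opener.open(req) as resp:
        return json.load(resp)["text"]


if __name__ == "__main__":
    # Runs with the stub; swap in llama_cpp_generate("model.gguf")
    # (hypothetical path) to serve a real llama.cpp model.
    srv = serve(echo_generate, port=0)
    base = f"http://127.0.0.1:{srv.server_address[1]}"
    print(post_completion(base, "hello world", 5))  # prints hello
    srv.shutdown()
    srv.server_close()
```

Keeping the backend behind a plain `(prompt, max_tokens) -> str` callable lets the same server run against the stub or a real model. Several projects above already ship a ready-made server, though: llama-cpp-python provides `python -m llama_cpp.server`, and FastChat and LocalAI expose OpenAI-compatible endpoints. A hand-rolled wrapper like this mainly makes sense when you want a very small, custom interface.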



