guidance vs llama-cpp-python

guidance

A guidance language for controlling large language models. (by guidance-ai)

Suggest topics

Source Code

Suggest alternative

Edit details

llama-cpp-python

Python bindings for llama.cpp (by abetlen)

Suggest topics

Source Code

llama-cpp-python.readthedocs.io

Suggest alternative

Edit details

Our great sponsors

WorkOS - The modern identity platform for B2B SaaS

InfluxDB - Power Real-Time Data Analytics at Scale

SaaSHub - Software Alternatives and Reviews

Our great sponsors

guidance		llama-cpp-python
	Project
23	Mentions	54
17,246	Stars	6,378
5.1%	Growth	-
9.8	Activity	9.9
1 day ago	Latest Commit	1 day ago
Jupyter Notebook	Language	Python
MIT License	License	MIT License

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

guidance

Posts with mentions or reviews of guidance. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-04-08.

Anthropic's Haiku Beats GPT-4 Turbo in Tool Use
5 projects | news.ycombinator.com | 8 Apr 2024

[1]: https://github.com/guidance-ai/guidance/tree/main
Show HN: Prompts as (WASM) Programs
9 projects | news.ycombinator.com | 11 Mar 2024

> The most obvious usage of this is forcing a model to output valid JSON
Isn't this something that Outlines [0], Guidance [1] and others [2] already solve much more elegantly?
0. https://github.com/outlines-dev/outlines
1. https://github.com/guidance-ai/guidance
2. https://github.com/sgl-project/sglang
Show HN: Fructose, LLM calls as strongly typed functions
10 projects | news.ycombinator.com | 6 Mar 2024
LiteLlama-460M-1T has 460M parameters trained with 1T tokens
1 project | news.ycombinator.com | 7 Jan 2024

Or combine it with something like llama.cpp's grammer or microsoft's guidance-ai[0] (which I prefer) which would allow adding some react-style prompting and external tools. As others have mentioned, instruct tuning would help too.
[0] https://github.com/guidance-ai/guidance
Forcing AI to Follow a Specific Answer Pattern Using GBNF Grammar
2 projects | /r/LocalLLaMA | 10 Dec 2023
Prompting LLMs to constrain output
2 projects | /r/LocalLLaMA | 8 Dec 2023

have been experimenting with guidance and lmql. a bit too early to give any well formed opinions but really do like the idea of constraining llm output.
Guidance is back 🥳
1 project | /r/LocalLLaMA | 16 Nov 2023
New: LangChain templates – fastest way to build a production-ready LLM app
6 projects | news.ycombinator.com | 1 Nov 2023
Is supervised learning dead for computer vision?
9 projects | news.ycombinator.com | 28 Oct 2023

Thanks for your comment.
I did not know about "Betteridge's law of headlines", quite interesting. Thanks for sharing :)
You raise some interesting points.
1) Safety: It is true that LVMs and LLMs have unknown biases and could potentially create unsafe content. However, this is not necessarily unique to them, for example, Google had the same problem with their supervised learning model https://www.theverge.com/2018/1/12/16882408/google-racist-go.... It all depends on the original data. I believe we need systems on top of our models to ensure safety. It is also possible to restrict the output domain of our models (https://github.com/guidance-ai/guidance). Instead of allowing our LVMs to output any words, we could restrict it to only being able to answer "red, green, blue..." when giving the color of a car.
2) Cost: You are right right now LVMs are quite expensive to run. As you said are a great way to go to market faster but they cannot run on low-cost hardware for the moment. However, they could help with training those smaller models. Indeed, with see in the NLP domain that a lot of smaller models are trained on data created with GPT models. You can still distill the knowledge of your LVMs into a custom smaller model that can run on embedded devices. The advantage is that you can use your LVMs to generate data when it is scarce and use it as a fallback when your smaller device is uncertain of the answer.
3) Labelling data: I don't think labeling data is necessarily cheap. First, you have to collect the data, depending on the frequency of your events could take months of monitoring if you want to build a large-scale dataset. Lastly, not all labeling is necessarily cheap. I worked at a semiconductor company and labeled data was scarce as it required expert knowledge and could only be done by experienced employees. Indeed not all labelling can be done externally.
However, both approaches are indeed complementary and I think systems that will work the best will rely on both.
Thanks again for the thought-provoking discussion. I hope this answer some of the concerns you raised
Show HN: Elelem – TypeScript LLMs with tracing, retries, and type safety
2 projects | news.ycombinator.com | 12 Oct 2023

I've had a bit of trouble getting function calling to work with cases that aren't just extracting some data from the input. The format is correct but it was harder to get the correct data if it wasn't a simple extraction.
Hopefully OpenAI and others will offer something like https://github.com/guidance-ai/guidance at some point to guarantee overall output structure.
Failed validations will retry, but from what I've seen JSONSchema + generated JSON examples are decently reliable in practice for gpt-3.5-turbo and extremely reliable on gpt-4.

llama-cpp-python

Posts with mentions or reviews of llama-cpp-python. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-03-11.

FLaNK AI for 11 March 2024
46 projects | dev.to | 11 Mar 2024
OpenAI: Memory and New Controls for ChatGPT
4 projects | news.ycombinator.com | 13 Feb 2024

I'll share the core bit that took a while to figure out the right format, my main script is a hot mess using embeddings with SentenceTransformer, so I won't share that yet. E.g: last night I did a PR for llama-cpp-python that shows how Phi might be used with JSON only for the author to write almost exactly the same code at pretty much the same time. https://github.com/abetlen/llama-cpp-python/pull/1184
TinyLlama LLM: A Step-by-Step Guide to Implementing the 1.1B Model on Google Colab
2 projects | dev.to | 6 Jan 2024

Python Bindings for llama.cpp
Mistral-8x7B-Chat
4 projects | news.ycombinator.com | 10 Dec 2023
Running Mistral LLM on Apple Silicon Using Apple's MLX Framework Is Much Faster
2 projects | news.ycombinator.com | 6 Dec 2023

If the model could be made to work with llama.cpp, then https://github.com/abetlen/llama-cpp-python might be more compact. llama.cpp only supports a limited list of model types though.
Run ChatGPT-like LLMs on your laptop in 3 lines of code
9 projects | news.ycombinator.com | 6 Sep 2023
Code Llama, a state-of-the-art large language model for coding
4 projects | news.ycombinator.com | 24 Aug 2023

https://github.com/abetlen/llama-cpp-python has a web server mode that replicates openai's API iirc and the readme shows it has docker builds already.
Meta: Code Llama, an AI Tool for Coding
18 projects | news.ycombinator.com | 24 Aug 2023

LocalAI https://localai.io/ and LMStudio https://lmstudio.ai/ both have fairly complete OpenAI compatibility layers. llama-cpp-python has a FastAPI server as well: https://github.com/abetlen/llama-cpp-python/blob/main/llama_... (as of this moment it hasn't merged GGUF update yet though)
First steps with llama
2 projects | dev.to | 31 Jul 2023

I went with Python, llama-cpp-python, since my goal is just to get a small project up and running locally.
Show HN: Khoj – Chat Offline with Your Second Brain Using Llama 2
14 projects | news.ycombinator.com | 30 Jul 2023

I see you’re using gpt4all; do you have a supported way to change the model being used for local inference?
A number of apps that are designed for OpenAI’s completion/chat APIs can simply point to the endpoints served by llama-cpp-python [0], and function in (largely) the same way, while supporting the various models and quants supported by llama.cpp. That would allow folks to run larger models on the hardware of their choice (including Apple Silicon with Metal acceleration) or using other proxies like openrouter.io.
[0]: https://github.com/abetlen/llama-cpp-python

What are some alternatives?

When comparing guidance and llama-cpp-python you can also consider the following projects:

lmql - A language for constraint-guided and efficient LLM programming.

LocalAI - :robot: The free, Open Source OpenAI alternative. Self-hosted, community-driven and local-first. Drop-in replacement for OpenAI running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more models architectures. It allows to generate Text, Audio, Video, Images. Also with voice cloning capabilities.

semantic-kernel - Integrate cutting-edge LLM technology quickly and easily into your apps

intel-extension-for-pytorch - A Python package for extending the official PyTorch that can easily obtain performance on Intel platform

langchain - 🦜🔗 Build context-aware reasoning applications

llama.cpp - LLM inference in C/C++

NeMo-Guardrails - NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems.

text-generation-inference - Large Language Model Text Generation Inference

text-generation-webui - A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

mlc-llm - Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.

outlines - Structured Text Generation

FastChat - An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

guidance vs lmql llama-cpp-python vs LocalAI guidance vs semantic-kernel llama-cpp-python vs intel-extension-for-pytorch guidance vs langchain llama-cpp-python vs llama.cpp guidance vs NeMo-Guardrails llama-cpp-python vs text-generation-inference guidance vs text-generation-webui llama-cpp-python vs mlc-llm guidance vs outlines llama-cpp-python vs FastChat

Compare guidance vs llama-cpp-python and see what are their differences.

guidance

llama-cpp-python

guidance

llama-cpp-python

What are some alternatives?