Guidance: A guidance language for controlling large language models

guidance

23 17,246 9.8 Jupyter Notebook

A guidance language for controlling large language models.

I dug into this a while back. IIRC, it comes down to "pausing" template rendering and calling the LLM with all the content generated so far. https://github.com/guidance-ai/guidance/blob/main/guidance/l...
This is how we implemented it anyhow, with some more parameters to control how that all works (and the LLM params) at each "pause" point. The _neat_ part for us was that a template helper could make use of the partially generated content. I hadn't thought about that before for a templating engine, but it was trivial to implement in the end.
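
A minimal sketch of the "pause and call the LLM" idea described above, not how guidance or hof actually implement it. The `{{gen name}}` placeholder syntax, the `render` function, and the `fake_llm` stand-in are all hypothetical names made up for illustration; a real setup would swap `fake_llm` for an actual model call.

```python
import re
from typing import Callable, Dict, Tuple

# Hypothetical placeholder syntax: {{gen name}} marks a "pause" point.
GEN = re.compile(r"\{\{gen (\w+)\}\}")


def render(template: str, llm: Callable[[str], str]) -> Tuple[str, Dict[str, str]]:
    """Render left to right; at each pause point, call the LLM with the text so far."""
    out = ""
    values: Dict[str, str] = {}
    pos = 0
    for match in GEN.finditer(template):
        out += template[pos:match.start()]  # literal text before the pause point
        completion = llm(out)               # the model sees everything rendered so far
        values[match.group(1)] = completion
        out += completion                   # later pause points see this output too
        pos = match.end()
    out += template[pos:]                   # trailing literal text
    return out, values


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call (assumption): just reports the prompt length.
    return f"<{len(prompt)}>"


text, values = render("Name: {{gen name}}\nJob: {{gen job}}\n", fake_llm)
print(text)
```

Because `values` is filled in as rendering proceeds, a template helper called later in the same render could read the partially generated content, which is the "neat part" mentioned above.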

ad-llama

6 47 9.1 TypeScript

Structured inference with Llama 2 in your browser

I took a stab at making something[1] like guidance. I'm not sure exactly how guidance does it (and I'm also really curious how it would work with chat APIs), but here's how my solution works.
Each expression becomes a new inference request, so it's not a single inference pass. Because each subsequent pass includes the previously generated text, the LLM ends up doing a lot of prefill and less decode. You only decode as many tokens as you actually generate; the repeated passes only cost more in prefill (which tends to be much faster in tok/s).
To work with chat-tuned instruction models, you can basically still treat them as completion models. I provide the previously generated text as a partially completed assistant response, e.g. with Llama 2 it goes after [/INST]. You can add a bit of instruction for each inference expression, which gets added to the [INST] block. This approach lets you start off the inference with `{ "someField": "`, for example, to guarantee (at least the start of) a JSON response, and lets you add a little bit of instruction or context just for that field.
I didn't even try with the OpenAI APIs since, afaict, you can't provide a partial assistant response for them to continue from. Even if you were to request a single token at a time and use logit_bias for biased sampling, I don't see how you can get it to continue a partially completed response.
[1] https://github.com/gsuuon/ad-llama
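
The approach described above can be sketched roughly as follows. This is an illustration in Python, not ad-llama's actual TypeScript code: `build_prompt`, `run`, and the `complete` callback are hypothetical names, and the prompt layout assumes the standard Llama 2 chat format (the BOS token is left to the tokenizer).

```python
from typing import Callable, List, Tuple, Union

# A template part is either literal text or a (field_instruction, stop_string) pair.
Part = Union[str, Tuple[str, str]]


def build_prompt(system: str, instruction: str, field_hint: str, partial: str) -> str:
    """Llama 2 chat prompt whose assistant turn is already partially written."""
    user = f"{instruction}\n{field_hint}" if field_hint else instruction
    # Everything after [/INST] is treated as the start of the assistant response.
    return f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST] {partial}"


def run(system: str, instruction: str, parts: List[Part],
        complete: Callable[[str, str], str]) -> str:
    """One inference request per expression; earlier output is re-sent as prefill."""
    text = ""
    for part in parts:
        if isinstance(part, str):
            text += part  # literal text, no model call
        else:
            hint, stop = part
            prompt = build_prompt(system, instruction, hint, text)
            text += complete(prompt, stop)  # decode only this expression
    return text


# Stand-in for a real completion call (assumption): canned answers, prompts recorded.
prompts: List[str] = []
answers = iter(["Ada", "36"])


def fake_complete(prompt: str, stop: str) -> str:
    prompts.append(prompt)
    return next(answers)


parts: List[Part] = ['{ "name": "', ("Pick a short name.", '"'),
                     '", "age": ', ("Age in years.", ","), " }"]
result = run("You write JSON.", "Describe a person.", parts, fake_complete)
print(result)
```

Each request re-sends the growing assistant text, so the cost of later passes sits mostly in prefill, as noted above.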

hof

33 475 8.9 Go

Framework that joins data models, schemas, code generation, and a task engine. Language and technology agnostic.

Yea, in particular for this project, they have created a bespoke templating system.
You can get the same thing with Go text/templates by adding chat function(s) as a custom helper: https://github.com/hofstadter-io/hof/blob/_dev/lib/templates...

guidance

89 12,248 9.5 Jupyter Notebook

Discontinued A guidance language for controlling large language models. [Moved to: https://github.com/guidance-ai/guidance] (by microsoft)

This IS Microsoft Guidance; they seem to have spun off a separate GitHub organization for it.
https://github.com/microsoft/guidance redirects to https://github.com/guidance-ai/guidance now.

llama.cpp

769 55,846 10.0 C++

LLM inference in C/C++

Right, there are many folks (dozens of us!) yelling about logit processors and building them into various frameworks.
The most widely accessible form of this is probably BNF grammar biasing in llama.cpp: https://github.com/ggerganov/llama.cpp/blob/master/grammars/...
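
To make "logit processor" concrete, here is a toy sketch of the general idea: before each sampling step, tokens the grammar does not allow get their logits set to negative infinity. This is not llama.cpp's implementation; the four-token vocabulary, the hand-written state machine standing in for a grammar, and the `fake_logits` model are all assumptions made up for the example.

```python
import math
from typing import Dict, List, Set

VOCAB = ["maybe", "yes", "no", "<eos>"]

# Toy grammar as a state machine: answer ::= ("yes" | "no") "<eos>"
# state -> {allowed token id: next state}
GRAMMAR: Dict[str, Dict[int, str]] = {
    "start": {1: "done", 2: "done"},
    "done": {3: "end"},
}


def mask_logits(logits: List[float], allowed: Set[int]) -> List[float]:
    """The logit processor: disallowed tokens can never be picked."""
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]


def greedy(logits: List[float]) -> int:
    """Pick the index of the largest logit."""
    return max(range(len(logits)), key=logits.__getitem__)


def fake_logits(_tokens: List[int]) -> List[float]:
    # Stand-in for a real model (assumption): it always prefers "maybe".
    return [5.0, 1.0, 2.0, 0.0]


def constrained_decode() -> List[str]:
    state, tokens = "start", []
    while state in GRAMMAR:
        allowed = set(GRAMMAR[state])
        token = greedy(mask_logits(fake_logits(tokens), allowed))
        tokens.append(token)
        state = GRAMMAR[state][token]
    return [VOCAB[t] for t in tokens]


print(constrained_decode())
```

Without the mask, greedy decoding would pick "maybe"; with it, the model's preference is only consulted among the tokens the grammar permits.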

api

3 151 6.5 MDX

Discontinued Structured LLM APIs (by thiggle)

Logit-bias guidance goes a long way: LLM structure for regex, context-free grammars, categorization, and typed construction. I'm working on a hosted and model-agnostic version of this with thiggle [0].
[0] https://thiggle.com

llm

23 2,903 9.5 Python

Access large language models from the command-line (by simonw)

`llm` might be the closest thing to that right now.
https://github.com/simonw/llm

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.