Guidance: A guidance language for controlling large language models

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • guidance

    A guidance language for controlling large language models.

  • I dug into this a while back and, iirc, it comes down to "pausing" template rendering and calling the LLM with all the content generated so far: https://github.com/guidance-ai/guidance/blob/main/guidance/l...

    This is how we implemented it anyhow, with some extra parameters at each "pause" point to control how that all works (including the LLM params). The _neat_ part for us was that a template helper could make use of the partially generated content. I hadn't thought about that before for a templating engine, but it was trivial to implement in the end.

  • ad-llama

    Structured inference with Llama 2 in your browser

  • I took a stab at making something[1] like guidance. I'm not sure exactly how guidance does it (and I'm also really curious how it would work with chat APIs), but here's how my solution works.

    Each expression becomes a new inference request, so it's not a single inference pass. Because each subsequent pass includes the previously generated text, the LLM ends up doing a lot of prefill and comparatively little decode. You only decode as many tokens as you actually generate; the repeated passes only cost extra prefill (which typically runs at a much higher tok/s than decode).

    To work with chat-tuned instruction models, you can basically still treat them as completion models. I provide the previously generated text as a partially completed assistant response; e.g., with Llama 2 it goes after [/INST]. You can add a bit of instruction for each inference expression, which gets added inside the [INST] block. This approach lets you start the inference with `{ "someField": "`, for example, to guarantee (at least the start of) a JSON response, and lets you add a little instruction or context just for that field.

    I didn't even try with OpenAI's APIs, since afaict you can't provide a partial assistant response for the model to continue from. Even if you requested a single token at a time and used logit_bias for biased sampling, I don't see how you could get it to continue a partially completed inference.

    [1] https://github.com/gsuuon/ad-llama

  • hof

    Framework that joins data models, schemas, code generation, and a task engine. Language and technology agnostic.

  • Yea, in particular for this project, they have created a bespoke templating system.

    You can get the same thing with Go text/templates by adding chat function(s) as a custom helper: https://github.com/hofstadter-io/hof/blob/_dev/lib/templates...

  • guidance

    (Discontinued) A guidance language for controlling large language models. [Moved to: https://github.com/guidance-ai/guidance] (by microsoft)

  • This IS Microsoft Guidance; they seem to have spun it off into a separate GitHub organization.

    https://github.com/microsoft/guidance redirects to https://github.com/guidance-ai/guidance now.

  • llama.cpp

    LLM inference in C/C++

  • Right, there are many folks (dozens of us!) yelling about logit processors and building them into various frameworks.

    The most widely accessible form of this is probably BNF grammar biasing in llama.cpp: https://github.com/ggerganov/llama.cpp/blob/master/grammars/...

  • api

    (Discontinued) Structured LLM APIs (by thiggle)

  • Logit-bias guidance goes a long way: LLM structure for regex, context-free grammars, categorization, and typed construction. I'm working on a hosted, model-agnostic version of this with thiggle [0].

    [0] https://thiggle.com

  • llm

    Access large language models from the command-line (by simonw)

  • `llm` might be the closest thing to that right now.

    https://github.com/simonw/llm

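The ad-llama comment also describes driving a chat model as a completion model: leave the assistant turn open after [/INST] and pre-fill it. The Go sketch below shows that prompt construction. The layout assumes the commonly used Llama 2 chat template (`[INST] <<SYS>> ... <</SYS>> ... [/INST]`), with the BOS token added by the tokenizer. `BuildPrompt` and its `fieldHint` parameter are hypothetical, not ad-llama's code.

```go
package main

import (
	"fmt"
	"strings"
)

// BuildPrompt formats a Llama 2 chat-style prompt whose assistant turn is
// deliberately left unfinished, so the model continues from `partial`.
// Assumption: BOS is added by the tokenizer, not included in the string.
func BuildPrompt(system, instruction, fieldHint, partial string) string {
	var b strings.Builder
	b.WriteString("[INST] ")
	if system != "" {
		b.WriteString("<<SYS>>\n" + system + "\n<</SYS>>\n\n")
	}
	b.WriteString(instruction)
	if fieldHint != "" {
		// Per-expression instruction, placed inside [INST] for this field only.
		b.WriteString("\n" + fieldHint)
	}
	b.WriteString(" [/INST] ")
	// No closing </s>: the partial assistant response is continued by the model.
	b.WriteString(partial)
	return b.String()
}

func main() {
	p := BuildPrompt("", "Describe a fantasy character as JSON.",
		"someField: the character's name.", `{ "someField": "`)
	fmt.Println(p)
}
```

Because the prompt ends inside a JSON string, the model's first tokens must be that field's value. This is how the opening of the response is pinned down.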
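The thiggle comment mentions categorization via logit bias. The Go sketch below applies an additive bias instead of a hard mask: a large positive bias on the label tokens pushes greedy decoding toward one of the labels, and the labels' own scores still decide which one wins. It assumes each label is a single token with a known id, which real tokenizers often do not guarantee. `ApplyBias`, `Classify`, and the toy vocabulary are hypothetical, not thiggle's API.

```go
package main

import "fmt"

// ApplyBias returns a copy of logits with a per-token bias added, in the
// spirit of a logit_bias parameter.
func ApplyBias(logits []float64, bias map[int]float64) []float64 {
	out := append([]float64(nil), logits...)
	for id, b := range bias {
		if id >= 0 && id < len(out) {
			out[id] += b
		}
	}
	return out
}

// Classify boosts every label token by `boost`, takes the greedy argmax over
// the whole vocabulary, and reports whether the winner is a label.
func Classify(logits []float64, labels map[int]string, boost float64) (string, bool) {
	bias := make(map[int]float64, len(labels))
	for id := range labels {
		bias[id] = boost
	}
	biased := ApplyBias(logits, bias)
	best := 0
	for i, x := range biased {
		if x > biased[best] {
			best = i
		}
	}
	label, ok := labels[best]
	return label, ok // ok is false if a non-label token still won
}

func main() {
	// Toy vocabulary ids: 0 "The", 1 "positive", 2 "negative", 3 "neutral".
	logits := []float64{4.0, 1.5, 2.5, 0.5}
	labels := map[int]string{1: "positive", 2: "negative", 3: "neutral"}
	fmt.Println(Classify(logits, labels, 0))   // no bias: token 0 wins, ok is false
	fmt.Println(Classify(logits, labels, 100)) // strong bias: a label wins
}
```

Without bias, the model would start with a non-label token. With a uniform boost, the highest-scoring label ("negative" here) is selected.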