Show HN: LLMs can generate valid JSON 100% of the time

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • llama.cpp

    LLM inference in C/C++

  • I may get heavily downvoted for my criticism here, but here we go again: yet another "innovation" that's fueled by the stupid money poured into AI startups in the past 2 years. Imagine thinking that adding regex on top of an LLM is worth $8.5M[1]. At least llama.cpp's grammar-based sampling[2] is a bit more interesting, but still essentially putting lipstick on a pig.

    How is telling the language model "no, not like that, give me another token" at every step of token inference getting so many people ecstatic? The paper is basically undergrad-level excitement about something not even remotely interesting. Congratulations, you reinvented Markov chains (oh, sorry, "state machines") on top of LLMs.

    I mean of course you can guarantee grammar and schema well-formedness as, duh, you have what essentially amounts to a post-processing step. Maybe I'm the idiot here, is anyone actually using any of these tools in production?

    [1] https://www.benzinga.com/pressreleases/23/06/n32834246/norma...

    [2] https://github.com/ggerganov/llama.cpp/pull/1773/files

  • outlines

    Structured Text Generation

  • We can extend our approach to grammar-based sampling, as explained in the paper linked above. Relevant PR: https://github.com/normal-computing/outlines/pull/178

    Our method is much more efficient. llama.cpp loops over the entire vocabulary (~50k tokens) at each step to generate the mask. We generate an index at initialization, and building the masks at each step then only requires a dictionary lookup (trading memory for speed). Sampling is just as fast as standard sampling.
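
    A rough sketch of the trade-off just described, assuming a hypothetical vocab mapping and keeps_match predicate rather than the actual Outlines internals: memory goes into a state-to-tokens table built once, so each decoding step costs a single lookup instead of a vocabulary scan.

      from collections import defaultdict

      def build_index(states, vocab, keeps_match):
          """Built once at initialization: for every FSM state, record which
          token ids keep the partially generated text inside the regex language."""
          index = defaultdict(list)
          for state in states:
              for token_id, token_str in vocab.items():
                  if keeps_match(state, token_str):
                      index[state].append(token_id)
          return index

      # Per generated token, llama.cpp-style: rescan the whole ~50k-entry vocabulary.
      # Per generated token, index-style: one dictionary lookup.
      def allowed_tokens(index, state):
          return index[state]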

  • guidance

    A guidance language for controlling large language models.

  • OpenAI has this capability built in with functions[0], I believe! Building my own project[1], I have implemented functions in combination with guidance[2] and haven’t had a hiccup yet! I have a JSON parser function there, just in case, but it seems to be working reliably!

    Here’s a bit more of a description of using the functions API for JSON returns: https://yonom.substack.com/p/native-json-output-from-gpt-4

    [0] https://openai.com/blog/function-calling-and-other-api-updat...

    [1] https://resgen.app

    [2] https://github.com/guidance-ai/guidance
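
    For readers unfamiliar with the functions API mentioned above, this is roughly what forcing a JSON-shaped reply looked like with the 0.x openai-python client; the schema, model name and prompt are illustrative, and the returned arguments string is usually, but not formally guaranteed to be, valid JSON, hence the fallback parser the commenter keeps around.

      import json
      import openai

      person_schema = {
          "name": "record_person",
          "description": "Record structured details about a person.",
          "parameters": {
              "type": "object",
              "properties": {
                  "name": {"type": "string"},
                  "age": {"type": "integer"},
              },
              "required": ["name", "age"],
          },
      }

      response = openai.ChatCompletion.create(
          model="gpt-4-0613",
          messages=[{"role": "user", "content": "Alice is 34 years old."}],
          functions=[person_schema],
          function_call={"name": "record_person"},  # force this function to be called
      )

      # The model returns its arguments as a JSON string.
      args = json.loads(response["choices"][0]["message"]["function_call"]["arguments"])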

  • jsonformer

    A Bulletproof Way to Generate Structured JSON from Language Models

  • I'm not sure how this is different than:

    https://github.com/1rgs/jsonformer

    or

    https://github.com/newhouseb/clownfish

    or

    https://github.com/mkuchnik/relm

    or

    https://github.com/ggerganov/llama.cpp/pull/1773

    or

    https://github.com/Shopify/torch-grammar

    Overall there are a ton of these logit-based guidance systems; the reason they don't get much traction is that the SOTA models sit behind REST APIs that don't expose this fine-grained approach.

    Those models perform so much better that people generally settle for just re-requesting until they get the correct format (and with GPT-4 that ends up being a fairly rare occurrence, in my experience).
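
    A minimal sketch of that re-request-until-it-parses pattern, with call_llm standing in for whatever hosted-API client is actually used:

      import json

      def call_llm(prompt: str) -> str:
          """Stand-in for a hosted chat-completion call behind a REST API."""
          raise NotImplementedError

      def generate_json(prompt: str, max_attempts: int = 3):
          for _ in range(max_attempts):
              raw = call_llm(prompt + "\nRespond with valid JSON only.")
              try:
                  return json.loads(raw)
              except json.JSONDecodeError:
                  continue  # malformed output: just ask again
          raise RuntimeError("model never produced parseable JSON")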

  • torch-grammar

  • json-schema-spec

    The JSON Schema specification

  • Outlines is a Python library that focuses on text generation with large language models. Brandon and I are not LLM experts and started the project a few months ago because we wanted to understand better how the generation process works. Our original background is probabilistic, relational and symbolic programming.

    Recently we came up with a fast way to generate text that matches a regex (https://blog.normalcomputing.ai/posts/2023-07-27-regex-guide...). The basic idea is simple: regular expressions have an equivalent deterministic finite automaton (DFA) representation. We can transform this DFA into a generative model: in each state we get a list of symbols which correspond to completions that partially match the regular expression. We mask the other symbols in the logits returned by a large language model, sample a new symbol and move to the next state. The subtlety is that language models work with tokens, not symbols, so we derive a new FSM whose alphabet is the model's vocabulary. We can do this in only one pass over the vocabulary (a rough sketch of the resulting sampling loop follows this comment).

    Generating the token masks thus only requires a dictionary lookup at each state. Our method blows other libraries like Microsoft's guidance out of the water.

    From there it was only a small leap to be able to generate text that follows a JSON schema (https://json-schema.org/), or is parseable into a Pydantic model (https://docs.pydantic.dev/latest/usage/models/). The method works with union types, optional types, nested schemas, arrays, everything. It is guaranteed that the output is parseable.

    I think it's cool, and I've spent a lot of time watching even tiny models output valid JSON over the weekend. Hope you will too.

    I look forward to feedback, bug reports, feature requests and discussions!
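
    A minimal sketch of the sampling loop described above, assuming a precomputed state-to-allowed-tokens index and a token-level transition table; the model and tokenizer interfaces here are illustrative stand-ins, not the Outlines API:

      import math
      import random

      def sample_allowed(logits, allowed_ids):
          """Softmax restricted to the tokens allowed in the current FSM state."""
          weights = [math.exp(logits[i]) for i in allowed_ids]
          return random.choices(allowed_ids, weights=weights, k=1)[0]

      def generate(model, tokenizer, prompt, index, transitions, start, finals):
          """index[state] -> allowed token ids (built once, per the comment above);
          transitions[(state, token_id)] -> next state of the token-level FSM."""
          state, ids = start, tokenizer.encode(prompt)
          while state not in finals:
              logits = model(ids)                  # next-token logits from the LM
              token = sample_allowed(logits, index[state])
              ids.append(token)
              state = transitions[(state, token)]  # advance the automaton
          return tokenizer.decode(ids)

    Nothing inside the loop touches the full vocabulary; the per-step cost is the index lookup, which is why the masks come essentially for free. Constraining to a JSON schema or a Pydantic model then amounts to compiling the schema into an equivalent regular expression (and its FSM) up front.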

  • lmql

    A language for constraint-guided and efficient LLM programming.

  • clownfish

    Constrained Decoding for LLMs against JSON Schema

  • relm

    ReLM is a Regular Expression engine for Language Models (by mkuchnik)

  • Constrained-Text-Generation-Studio

    Code repo for "Most Language Models can be Poets too: An AI Writing Assistant and Constrained Text Generation Studio" at the CAI2 workshop, jointly held at COLING 2022

  • TypeChat

    TypeChat is a library that makes it easy to build natural language interfaces using types.

  • That re-prompting on error is what this new Microsoft library does, too: https://github.com/microsoft/TypeChat

    Here's their prompt for that: https://github.com/microsoft/TypeChat/blob/c45460f4030938da3...

    I think the approach using grammars (seen here, but also in things like https://github.com/ggerganov/llama.cpp/pull/1773 ) is a much more elegant solution.

  • llm-mlc

    LLM plugin for running models using MLC

  • I'm quite impressed with Llama 2 13B - the more time I spend with it the more I think it might be genuinely useful for more than just playing around with local LLMs.

    I'm using the MLC version (since that works with a GPU on my M2 Mac) via my https://github.com/simonw/llm-mlc plugin.

  • flashtext

    Extract Keywords from sentence or Replace keywords in sentences.

  • I have some other comment on this thread where I point out why I don’t think it’s superficial. Would love to get your feedback on that if you feel like spending more time on this thread.

    But it’s not obscure? FlashText was a somewhat popular paper at the time (2017) with a popular repo (https://github.com/vi3k6i5/flashtext). Their paper was pretty derivative of Aho-Corasick, which they cited. If you think they genuinely fucked up, leave an issue on their repo (I’m, maybe to your surprise lol, not the author).

    Anyway, I’m not a fan of the whataboutery here. I don’t think OG’s paper is up to snuff on its lit review - do you?

  • ad-llama

    Structured inference with Llama 2 in your browser

  • Generating an FSM over the vocabulary is a really interesting approach to guided sampling! I'm hacking on a structured inference library (https://github.com/gsuuon/ad-llama) - I also tried to add a vocab preprocessing step to generate a valid tokens mask (just with regex or static strings initially) but discovered that doing so would cause unlikely / unnatural tokens to be masked rather than the token which represents the natural encoding given the existing sampled tokens.

    Given the stateful nature of tokenizers, I decided that trying to preprocess the individual token ids was a losing battle. Even in the simple case of whitespace, tokenizer merges can really screw up generating a static mask: e.g. we expect a space next, and a token decodes to 'foo' on its own but is actually a '_foo' that would've decoded with a whitespace if it were following a valid pair. When I go to construct the static vocab mask, it ends up matching against 'foo' instead of ' foo'.

    How did you work around this for the FSM approach? Does it somehow include information about merges / whitespace / tokenizer statefulness?
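
    One way to see the mismatch described above is to compare per-id decodes with a sequence-level decode on a SentencePiece-style tokenizer; a small probe (the tokenizer repo and the exact pieces shown in the comments are illustrative and vary by model):

      from transformers import AutoTokenizer

      tok = AutoTokenizer.from_pretrained("hf-internal-testing/llama-tokenizer")

      ids = tok.encode("say foo", add_special_tokens=False)
      print(tok.convert_ids_to_tokens(ids))   # raw pieces, e.g. ['▁say', '▁foo']
      print([tok.decode([i]) for i in ids])   # per-id decodes can lose the leading-space marker
      print(tok.decode(ids))                  # decoding the full sequence restores "say foo"

    A mask built from the per-id strings would therefore disagree with the text the same ids produce in context, which is the whitespace trap the comment describes.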

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • Prompt Engineering Guide

    1 project | news.ycombinator.com | 30 Mar 2024
  • FLaNK Stack Weekly 12 February 2024

    52 projects | dev.to | 12 Feb 2024
  • Resources to deepen LLMs understanding for software engineers

    1 project | news.ycombinator.com | 16 Jan 2024
  • Step-by-Step Guide to building an Anomaly Detector using a LLM

    1 project | dev.to | 10 Jan 2024
  • The Essential Guide to Prompt Engineering for Creators and Innovators

    2 projects | dev.to | 2 Jan 2024