Show HN: LLMs can generate valid JSON 100% of the time

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • llama.cpp

    LLM inference in C/C++

    I may get heavily downvoted for my criticism here, but here we go again: yet another "innovation" fueled by the stupid money poured into AI startups over the past two years. Imagine thinking that adding regex on top of an LLM is worth $8.5M[1]. At least llama.cpp's grammar-based sampling[2] is a bit more interesting, but it's still essentially putting lipstick on a pig.

    How is telling the language model "no, not like that, give me another token" at every step of token inference getting so many people ecstatic? The paper is basically undergrad-level excitement about something not even remotely interesting. Congratulations, you reinvented Markov chains (oh, sorry, "state machines") on top of LLMs.

    I mean, of course you can guarantee grammar and schema well-formedness; you have what essentially amounts to a post-processing step. Maybe I'm the idiot here, but is anyone actually using any of these tools in production?

    [1] https://www.benzinga.com/pressreleases/23/06/n32834246/norma...

    [2] https://github.com/ggerganov/llama.cpp/pull/1773/files
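    What the comment dismisses as telling the model "no, not like that, give me another token" is, mechanically, masking the logits of tokens the grammar currently forbids before sampling. A minimal sketch of that single step (illustrative only; `allowed_token_ids` is a hypothetical placeholder for whatever the grammar permits at this position, not any particular project's API):

        import torch

        def sample_constrained(logits: torch.Tensor, allowed_token_ids: list[int]) -> int:
            """Suppress every token the format forbids, then sample as usual.

            `logits` is the model's raw output for the next position (shape [vocab_size]);
            `allowed_token_ids` lists the tokens the grammar/regex/state machine permits here.
            """
            mask = torch.full_like(logits, float("-inf"))
            mask[allowed_token_ids] = 0.0            # allowed tokens keep their original logit
            probs = torch.softmax(logits + mask, dim=-1)
            return int(torch.multinomial(probs, num_samples=1))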

  • outlines

    Structured Text Generation

    We can extend our approach to grammar-based sampling, as explained in the paper linked above. Relevant PR: https://github.com/normal-computing/outlines/pull/178

    Our method is much more efficient: llama.cpp loops over the entire vocabulary (~50k tokens) at each step to generate the mask, whereas we build an index at initialization, so constructing the mask at each step only requires a dictionary lookup (trading memory for speed). Sampling is just as fast as standard sampling.
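    A rough sketch of the difference being described (the FSM construction itself is elided; `accepts`, `states`, and the data shapes are illustrative assumptions, not Outlines' actual internals):

        # Per-step scan (the llama.cpp-style approach described above): at every
        # decoding step, test each of the ~50k vocabulary tokens against the grammar.
        def mask_by_scanning(vocab: dict[int, str], state, accepts) -> set[int]:
            return {tid for tid, text in vocab.items() if accepts(state, text)}

        # Indexed approach: precompute, once at initialization, which tokens are
        # legal in each FSM state ...
        def build_index(vocab: dict[int, str], states, accepts) -> dict:
            return {s: {tid for tid, text in vocab.items() if accepts(s, text)}
                    for s in states}

        # ... so that each decoding step is only a dictionary lookup.
        def mask_by_lookup(index: dict, state) -> set[int]:
            return index[state]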

  • guidance

    A guidance language for controlling large language models.

    OpenAI has this capability built in with functions[0], I believe! While building my own project[1], I implemented functions in combination with guidance[2] and haven’t had a hiccup yet. I keep a JSON parser function there just in case, but it seems to be working reliably!

    Here’s a bit more of a description of using the functions API for JSON returns: https://yonom.substack.com/p/native-json-output-from-gpt-4

    [0] https://openai.com/blog/function-calling-and-other-api-updat...

    [1] https://resgen.app

    [2] https://github.com/guidance-ai/guidance
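    For reference, the functions-API pattern the commenter describes looks roughly like this (a sketch against the 2023-era openai Python SDK; `record_person` and the schema are made-up examples, and newer SDK versions expose the same idea via `tools`):

        import json
        import openai

        # A JSON Schema describing the structure we want back (hypothetical example).
        schema = {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "age": {"type": "integer"},
            },
            "required": ["name", "age"],
        }

        response = openai.ChatCompletion.create(
            model="gpt-4-0613",
            messages=[{"role": "user", "content": "Extract the person from: Ada, 36."}],
            functions=[{"name": "record_person", "parameters": schema}],
            function_call={"name": "record_person"},  # force the model to call the function
        )

        # The arguments come back as a JSON string shaped by the schema; it is usually
        # valid, hence the commenter's fallback JSON parser "just in case".
        arguments = response.choices[0].message["function_call"]["arguments"]
        person = json.loads(arguments)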

  • jsonformer

    A Bulletproof Way to Generate Structured JSON from Language Models

    I'm not sure how this is different than:

    https://github.com/1rgs/jsonformer

    or

    https://github.com/newhouseb/clownfish

    or

    https://github.com/mkuchnik/relm

    or

    https://github.com/ggerganov/llama.cpp/pull/1773

    or

    https://github.com/Shopify/torch-grammar

    Overall there are a ton of these logit-based guidance systems; the reason they don't get much traction is that the SOTA models sit behind REST APIs that don't enable this fine-grained approach.

    Those models perform so much better that people generally settle for just re-requesting until they get the correct format (and with GPT-4 that ends up being a fairly rare occurrence, in my experience).
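    The "re-requesting until they get the correct format" pattern amounts to a validation loop around the hosted API. A minimal sketch, with `complete` standing in for whichever prompt-to-text call is being used:

        import json

        def request_json(complete, prompt: str, max_attempts: int = 3) -> dict:
            """Call a hosted model (`complete`: prompt -> reply text) and retry
            until the reply parses as JSON or the attempts run out."""
            for _ in range(max_attempts):
                reply = complete(prompt)
                try:
                    return json.loads(reply)
                except json.JSONDecodeError:
                    continue  # malformed output: simply ask again
            raise ValueError("model never returned valid JSON")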

  • torch-grammar

  • json-schema-spec

    The JSON Schema specification

  • lmql

    A language for constraint-guided and efficient LLM programming.

  • clownfish

    Constrained Decoding for LLMs against JSON Schema

  • relm

    ReLM is a Regular Expression engine for Language Models (by mkuchnik)

  • Constrained-Text-Generation-Studio

    Code repo for "Most Language Models can be Poets too: An AI Writing Assistant and Constrained Text Generation Studio" at the (CAI2) workshop, jointly held at (COLING 2022)

  • TypeChat

    TypeChat is a library that makes it easy to build natural language interfaces using types.

    That re-prompting on error is what this new Microsoft library does, too: https://github.com/microsoft/TypeChat

    Here's their prompt for that: https://github.com/microsoft/TypeChat/blob/c45460f4030938da3...

    I think the approach using grammars (seen here, but also in things like https://github.com/ggerganov/llama.cpp/pull/1773 ) is a much more elegant solution.
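    The repair step being referred to feeds the validation error back into the next request. A rough Python analogue of that idea (not TypeChat's actual code, which validates against a TypeScript type; here JSON Schema and the `complete` callable stand in):

        import json
        from jsonschema import validate, ValidationError  # pip install jsonschema

        def repair_loop(complete, prompt: str, schema: dict, max_attempts: int = 3) -> dict:
            """Ask, validate, and on failure re-prompt with the error message."""
            message = prompt
            for _ in range(max_attempts):
                reply = complete(message)
                try:
                    data = json.loads(reply)
                    validate(instance=data, schema=schema)
                    return data
                except (json.JSONDecodeError, ValidationError) as err:
                    # Tell the model what was wrong with its previous answer.
                    message = (f"{prompt}\n\nYour previous reply was invalid: {err}.\n"
                               f"Reply again with JSON that matches the schema.")
            raise ValueError("could not obtain schema-valid JSON")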

  • llm-mlc

    LLM plugin for running models using MLC

    I'm quite impressed with Llama 2 13B: the more time I spend with it, the more I think it might be genuinely useful for more than just playing around with local LLMs.

    I'm using the MLC version (since that works with a GPU on my M2 Mac) via my https://github.com/simonw/llm-mlc plugin.

  • flashtext

    Extract Keywords from sentence or Replace keywords in sentences.

    I have another comment on this thread where I point out why I don’t think it’s superficial. I would love to get your feedback on that if you feel like spending more time here.

    But it’s not obscure? FlashText was a somewhat popular paper at the time (2017) with a popular repo (https://github.com/vi3k6i5/flashtext). Their paper was pretty derivative of Aho-Corasick, which they cited. If you think they genuinely fucked up, leave an issue on their repo (I’m, maybe to your surprise lol, not the author).

    Anyway, I’m not a fan of the whataboutery here. I don’t think OG’s paper is up to snuff on its lit review - do you?
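    For context on the library under discussion, FlashText's core usage looks like this (a brief illustration of the keyword extraction and replacement the repo provides):

        from flashtext import KeywordProcessor  # pip install flashtext

        processor = KeywordProcessor()
        processor.add_keyword("Big Apple", "New York")   # keyword -> clean name
        processor.add_keyword("machine learning")

        text = "I love Big Apple and machine learning."
        print(processor.extract_keywords(text))  # ['New York', 'machine learning']
        print(processor.replace_keywords(text))  # 'I love New York and machine learning.'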

  • ad-llama

    Structured inference with Llama 2 in your browser

    Generating an FSM over the vocabulary is a really interesting approach to guided sampling! I'm hacking on a structured inference library (https://github.com/gsuuon/ad-llama). I also tried to add a vocab preprocessing step to generate a valid-token mask (just with regex or static strings initially), but discovered that doing so left unlikely / unnatural tokens unmasked while masking out the token that represents the natural encoding given the already-sampled tokens.

    Given the stateful nature of tokenizers, I decided that trying to preprocess the individual token ids was a losing battle. Even in the simple case of whitespace, tokenizer merges can really screw up a static mask: say we expect a space next, and a token decodes to 'foo' on its own but is actually '_foo', which would have decoded with a leading whitespace had it followed a valid pair. When I go to construct the static vocab mask, it ends up matching against 'foo' instead of ' foo'.

    How did you work around this for the FSM approach? Does it somehow include information about merges / whitespace / tokenizer statefulness?
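    The whitespace problem described here is easy to reproduce with a SentencePiece-style tokenizer: the token string carries a leading-space marker ('▁') that the decoded text may drop. A hedged illustration via Hugging Face transformers (the checkpoint name is only an example, and exact outputs vary by tokenizer and version):

        from transformers import AutoTokenizer

        # Any Llama-style (SentencePiece) tokenizer shows the effect; this checkpoint
        # is only an example and may require access approval.
        tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

        ids = tok.encode("the cat", add_special_tokens=False)
        print(tok.convert_ids_to_tokens(ids))    # e.g. ['▁the', '▁cat']; '▁' marks a leading space
        print([tok.decode([i]) for i in ids])    # e.g. ['the', 'cat']; the space can disappear

        # A static mask built from the decoded strings would compare 'cat' against ' cat'
        # and reject the natural token, which is exactly the mismatch described above.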

Related posts

  • Getting Started as a Prompt Engineer

    2 projects | dev.to | 10 Nov 2024
  • Four Courses that helped me to get into Gen AI

    1 project | dev.to | 25 Jun 2024
  • Laravel RAG System in 4 Steps!

    4 projects | dev.to | 24 Jun 2024
  • Top Open Source Prompt Engineering Guides & Tools🔧🏗️🚀

    5 projects | dev.to | 2 May 2024
  • Prompt Engineering Guide

    1 project | news.ycombinator.com | 30 Mar 2024