Jsonformer: A bulletproof way to generate structured output from LLMs

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • jsonformer

    A Bulletproof Way to Generate Structured JSON from Language Models

  • clownfish

    Constrained Decoding for LLMs against JSON Schema

  • Oh nice! I built a similar system a few weeks ago: https://github.com/newhouseb/clownfish

    I think the main differentiating factor is that Jsonformer works better if you have a simpler JSON schema without enum or oneOf constraints. If you do have these constraints, say an array of different types representing the items on a menu, such as { kind: pizza, toppings: [pepperoni] } or { kind: ice_cream, flavor: vanilla | strawberry }, then you would need something more sophisticated like clownfish, which can ask the LLM to pick specific properties.
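To make the enum/oneOf point concrete, here is a minimal Python sketch of schema-driven generation for the menu example. It is not the code of Jsonformer or clownfish: the `generate` function, the `Chooser` callback, the extra "mushroom" topping, and the assumption that every oneOf branch carries a constant `kind` are all hypothetical, and the callback merely stands in for asking the model which allowed option it prefers.

```python
from typing import Any, Callable, Dict, Sequence

# Hypothetical stand-in for the model: given a location in the output and the
# allowed options, return the index of the preferred option. A real system
# would compare token log-probabilities here.
Chooser = Callable[[str, Sequence[str]], int]

# Menu-item schema from the comment above ("mushroom" is an invented extra).
MENU_ITEM_SCHEMA: Dict[str, Any] = {
    "oneOf": [
        {
            "type": "object",
            "properties": {
                "kind": {"const": "pizza"},
                "toppings": {
                    "type": "array",
                    "items": {"enum": ["pepperoni", "mushroom"]},
                },
            },
        },
        {
            "type": "object",
            "properties": {
                "kind": {"const": "ice_cream"},
                "flavor": {"enum": ["vanilla", "strawberry"]},
            },
        },
    ]
}


def generate(schema: Dict[str, Any], choose: Chooser, path: str = "$") -> Any:
    """Walk the schema and only consult the chooser where a choice exists."""
    if "const" in schema:
        return schema["const"]  # nothing to decide
    if "enum" in schema:
        options = schema["enum"]
        return options[choose(path, [str(o) for o in options])]
    if "oneOf" in schema:
        branches = schema["oneOf"]
        # Assumption: each branch is identified by a constant "kind" property.
        labels = [str(b["properties"]["kind"]["const"]) for b in branches]
        return generate(branches[choose(path, labels)], choose, path)
    if schema.get("type") == "object":
        return {
            key: generate(sub, choose, f"{path}.{key}")
            for key, sub in schema["properties"].items()
        }
    if schema.get("type") == "array":
        # Simplification: emit exactly one item. A real decoder would also
        # let the model decide when to close the array.
        return [generate(schema["items"], choose, f"{path}[0]")]
    raise ValueError(f"unsupported schema at {path}")


def pick_first(path: str, options: Sequence[str]) -> int:
    """Trivial chooser used for demonstration: always take the first option."""
    return 0


print(generate(MENU_ITEM_SCHEMA, pick_first))
```

The structure (braces, keys, constants) comes entirely from the schema; the chooser is consulted only at enum and oneOf positions, which is where a simpler schema-filling approach has nothing to ask.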

  • cria

    Tiny inference-only implementation of LLaMA (by recmo)

  • Not OP, but I can share my approach: I went line by line through Recmo's Cria (https://github.com/recmo/cria), which is an implementation of LLaMA in NumPy, so very low level. It took me, I think, 3-4 days at 10 hours a day, plus 1-2 days of reading about Transformers, to understand what's going on. From that you can see how models generate text and get a deep understanding of what's happening.

  • magic

    AI functions for Typescript (by jumploops)

  • I created a toy [0] in TypeScript that maps LLM responses to type-safe output.

    It uses JSON Schema internally, but I’m thinking of revising it to use TypeScript directly after learning more about the ChatGPT plugin implementation (via their hackathon).

    [0]https://github.com/jumploops/magic

  • Chat-Markup-Language

    This is a Repo defining a set of rules for ChatGPT to use when sending responses to a user

  • I like the idea of getting ChatGPT to return something easily parse-able by a program. I've been using an XML derivative for that. https://github.com/ColinRyan/Chat-Markup-Language

    Never thought to use json schema. I'll check this out!

  • transmogrifier

    Unstructured data goes in, structured data comes out. Sometimes comedically.

  • This is a useful pattern and seems to be discovered by devs after they've played with LLMs for a while.

    I called it "transmogrifier" (thanks, Calvin!), and maybe the one interesting twist in my repo [1] is that you define the desired return type using Zod; when you call `transmogrify(...)`, it validates the LLM response. If valid, the data is returned (and you can use all the nice trappings of TypeScript from there); if not, an exception is raised.

    [1] https://github.com/davepeck/transmogrifier
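The validate-or-raise pattern described in this comment can be sketched in a few lines. transmogrifier itself is TypeScript and uses Zod; the Python below is only an analogy with hand-written checks, and `Recipe`, `InvalidResponse`, and `parse_recipe` are hypothetical names, not part of that project.

```python
import json
from dataclasses import dataclass


@dataclass
class Recipe:
    """Hypothetical target type the LLM response must conform to."""
    title: str
    servings: int


class InvalidResponse(ValueError):
    """Raised when the raw model output does not match the expected shape."""


def parse_recipe(raw: str) -> Recipe:
    """Return a typed Recipe if `raw` is valid, otherwise raise."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise InvalidResponse(f"not JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise InvalidResponse("expected a JSON object")
    title, servings = data.get("title"), data.get("servings")
    if not isinstance(title, str):
        raise InvalidResponse("title must be a string")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(servings, int) or isinstance(servings, bool):
        raise InvalidResponse("servings must be an integer")
    return Recipe(title=title, servings=servings)


# Pretend this string came back from a model.
print(parse_recipe('{"title": "Tomato soup", "servings": 2}'))
```

Callers then work with a typed value or handle a single exception type, rather than probing an untyped dictionary throughout the program.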

  • relm

    ReLM is a Regular Expression engine for Language Models (by mkuchnik)

  • I stumbled upon your repository a week ago, and I have to say: great work and great ideas!

    Another thing I thought about is integrating formatting for fields using a similar system. ISO-8601 dates come immediately to mind, but number and currency formatting are other examples.

    Probabilistic enums are another thing that might be useful for fallback values. I am pretty sure there's a lot of work that can be done in this area, and for other kinds of parsers too.

    A related and highly recommended resource is https://github.com/mkuchnik/relm and https://arxiv.org/abs/2211.15458. It is a similar system used to validate LLMs using regexes, although it was built for completely different use cases. I imagine that integrating regex checks on the output fields could also have a lot of use cases.
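As a rough illustration of per-field regex checks on generated output, the sketch below validates a record after the fact using Python's standard `re` module. It does not use ReLM's API; the field names, the patterns, and the `failing_fields` helper are assumptions made for this example.

```python
import re
from typing import Dict, List

# Hypothetical per-field patterns. The date pattern only checks the
# YYYY-MM-DD shape; it does not verify that the date exists on the calendar.
FIELD_PATTERNS = {
    "date": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "price": re.compile(r"\d+\.\d{2}"),
}


def failing_fields(record: Dict[str, object]) -> List[str]:
    """Return the names of fields that are missing or do not fully match."""
    return [
        name
        for name, pattern in FIELD_PATTERNS.items()
        if not pattern.fullmatch(str(record.get(name, "")))
    ]


print(failing_fields({"date": "2023-05-02", "price": "9.99"}))
print(failing_fields({"date": "May 2nd", "price": "9.99"}))
```

A constrained decoder could apply the same patterns during generation instead of afterwards; checking afterwards is simply the easiest variant to show.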

  • faker

    Generate massive amounts of fake data in the browser and node.js (by faker-js)

  • Something like this should be integrated with a library like https://fakerjs.dev/

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
