Jsonformer: A bulletproof way to generate structured output from LLMs

jsonformer

25 3,774 5.4 Jupyter Notebook

A Bulletproof Way to Generate Structured JSON from Language Models

clownfish

11 298 4.3 Python

Constrained Decoding for LLMs against JSON Schema

Oh nice! I built a similar system a few weeks ago: https://github.com/newhouseb/clownfish
I think the main differentiating factor is that Jsonformer is the better fit if you have a simpler JSON schema without enums or oneOf constraints. If you do have these constraints, say you want an array of different types representing items on a menu, such as { kind: pizza, toppings: [pepperoni] } or { kind: ice_cream, flavor: vanilla | strawberry }, then you need something more sophisticated like clownfish that can ask the LLM to pick specific properties.
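
To make the menu example concrete, here is a minimal Python sketch of that kind of schema: a `oneOf` over two object shapes, with an enum on `flavor`. The schema layout and the helpers `matches_branch` and `matches_menu_item` are illustrative assumptions, not taken from Jsonformer or clownfish. The helpers only check a finished value against the schema; they do no constrained decoding.

```python
# Hypothetical schema for the menu example; all names are illustrative only.
MENU_ITEM_SCHEMA = {
    "oneOf": [
        {
            "type": "object",
            "properties": {
                "kind": {"const": "pizza"},
                "toppings": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["kind", "toppings"],
        },
        {
            "type": "object",
            "properties": {
                "kind": {"const": "ice_cream"},
                "flavor": {"enum": ["vanilla", "strawberry"]},
            },
            "required": ["kind", "flavor"],
        },
    ]
}


def matches_branch(item, branch):
    """Check one oneOf branch, handling only the keywords used above."""
    if not isinstance(item, dict):
        return False
    if any(key not in item for key in branch["required"]):
        return False
    for key, rule in branch["properties"].items():
        value = item.get(key)
        if "const" in rule and value != rule["const"]:
            return False
        if "enum" in rule and value not in rule["enum"]:
            return False
        if rule.get("type") == "array":
            # This sketch assumes array items are always strings.
            if not isinstance(value, list):
                return False
            if not all(isinstance(v, str) for v in value):
                return False
    return True


def matches_menu_item(item):
    """oneOf semantics: exactly one branch must match."""
    hits = [b for b in MENU_ITEM_SCHEMA["oneOf"] if matches_branch(item, b)]
    return len(hits) == 1
```

The `kind` field acts as a discriminator: once it is fixed, only one branch's properties remain valid, which is the "pick specific properties" step described above.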

cria

3 77 2.5 Python

Tiny inference-only implementation of LLaMA (by recmo)

Not OP, but I can share my approach: I went line by line through Recmo's Cria (https://github.com/recmo/cria), an implementation of Llama in NumPy, so very low level. It took me, I think, 3-4 days at 10 hours a day, plus 1-2 days of reading about Transformers, to understand what's going on. From that you can see how models generate text and get a deep understanding of how they work.

magic

5 73 5.4 TypeScript

AI functions for Typescript (by jumploops)

I created a toy [0] in TypeScript that maps LLM responses to type-safe output.
It uses JSON Schema internally, but I’m thinking of revising it to use TypeScript types directly after learning more about the ChatGPT plugin implementation (via their hackathon).
[0] https://github.com/jumploops/magic

Chat-Markup-Language

1 2 4.3

This is a Repo defining a set of rules for ChatGPT to use when sending responses to a user

I like the idea of getting ChatGPT to return something easily parseable by a program. I've been using an XML derivative for that: https://github.com/ColinRyan/Chat-Markup-Language
I never thought to use JSON Schema. I'll check this out!

transmogrifier

1 0 4.7 TypeScript

Unstructured data goes in, structured data comes out. Sometimes comedically.

This is a useful pattern, and one that devs seem to discover after they've played with LLMs for a while.
I called it "transmogrifier" (thanks, Calvin!), and the one maybe-interesting twist in my repo [1] is that you define the desired return type using Zod; when you call `transmogrify(...)`, it validates the LLM response. If the response is valid, the data is returned (and you can use all the nice trappings of TypeScript from there); if not, an exception is raised.
[1] https://github.com/davepeck/transmogrifier
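
The validate-or-raise pattern described above can be sketched in a few lines. This is a Python analogue using only the standard library, not transmogrifier's actual API (which is TypeScript and Zod); the `Recipe` type and the `parse_llm_response` function are hypothetical names invented for the sketch.

```python
import json
from dataclasses import dataclass


@dataclass
class Recipe:
    # Hypothetical target type, invented for this sketch.
    title: str
    servings: int


def parse_llm_response(raw: str) -> Recipe:
    """Return a Recipe if the raw LLM text is valid, otherwise raise ValueError."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("response must be a JSON object")
    title = data.get("title")
    servings = data.get("servings")
    if not isinstance(title, str):
        raise ValueError("'title' must be a string")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(servings, int) or isinstance(servings, bool):
        raise ValueError("'servings' must be an integer")
    return Recipe(title=title, servings=servings)
```

Callers get either a typed value or an exception, so downstream code never has to handle a half-valid response.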

relm

3 84 5.1 Python

ReLM is a Regular Expression engine for Language Models (by mkuchnik)

I stumbled upon your repository a week ago and I have to say, great work and great ideas!
Another thing I thought about is integrating formatting for fields using a similar system. ISO 8601 dates come immediately to mind, but number and currency formatting are other examples.
Probabilistic enums are another thing I can think of that might be useful for fallback values. I am pretty sure there's a lot of work that can be done in this area, also for other kinds of parsers.
A related and highly recommended resource is https://github.com/mkuchnik/relm and https://arxiv.org/abs/2211.15458. It is a similar system used to validate LLMs using regexes, though built for completely different use cases. I imagine integrating regex checks on the output fields could also have a lot of use cases.
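
As a rough illustration of a regex check on an output field, here is a small Python sketch that validates a generated date string after the fact. It is an assumption-laden example, not ReLM's or Jsonformer's approach: the `ISO_DATE` pattern covers only the `YYYY-MM-DD` calendar form of ISO 8601, and `check_date_field` is a hypothetical helper name.

```python
import re
from datetime import date

# Assumption: accept only the calendar-date form YYYY-MM-DD,
# which is one of several forms ISO 8601 allows.
ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def check_date_field(value: str) -> date:
    """Validate a generated date string, then parse it; raise ValueError if bad."""
    if ISO_DATE.fullmatch(value) is None:
        raise ValueError(f"not in YYYY-MM-DD form: {value!r}")
    # fromisoformat also rejects impossible dates such as 2023-02-30.
    return date.fromisoformat(value)
```

The regex catches shape errors and the parse catches impossible dates; a constrained decoder could in principle apply the same pattern during generation instead of afterwards.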

faker

58 11,732 9.7 TypeScript

Generate massive amounts of fake data in the browser and node.js (by faker-js)

Something like this should be integrated with a library like https://fakerjs.dev/

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.