lambdaprompt
Constrained-Text-Generation-Studio
Metric | lambdaprompt | Constrained-Text-Generation-Studio |
---|---|---|
Mentions | 8 | 25 |
Stars | 368 | 197 |
Growth | 0.8% | - |
Activity | 5.6 | 4.1 |
Last commit | 4 months ago | 9 months ago |
Language | Python | Python |
License | MIT License | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
lambdaprompt
-
Ask HN: What have you built with LLMs?
We're using all sorts of different stacks and tooling. We made our own tooling at one point (https://github.com/approximatelabs/lambdaprompt/), but have more recently switched to making the raw requests and writing out the logic ourselves in the product. For our main product, the code just lives in our Next app and deploys on Vercel.
-
RasaGPT: First headless LLM chatbot built on top of Rasa, Langchain and FastAPI
https://github.com/approximatelabs/lambdaprompt It has served all of my personal use-cases since making it, including powering `sketch` (copilot for pandas) https://github.com/approximatelabs/sketch
Core things it does: Uses jinja templates, does sync and async, and most importantly treats LLM completion endpoints as "function calls", which you can compose and build structures around just with simple python. I also combined it with fastapi so you can just serve up any templates you want directly as rest endpoints. It also offers callback hooks so you can log & trace execution graphs.
Altogether it's only ~600 lines of Python.
I haven't had a chance to really push all the different examples out there, particularly for most "complex behaviors", so there aren't many patterns to copy. But if you're comfortable in Python, then I think it offers a pretty good interface.
I hope to get back to it sometime in the next week to introduce local-mode (eg. all the open source smaller models are now available, I want to make those first-class)
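The "prompt as a function" idea described in this comment can be sketched roughly as below. This is not lambdaprompt's actual API: `make_prompt`, `fake_complete`, the `on_call` hook, and the `.aio` attribute are hypothetical names made up for illustration, and Python's `string.Template` stands in for Jinja so the sketch needs only the standard library.

```python
import asyncio
from string import Template
from typing import Callable, Optional


def fake_complete(prompt: str) -> str:
    """Hypothetical stand-in for an LLM completion endpoint: echoes the prompt."""
    return f"<completion of: {prompt}>"


def make_prompt(template: str,
                complete: Callable[[str], str] = fake_complete,
                on_call: Optional[Callable[[str, str], None]] = None):
    """Turn a text template into an ordinary Python function."""
    tmpl = Template(template)

    def prompt_fn(**kwargs) -> str:
        rendered = tmpl.substitute(**kwargs)   # fill in the template variables
        result = complete(rendered)            # "call" the completion endpoint
        if on_call is not None:
            on_call(rendered, result)          # hook for logging / tracing
        return result

    async def prompt_fn_async(**kwargs) -> str:
        # Run the blocking call in a worker thread so it can be awaited.
        return await asyncio.to_thread(prompt_fn, **kwargs)

    prompt_fn.aio = prompt_fn_async            # async variant of the same prompt
    return prompt_fn


trace = []
summarize = make_prompt("Summarize in one line: $text",
                        on_call=lambda p, r: trace.append((p, r)))

print(summarize(text="LLM endpoints as functions"))
print(asyncio.run(summarize.aio(text="same prompt, awaited")))
print(len(trace))  # one trace entry per call
```

Because the result is a plain callable, it can be composed, mapped over a list, or mounted behind a web route like any other function, which seems to be the design point the comment is making.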
-
Replacing a SQL analyst with 26 recursive GPT prompts
This is great~ There's been some really rapid progress on Text2SQL in the last 6 months, and I really think this will have a real impact on the modern data stack ecosystem!
I had similar success with lambdaprompt for solving Text2SQL (https://github.com/approximatelabs/lambdaprompt/)
- λprompt - Composing Ai prompts with python in a functional style
-
LangChain: Build AI apps with LLMs through composability
This is great! I love seeing how rapidly in the past 6 months these ideas are evolving. I've been internally calling these systems "prompt machines". I'm a strong believer that chaining together language model prompts is core to extracting real, and reproducible value from language models. I sometimes even wonder if systems like this are the path to AGI as well, and spent a full month 'stuck' on that hypothesis in October.
Specific to prompt-chaining: I've spent a lot of time ideating about where "prompts live" (are they best as API endpoint, as cli programs, as machines with internal state, treated as a single 'assembly instruction' -- where do "prompts" live naturally) and eventually decided on them being the most synonymous with functions (and api endpoints via the RPC concept)
The mental model I've developed (sharing in case it resonates with anyone else):
a "chain" is `a = 'text'; b = p1(a); c = p2(b)` where p1 and p2 are LLM prompts.
What comes next (in my opinion) is other programming constructs: loops, conditionals, variables (memory), etc. (I think LangChain represents some of these concepts as their "areas" -> chain (function chaining), agents (loops), memory (variables))
To offer this code-style interface on top of LLMs, I made something similar to LangChain, but scoped what I made to focus only on the bare functional interface and the concept of a "prompt function", leaving the power of the "execution flow" up to the language interpreter itself (in this case Python) so the user can make anything with it.
https://github.com/approximatelabs/lambdaprompt
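A minimal sketch of that mental model, assuming `p1` and `p2` are prompt functions. Here they are hypothetical string transforms rather than real LLM calls, and `chain` and `refine` are illustrative helpers, not part of lambdaprompt or LangChain. The point is only that chaining, loops, and conditionals come for free from the host language.

```python
def p1(text: str) -> str:
    """Hypothetical prompt function; stands in for e.g. 'rewrite this text'."""
    return text.upper()


def p2(text: str) -> str:
    """Hypothetical prompt function; stands in for e.g. 'make this more emphatic'."""
    return text + "!"


def chain(*fns):
    """Compose prompt functions left to right: chain(p1, p2)(a) == p2(p1(a))."""
    def run(x):
        for f in fns:
            x = f(x)
        return x
    return run


# The chain from the comment, written out step by step.
a = "text"
b = p1(a)
c = p2(b)
assert c == chain(p1, p2)(a)


def refine(text: str, max_rounds: int = 3) -> str:
    """A loop plus a conditional: re-apply p2 until a stop condition holds."""
    for _ in range(max_rounds):
        if text.endswith("!!"):    # conditional: good enough, stop early
            break
        text = p2(text)            # loop body: one more prompt call
    return text


print(refine(b))
```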
I've had so much fun recently just playing with prompt chaining in general, it feels like the "new toy" in the AI space (orders of magnitude more fun than dall-e or chat-gpt for me). (I built sketch (posted the other day on HN) based on lambdaprompt)
My favorites have been things to test the inherent behaviors of language models using iterated prompts. I spent some time looking for "fractal" like behavior inside the functions, hoping that if I got the right starting point, an iterated function would avoid fixed points --> this has eluded me so far, so if anyone finds non-fixed points in LLMs, please let me know!
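One way to run that kind of experiment is to iterate a prompt function on its own output and record whether it settles into a fixed point or a longer cycle. The sketch below assumes deterministic decoding (otherwise repeats are not meaningful) and uses toy string functions in place of an LLM; `iterate` is a hypothetical helper.

```python
def iterate(f, x, max_steps=50):
    """Apply f repeatedly and report the first repeat.

    Returns (kind, period, trajectory), where kind is "fixed point",
    "cycle", or "no repeat" if nothing repeated within max_steps.
    """
    seen = {x: 0}          # output -> step at which it first appeared
    trajectory = [x]
    for step in range(1, max_steps + 1):
        x = f(x)
        if x in seen:
            period = step - seen[x]
            kind = "fixed point" if period == 1 else "cycle"
            return kind, period, trajectory
        seen[x] = step
        trajectory.append(x)
    return "no repeat", None, trajectory


# Toy stand-ins for an iterated prompt.
def shorten(s):
    return s[:-1] if len(s) > 1 else s   # converges to a single character


def rotate(s):
    return s[1:] + s[0]                  # cycles with period len(s)


print(iterate(shorten, "abcd"))
print(iterate(rotate, "abc"))
```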
I'm a believer that the "next revolution" in machine-written code and behavior from LLMs will come when someone can tame LLM prompting to self-write prompt chains themselves (whether that is on lambdaprompt, langchain, or something else!)
All in all, I'm super hyped about LangChain, love the space they are in and the rapid attention they are getting~
-
Show HN: Sketch – AI code-writing assistant that understands data content
From https://github.com/approximatelabs/sketch/blob/main/sketch/p... it appears that this library is calling a remote API, which obviates the utility of the demonstrated use case.
Upon closer inspection, it looks like https://github.com/approximatelabs/sketch interfaces with the model via https://github.com/approximatelabs/lambdaprompt, which is made by the same organization. This suggests to me that the former may be a toy demonstration of the latter.
- Show HN: Prompt – Build, compose and call templated LLM prompts
Constrained-Text-Generation-Studio
-
Photoshop for Text (2022)
Oh my god. I wrote a whole library called "Constrained Text Generation Studio" where I mused that I wanted a "Photoshop for Text". I'm not even sure which work predates the other: https://github.com/Hellisotherpeople/Constrained-Text-Genera...
The core idea of a "photoshop for text", specifically a word processor made for prosumers that supports GenAI as a first-class feature (i.e. oobabooga but actually good), is worth so much. If you're a VC reading this, chances are I want to talk to you to actually execute on the idea from the OP
-
Ask HN: What have you built with LLMs?
I was working on this stuff before it was cool, so in the sense of the precursor to LLMs (and sometimes supporting LLMs still) I've built many things:
1. Games you can play with word2vec or related models (could be drop in replaced with sentence transformer). It's crazy that this is 5 years old now: https://github.com/Hellisotherpeople/Language-games
2. "Constrained Text Generation Studio" - A research project I wrote when I was trying to solve LLM's inability to follow syntactic, phonetic, or semantic constraints: https://github.com/Hellisotherpeople/Constrained-Text-Genera...
3. DebateKG - A bunch of "Semantic Knowledge Graphs" built on my pet debate evidence dataset (LLM backed embeddings indexes synchronized with a graphDB and a sqlDB via txtai). Can create compelling policy debate cases https://github.com/Hellisotherpeople/DebateKG
4. My failed attempt at a good extractive summarizer. My life work is dedicated to one day solving the problems I tried to fix with this project: https://github.com/Hellisotherpeople/CX_DB8
-
You need a mental model of LLMs to build or use a LLM-based product
My mental model for LLMs was built by carefully studying the distribution of its output vocabulary at every time step.
There are tools that allow you to right click and see all possible continuations for an LLM like you would in a code IDE[1]. Seeing what this vocabulary is[2] and how trivial modifications to the prompt can impact probabilities will do a lot to improve your mental model of how LLMs operate.
Shameless self plug, but software which can do what I am describing is here, and it's worth noting that it ended up as peer reviewed research.
[1] https://github.com/Hellisotherpeople/Constrained-Text-Genera...
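To make "studying the distribution of the output vocabulary" concrete, here is a small sketch that turns per-token logits into probabilities and lists the top continuations. The logits are invented numbers for two hypothetical prompts, not output from any real model; with a real model they would come from its final layer at each time step.

```python
import math


def softmax(logits):
    """Convert a {token: logit} mapping into a {token: probability} mapping."""
    m = max(logits.values())                       # subtract max for stability
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    z = sum(exps.values())
    return {tok: e / z for tok, e in exps.items()}


def top_k(logits, k=3):
    """Return the k most likely next tokens with their probabilities."""
    probs = softmax(logits)
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Invented logits for the next token after two nearly identical prompts.
logits_a = {" mat": 3.0, " sofa": 2.0, " roof": 1.0, " moon": -1.0}
logits_b = {" mat": 1.0, " sofa": 1.5, " roof": 3.0, " moon": 0.5}

for name, logits in [("prompt A", logits_a), ("prompt B", logits_b)]:
    ranked = ", ".join(f"{tok!r}: {p:.2f}" for tok, p in top_k(logits))
    print(f"{name} -> {ranked}")
```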
-
Ask HN: How training of LLM dedicated to code is different from LLM of “text”
Yeah, the LLM outputs a distribution of likely next tokens. It is up to the decoder to select one, and it can use a grammar to enforce certain rules on the output. https://github.com/Hellisotherpeople/Constrained-Text-Genera... or https://github.com/ggerganov/llama.cpp/blob/master/grammars/... for example.
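As a rough illustration of a decoder using a grammar, the sketch below does greedy decoding where a tiny hand-written state machine decides which tokens are legal at each step. Everything here is a toy assumption: `GRAMMAR`, `toy_logits`, and `constrained_decode` are hypothetical, and real implementations (such as the llama.cpp grammars linked above) work over the model's actual token vocabulary.

```python
# Toy grammar for strings like "[1,2]": state -> {allowed token: next state}
GRAMMAR = {
    "start": {"[": "item"},
    "item": {"1": "sep", "2": "sep"},
    "sep": {",": "item", "]": "done"},
}


def toy_logits(prefix):
    """Hypothetical model scores for the next token given the tokens so far.

    Left alone it would rather say "Sure"; it also prefers to close the
    list once two items have been emitted.
    """
    scores = {"Sure": 5.0, "[": 1.0, "1": 0.5, "2": 0.4, ",": 0.3, "]": 0.2}
    if sum(tok in ("1", "2") for tok in prefix) >= 2:
        scores["]"] = 2.0
    return scores


def constrained_decode(logits_fn, grammar, max_tokens=20):
    """Greedy decoding restricted to tokens the grammar allows in each state."""
    state, out = "start", []
    while state != "done" and len(out) < max_tokens:
        allowed = grammar[state]
        scores = logits_fn(out)
        # Pick the best-scoring token among the legal ones only.
        token = max(allowed, key=lambda t: scores.get(t, float("-inf")))
        out.append(token)
        state = allowed[token]
    return "".join(out)


print(constrained_decode(toy_logits, GRAMMAR))
```

The model's scores still choose between legal options; the grammar only removes the illegal ones, which is why the output is always well-formed regardless of what the model would have preferred.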
- Show HN: LLMs can generate valid JSON 100% of the time
-
Llama: Add Grammar-Based Sampling
I am in love with this, I tried my hand at building a Constrained Text Generation Studio (https://github.com/Hellisotherpeople/Constrained-Text-Genera...), and got published at COLING 2022 for my paper on it (https://paperswithcode.com/paper/most-language-models-can-be...), but I always knew that something like this or the related idea enumerated in this paper: https://arxiv.org/abs/2306.03081 was the way to go.
-
LLMs are too easy to automatically red team into toxicity
It's far too easy to destroy any type of RLHF done to try to prevent bad behavior from an LLM.
For example, if you want an LLM to generate things that look like social security numbers, you may try to prompt it asking for social security numbers. It will of course give you "I'm sorry hal I can't do that..."
Then start using a technique like token filtering/filter assisted decoding, so that the LLM can only generate hyphens and numbers, and suddenly it does what you ask despite RLHF
I explored this a tiny bit in the later sections of my paper studying what happens when you restrict an LLM's vocabulary: https://aclanthology.org/2022.cai-1.pdf#page=17
You can even play with this with open source models using CTGS: https://github.com/Hellisotherpeople/Constrained-Text-Genera...
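Mechanically, token filtering of this kind amounts to masking the logits of every token that fails a lexical test before picking the next token. The sketch below shows only that masking step on an invented seven-token vocabulary with made-up logits; `allowed_ids` and `mask_logits` are hypothetical helpers, not CTGS functions.

```python
import re


def allowed_ids(vocab, pattern):
    """Indices of vocabulary entries that fully match the regex pattern."""
    rx = re.compile(pattern)
    return {i for i, tok in enumerate(vocab) if rx.fullmatch(tok)}


def mask_logits(logits, keep):
    """Set the logit of every filtered-out token to -inf so it cannot be chosen."""
    return [x if i in keep else float("-inf") for i, x in enumerate(logits)]


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


# Invented vocabulary and logits for a single decoding step.
vocab = ["I'm", " sorry", "123", "-", "45", "6789", "."]
logits = [4.0, 3.5, 0.2, 0.1, 0.3, 0.0, 1.0]

keep = allowed_ids(vocab, r"[0-9-]+")      # digits and hyphens only
masked = mask_logits(logits, keep)

print("unfiltered choice:", vocab[argmax(logits)])
print("filtered choice:  ", vocab[argmax(masked)])
```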
-
Understanding GPT Tokenizers
I agree with you, and I'm SHOCKED at how little work there actually is in phonetics within the NLP community. Consider that most of the phonetic tools that I am using to enforce rhyming or similar syntactic constraints in Constrained Text Generation Studio (https://github.com/Hellisotherpeople/Constrained-Text-Genera...) were built circa 2014, such as the CMU rhyming dictionary. In most cases, I could not find better modern implementations of these tools.
I did learn an awful lot about phonetic representations and matching algorithms. Things like "soundex" and "double metaphone" now make sense to me and are fascinating to read about.
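For readers who have not met these algorithms, here is a compact sketch of American Soundex, one of the phonetic keys mentioned above. It follows the commonly described rules (keep the first letter, code consonants by sound class, collapse repeats, let `h` and `w` not break a repeat, pad to four characters) and is not claimed to be the implementation used in Constrained Text Generation Studio.

```python
# Consonant sound classes used by American Soundex.
CODES = {}
for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")]:
    for ch in letters:
        CODES[ch] = digit


def soundex(word: str) -> str:
    """Return the 4-character Soundex key for word ("" if it has no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    key = first.upper()
    prev = CODES.get(first, "")        # code of the previous coded letter
    for ch in letters[1:]:
        if ch in "hw":
            continue                   # h and w do not separate equal codes
        code = CODES.get(ch, "")       # vowels and y have no code
        if code and code != prev:
            key += code
        prev = code                    # a vowel resets prev, allowing a repeat
    return (key + "000")[:4]           # pad or truncate to letter + 3 digits


for name in ["Robert", "Rupert", "Ashcraft", "Tymczak"]:
    print(name, soundex(name))
```

Words that sound alike tend to share a key (here "Robert" and "Rupert"), which is what makes such keys usable as a cheap phonetic filter.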
-
Don Knuth Plays with ChatGPT
https://github.com/hellisotherpeople/constrained-text-genera...
Just ban the damn tokens and try again. I wish that folks had more intuition around tokenization, and why LLMs struggle to follow syntactic, lexical, or phonetic constraints.
- Constrained Text Generation Studio
What are some alternatives?
datasloth - Natural language Pandas queries and data generation powered by GPT-3
Constrained-Text-Genera
lmql - A language for constraint-guided and efficient LLM programming.
guidance - A guidance language for controlling large language models.
LiteratureReviewBot - Experiment to use GPT-3 to help write grant proposals.
torch-grammar
kor - LLM(😽)
agency - Agency: Robust LLM Agent Management with Go
olympe - Query your database in plain english
llama-tokenizer-js - JS tokenizer for LLaMA and LLaMA 2
com2fun - Transform document into function.
outlines - Structured Text Generation