-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
-
SaaSHub
SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives
-
Constrained-Text-Generation-Studio
Code repo for "Most Language Models can be Poets too: An AI Writing Assistant and Constrained Text Generation Studio" at the (CAI2) workshop, jointly held at (COLING 2022)
-
SaaSHub
SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives
I may get heavily downvoted for my criticism here, but here we go again: yet another "innovation" that's fueled by the stupid money poured into AI startups in the past 2 years. Imagine thinking that adding regex on top of an LLM is worth $8.5M[1]. At least Llama's grammar-based sampling[2] is a bit more interesting but still essentially putting lipstick on a pig.
How is telling the language model "no, not like that, give me another token" at every step of token inference getting so many people ecstatic? The paper is basically undergrad-level excitement about something not even remotely interesting. Congratulations, you reinvented Markov chains (oh, sorry, "state machines") on top of LLMs.
I mean of course you can guarantee grammar and schema well-formedness as, duh, you have what essentially amounts to a post-processing step. Maybe I'm the idiot here, is anyone actually using any of these tools in production?
[1] https://www.benzinga.com/pressreleases/23/06/n32834246/norma...
[2] https://github.com/ggerganov/llama.cpp/pull/1773/files
We can extend our approach to grammar-based sampling, as explained in the paper linked above. Relevant PR: https://github.com/normal-computing/outlines/pull/178
Our method is much more efficient. llama.cpp loops over the entire vocabulary (~50k tokens) at each step to generate the mask. We generate an index at initialization, and building the masks at each step only requires a dictionary lookup (trade speed for memory). Sampling is just as fast as standard sampling.
OpenAI has this capability built in with functions[0], I believe! Building my own project[1] I have implemented functions in combination with guidance[2] and haven’t had a hiccup yet! I have a JSON parser function there, just in case, but it seems to be working reliably!
Here’s a bit more of a description of using the functions API for JSON returns: https://yonom.substack.com/p/native-json-output-from-gpt-4
[0] https://openai.com/blog/function-calling-and-other-api-updat...
[1] https://resgen.app
[2] https://github.com/guidance-ai/guidance
I'm not sure how this is different than:
https://github.com/1rgs/jsonformer
or
https://github.com/newhouseb/clownfish
or
https://github.com/mkuchnik/relm
or
https://github.com/ggerganov/llama.cpp/pull/1773
or
https://github.com/Shopify/torch-grammar
Overall there are a ton of these logit based guidance systems, the reason they don't get tons of traction is the SOTA models are behind REST APIs that don't enable this fine-grained approach.
Those models perform so much better that people generally settle for just re-requesting until they get the correct format (and with GPT-4 that ends up being a fairly rare occurrence in my experience)
I'm not sure how this is different than:
https://github.com/1rgs/jsonformer
or
https://github.com/newhouseb/clownfish
or
https://github.com/mkuchnik/relm
or
https://github.com/ggerganov/llama.cpp/pull/1773
or
https://github.com/Shopify/torch-grammar
Overall there are a ton of these logit based guidance systems, the reason they don't get tons of traction is the SOTA models are behind REST APIs that don't enable this fine-grained approach.
Those models perform so much better that people generally settle for just re-requesting until they get the correct format (and with GPT-4 that ends up being a fairly rare occurrence in my experience)
Outlines is a Python library that focuses on text generation with large language models. Brandon and I are not LLM experts and started the project a few months ago because we wanted to understand better how the generation process works. Our original background is probabilistic, relational and symbolic programming.
Recently we came up with a fast way to generate text that matches a regex (https://blog.normalcomputing.ai/posts/2023-07-27-regex-guide...). The basic idea is simple: regular expressions have an equivalent Deterministic-Finite Automaton (DFA) representation. We can transform this DFA into a generative model: in each state we get a list of symbols which correspond to completions that partially match the regular expression. We mask the other symbols in the logits returned by a large language model, sample a new symbol and move to the next state. The subtelty is that language models work with tokens, not symbols, so we derive a new FSM whose alphabet is the model's vocabulary. We can do this in only one pass over the vocabulary.
Generating the token masks thus only requires a dictionary lookup at each state. Our method blows other libraries like Microsoft's guidance out of the water.
From there it was only a small leap to be able to generate text that follows a JSON schema (https://json-schema.org/), or is parseable into a Pydantic model (https://docs.pydantic.dev/latest/usage/models/). The method works with union types, optional types, nested schemas, arrays, everything. It is guaranteed that the output is parseable.
I think it's cool, and I've spent a lot of time watching even tiny models output valid JSON over the weekend. Hope you will to.
I look forward to feedback, bug reports, feature requests and discussions!
I'm not sure how this is different than:
https://github.com/1rgs/jsonformer
or
https://github.com/newhouseb/clownfish
or
https://github.com/mkuchnik/relm
or
https://github.com/ggerganov/llama.cpp/pull/1773
or
https://github.com/Shopify/torch-grammar
Overall there are a ton of these logit based guidance systems, the reason they don't get tons of traction is the SOTA models are behind REST APIs that don't enable this fine-grained approach.
Those models perform so much better that people generally settle for just re-requesting until they get the correct format (and with GPT-4 that ends up being a fairly rare occurrence in my experience)
I'm not sure how this is different than:
https://github.com/1rgs/jsonformer
or
https://github.com/newhouseb/clownfish
or
https://github.com/mkuchnik/relm
or
https://github.com/ggerganov/llama.cpp/pull/1773
or
https://github.com/Shopify/torch-grammar
Overall there are a ton of these logit based guidance systems, the reason they don't get tons of traction is the SOTA models are behind REST APIs that don't enable this fine-grained approach.
Those models perform so much better that people generally settle for just re-requesting until they get the correct format (and with GPT-4 that ends up being a fairly rare occurrence in my experience)
That re-prompting error on is what this new Microsoft library does, too: https://github.com/microsoft/TypeChat
Here's their prompt for that: https://github.com/microsoft/TypeChat/blob/c45460f4030938da3...
I think the approach using grammars (seen here, but also in things like https://github.com/ggerganov/llama.cpp/pull/1773 ) is a much more elegant solution.
I'm quite impressed with Llama 2 13B - the more time I spend with it the more I think it might be genuinely useful for more than just playing around with local LLMs.
I'm using the MLC version (since that works with a GPU on my M2 Mac) via my https://github.com/simonw/llm-mlc plugin.
I have some other comment on this thread where I point out why I don’t think it’s superficial. Would love to get your feedback on that if you feel like spending more time on this thread.
But it’s not obscure? FlashText was a somewhat popular paper at the time (2017) with a popular repo (https://github.com/vi3k6i5/flashtext). Their paper was pretty derivative of Aho-Corasick, which they cited. If you think they genuinely fucked up, leave an issue on their repo (I’m, maybe to your surprise lol, not the author).
Anyway, I’m not a fan of the whatabboutery here. I don’t think OG’s paper is up to snuff on its lit review - do you?
Generating an FSM over the vocabulary is a really interesting approach to guided sampling! I'm hacking on a structured inference library (https://github.com/gsuuon/ad-llama) - I also tried to add a vocab preprocessing step to generate a valid tokens mask (just with regex or static strings initially) but discovered that doing so would cause unlikely / unnatural tokens to be masked rather than the token which represents the natural encoding given the existing sampled tokens.
Given the stateful nature of tokenizers, I decided that trying to preprocess the individual token ids was a losing battle. Even in the simple case of whitespace - tokenizer merges can really screw up generating a static mask, e.g. we expect a space next, but a token decodes to 'foo', but is actually a '_foo' and would've decoded with a whitespace if it were following a valid pair. When I go to construct the static vocab mask, it would then end up matching against 'foo' instead of ' foo'.
How did you work around this for the FSM approach? Does it somehow include information about merges / whitespace / tokenizer statefulness?