guardrails vs TruthfulQA

Project	guardrails	TruthfulQA
Mentions	13	4
Stars	3,284	502
Growth	9.8%	-
Activity	9.9	2.8
Latest Commit	6 days ago	6 months ago
Language	Python	Jupyter Notebook
License	Apache License 2.0	Apache License 2.0

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
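As a rough illustration of the two derived numbers described above, here is a minimal sketch. The page does not publish its formulas, so both functions are assumptions: `monthly_growth` uses the usual percentage-change definition, and `activity_score` is a hypothetical percentile mapping chosen only so that a score of 9.0 corresponds to the top 10% of tracked projects.

```python
from bisect import bisect_left


def monthly_growth(stars_now: int, stars_month_ago: int) -> float:
    """Month-over-month growth in stars, as a percentage (standard definition)."""
    if stars_month_ago <= 0:
        raise ValueError("need a positive baseline star count")
    return 100.0 * (stars_now - stars_month_ago) / stars_month_ago


def activity_score(raw: float, all_raw: list[float]) -> float:
    """Map a raw activity value to a 0-10 score by percentile rank.

    Assumed scheme, not the site's actual formula: the score is ten times
    the fraction of tracked projects with a strictly lower raw value.
    """
    ranked = sorted(all_raw)
    below = bisect_left(ranked, raw)  # count of strictly lower raw values
    return round(10.0 * below / len(ranked), 1)
```

Under this assumed scheme, a project whose raw value beats 90 of 100 tracked projects scores 9.0. How the raw value weights recent commits over older ones is not specified on the page and is left out here.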

guardrails

Posts with mentions or reviews of guardrails. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-07-10.

Guardrails AI
1 project | news.ycombinator.com | 30 Dec 2023
Does anyone have an example of a langchain based customer facing agent like a cashier/waitress?
1 project | /r/LangChain | 28 Jul 2023
Is there a UI that can limit LLM tokens to a preset list?
3 projects | /r/LocalLLaMA | 10 Jul 2023
A minimal design pattern for LLM-powered microservices with FastAPI & LangChain
4 projects | /r/LocalLLaMA | 13 Jun 2023

You're absolutely correct, and I agree that there's potentially a risk of quality loss. But likewise, since these are all intrinsically linked, it may be possible to leverage strength by combining these tasks. I'm unaware of a paper reviewing the reliability and/or performance of LLMs in this specific scenario. If you find any, do share :) With regard to generating JSON responses - there are simple ways to nudge the model and even validate its output, using libraries such as https://github.com/promptslab/Promptify, https://github.com/eyurtsev/kor and https://github.com/ShreyaR/guardrails
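The "nudge and validate" idea in the comment above can be sketched with the standard library alone. This is not the API of Promptify, kor, or guardrails; `call_model` is a hypothetical callable standing in for whatever sends a prompt to an LLM and returns its text reply, and the `required` schema format is an assumption made for illustration.

```python
import json
from typing import Callable


def parse_json_reply(reply: str, required: dict[str, type]) -> dict:
    """Parse a model reply as JSON and check required keys and value types."""
    data = json.loads(reply)  # raises json.JSONDecodeError (a ValueError) on bad syntax
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key, expected in required.items():
        if key not in data:
            raise ValueError(f"missing key: {key}")
        if not isinstance(data[key], expected):
            raise ValueError(f"wrong type for {key}")
    return data


def ask_until_valid(
    call_model: Callable[[str], str],  # hypothetical: prompt in, reply text out
    prompt: str,
    required: dict[str, type],
    max_tries: int = 3,
) -> dict:
    """Re-prompt with the validation error until the reply parses, or give up."""
    message = prompt
    for _ in range(max_tries):
        reply = call_model(message)
        try:
            return parse_json_reply(reply, required)
        except ValueError as err:
            # Nudge the model by telling it what was wrong with the last reply.
            message = (
                f"{prompt}\n\nYour previous reply was invalid ({err}). "
                "Reply with JSON only."
            )
    raise RuntimeError("no valid JSON after retries")
```

Each retry costs another model call, so `max_tries` is worth keeping small when using a paid API.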
Ask HN: People who were laid off or quit recently, how are you doing?
3 projects | news.ycombinator.com | 20 Apr 2023
Ask HN: AI to study my DSL and then output it?
5 projects | news.ycombinator.com | 19 Apr 2023

There are a couple different approaches:
- Use multi-shot prompting with something like guardrails to try prompting a commercial model until it works. [1]
- Use a local model with a final layer that steers token selection towards syntactically valid tokens [2]
[1] https://github.com/ShreyaR/guardrails
[2] "Structural Alignment: Modifying Transformers (like GPT) to Follow a JSON Schema" @ https://github.com/newhouseb/clownfish.
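The second approach can be illustrated with a toy sketch. This is not how clownfish is implemented; real systems mask the model's logits against a grammar or schema. Here `score` is a hypothetical stand-in for the model's preference for a token given the text so far, and the "grammar" is simply a fixed list of allowed output strings.

```python
from typing import Callable


def allowed_next(prefix: str, vocab: list[str], targets: list[str]) -> list[str]:
    """Tokens that keep prefix + token a prefix of at least one allowed output."""
    return [t for t in vocab if t and any(s.startswith(prefix + t) for s in targets)]


def constrained_greedy(
    score: Callable[[str, str], float],  # hypothetical model score for (prefix, token)
    vocab: list[str],
    targets: list[str],
    max_steps: int = 20,
) -> str:
    """Greedy decoding restricted to tokens that can still reach an allowed output."""
    out = ""
    for _ in range(max_steps):
        if out in targets:
            return out  # stop at the first complete allowed output
        candidates = allowed_next(out, vocab, targets)
        if not candidates:
            break
        # Pick the model's favourite among the valid tokens only.
        out += max(candidates, key=lambda t: score(out, t))
    if out in targets:
        return out
    raise ValueError("could not reach an allowed output")
```

Even if the model prefers an invalid token, it never gets picked, because invalid tokens are filtered out before the `max`. That is the difference from the first approach, which lets the model produce anything and re-prompts on failure.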
Introducing :🤖 Megabots - State-of-the-art, production ready full-stack LLM apps made mega-easy with LangChain and FastAPI
5 projects | /r/webdev | 19 Apr 2023

👍 validate and correct the outputs of LLMs using guardrails
For consistent output from vicuna 13b
1 project | /r/learnmachinelearning | 9 Apr 2023
[D] Is all the talk about what GPT can do on Twitter and Reddit exaggerated or fairly accurate?
3 projects | /r/MachineLearning | 6 Apr 2023

not vouching for it, but I know this is at least a thing that exists and I like the general idea: https://github.com/shreyar/guardrails
Introducing Agents in Haystack: Make LLMs resolve complex tasks
6 projects | news.ycombinator.com | 3 Apr 2023

TruthfulQA

Posts with mentions or reviews of TruthfulQA. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-06-04.

airoboros gpt-4 instructed + context-obedient question answering
3 projects | /r/LocalLLaMA | 4 Jun 2023

Dataset: https://github.com/sylinrl/TruthfulQA
Scaling Transformer to 1M tokens and beyond with RMT
6 projects | news.ycombinator.com | 23 Apr 2023

this is a great point.
do you know of any benchmarks doing this today?
given the acute need to evaluate models on contextual factuality, we're exploring how to create a benchmark for this purpose but prefer existing benchmarks if possible.
openai's truthfulqa[0] is close but does not focus on contextual factuality and targets a much harder problem of absolute truth.
if none exist, and people are interested in contributing, please reach out.
[0] https://github.com/sylinrl/TruthfulQA
[D] Is all the talk about what GPT can do on Twitter and Reddit exaggerated or fairly accurate?
3 projects | /r/MachineLearning | 6 Apr 2023

I agree they show that you can brute-force mimic uncertainty estimates to some degree, and that the model is generally well calibrated (though on what is basically a set of trivia questions, so YMMV)... yet:
[R] TruthfulQA: Measuring How Models Mimic Human Falsehoods
1 project | /r/MachineLearning | 8 Oct 2021

Code for https://arxiv.org/abs/2109.07958 found: https://github.com/sylinrl/TruthfulQA

What are some alternatives?

When comparing guardrails and TruthfulQA you can also consider the following projects:

lmql - A language for constraint-guided and efficient LLM programming.

safari - Convolutions for Sequence Modeling

GPTCache - Semantic cache for LLMs. Fully integrated with LangChain and llama_index.

recurrent-memory-transformer - [NeurIPS 22] [AAAI 24] Recurrent Transformer-based long-context architecture.

JARVIS - JARVIS, a system to connect LLMs with ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf

auto-evaluator

dynamic-gpt-ui - Dynamic UI generation with GPT-3 (OpenAI)

heinsen_routing - Reference implementation of "An Algorithm for Routing Vectors in Sequences" (Heinsen, 2022) and "An Algorithm for Routing Capsules in All Domains" (Heinsen, 2019), for composing deep neural networks.

truss - Assertions micro-library for Clojure/Script

ghostwheel - Hassle-free inline clojure.spec with semi-automatic generative testing and side effect detection

empirical-philosophy - A collection of empirical experiments using large language models and other neural network architectures to test the usefulness of metaphysical constructs.