guardrails
TruthfulQA
Metric | guardrails | TruthfulQA
---|---|---
Mentions | 13 | 4
Stars | 3,284 | 502
Growth | 9.8% | -
Activity | 9.9 | 2.8
Last commit | 6 days ago | 6 months ago
Language | Python | Jupyter Notebook
License | Apache License 2.0 | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
guardrails
- Guardrails AI
- Does anyone have an example of a langchain based customer facing agent like a cashier/waitress?
- Is there a UI that can limit LLM tokens to a preset list?
- A minimal design pattern for LLM-powered microservices with FastAPI & LangChain
  You're absolutely correct, and I agree that there's potentially a risk of quality loss. But likewise, since these are all intrinsically linked, it may be possible to leverage strength by combining these tasks. I'm unaware of a paper reviewing the reliability and/or performance of LLMs in this specific scenario. If you find any, do share :) With regards to generating JSON responses - there are simple ways to nudge the model and even validate it, using libraries such as https://github.com/promptslab/Promptify, https://github.com/eyurtsev/kor and https://github.com/ShreyaR/guardrails
- Ask HN: People who were laid off or quit recently, how are you doing?
- Ask HN: AI to study my DSL and then output it?
  There are a couple of different approaches:
  - Use multi-shot prompting with something like guardrails to keep re-prompting a commercial model until the output is valid. [1]
  - Use a local model with a final layer that steers token selection towards syntactically valid tokens. [2]

  [1] https://github.com/ShreyaR/guardrails
  [2] "Structural Alignment: Modifying Transformers (like GPT) to Follow a JSON Schema" @ https://github.com/newhouseb/clownfish.
- Introducing 🤖 Megabots - State-of-the-art, production ready full-stack LLM apps made mega-easy with LangChain and FastAPI
  👍 validate and correct the outputs of LLMs using guardrails
- For consistent output from vicuna 13b
- [D] Is all the talk about what GPT can do on Twitter and Reddit exaggerated or fairly accurate?
  not vouching for it, but I know this is at least a thing that exists and I like the general idea: https://github.com/shreyar/guardrails
- Introducing Agents in Haystack: Make LLMs resolve complex tasks
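The first approach from the DSL thread above (re-prompt until the output validates) can be sketched in a few lines. This is only an illustration of the general idea, not the guardrails API: `generate_valid_json`, `call_model`, and `required_keys` are hypothetical names, and `call_model` stands in for whatever function sends a prompt to your model and returns its text reply.

```python
import json
from typing import Callable, Optional


def generate_valid_json(
    call_model: Callable[[str], str],  # hypothetical: prompt in, raw reply text out
    prompt: str,
    required_keys: set,
    max_attempts: int = 3,
) -> Optional[dict]:
    """Re-prompt until the reply is a JSON object containing every required key.

    Returns the parsed object, or None if no attempt produced valid output.
    """
    current_prompt = prompt
    for _ in range(max_attempts):
        reply = call_model(current_prompt)
        try:
            data = json.loads(reply)
        except json.JSONDecodeError as err:
            # Feed the parse error back so the next attempt can correct it.
            current_prompt = (
                f"{prompt}\n\nYour previous reply was not valid JSON "
                f"({err.msg}). Reply with a JSON object only."
            )
            continue
        if isinstance(data, dict) and required_keys.issubset(data):
            return data
        # Parsed, but not the shape we asked for: name the missing keys.
        found = data if isinstance(data, dict) else {}
        missing = sorted(required_keys.difference(found))
        current_prompt = (
            f"{prompt}\n\nYour previous reply was missing these keys: "
            f"{', '.join(missing)}. Reply with a JSON object only."
        )
    return None
```

A caller would pass its own model function, for example `generate_valid_json(my_model, "Describe the order as JSON.", {"name", "price"})`, and handle a `None` result as a failed generation. The second approach in the thread (steering token selection inside a local model) works at decoding time and is not covered by this sketch.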
TruthfulQA
- airoboros gpt-4 instructed + context-obedient question answering
  Dataset: https://github.com/sylinrl/TruthfulQA
- Scaling Transformer to 1M tokens and beyond with RMT
  this is a great point.
  do you know of any benchmarks doing this today?
  given the acute need to evaluate models on contextual factuality, we're exploring how to create a benchmark for this purpose but prefer existing benchmarks if possible.
  openai's truthfulqa[0] is close but does not focus on contextual factuality and targets a much harder problem of absolute truth.
  if none exist, and people are interested in contributing, please reach out.
  [0] https://github.com/sylinrl/TruthfulQA
- [D] Is all the talk about what GPT can do on Twitter and Reddit exaggerated or fairly accurate?
  I agree they show that you can brute-force mimic uncertainty estimates to some degree, and that the model is generally well calibrated (though on what is basically a set of trivia questions, so YMMV)... yet:
- [R] TruthfulQA: Measuring How Models Mimic Human Falsehoods
  Code for https://arxiv.org/abs/2109.07958 found: https://github.com/sylinrl/TruthfulQA
What are some alternatives?
lmql - A language for constraint-guided and efficient LLM programming.
safari - Convolutions for Sequence Modeling
GPTCache - Semantic cache for LLMs. Fully integrated with LangChain and llama_index.
recurrent-memory-transformer - [NeurIPS 22] [AAAI 24] Recurrent Transformer-based long-context architecture.
JARVIS - JARVIS, a system to connect LLMs with ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf
auto-evaluator
dynamic-gpt-ui - Dynamic UI generation with GPT-3 (OpenAI)
heinsen_routing - Reference implementation of "An Algorithm for Routing Vectors in Sequences" (Heinsen, 2022) and "An Algorithm for Routing Capsules in All Domains" (Heinsen, 2019), for composing deep neural networks.
truss - Assertions micro-library for Clojure/Script
ghostwheel - Hassle-free inline clojure.spec with semi-automatic generative testing and side effect detection
empirical-philosophy - A collection of empirical experiments using large language models and other neural network architectures to test the usefulness of metaphysical constructs.