Our great sponsors
-
vanna
🤖 Chat with your SQL database 📊. Accurate Text-to-SQL Generation via LLMs using RAG 🔄.
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
-
WorkOS
The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.
-
prql
PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement
This is a good idea. I think what you'd want to do is override the generate_sql function and store the question with the "related" metadata and the generated sql somewhere:
https://github.com/vanna-ai/vanna/blob/main/src/vanna/base/b...
We're going to be adding a generic logging function soon and fairly soon what you're talking about could just be a custom logger.
Please add Ibis Birdbrain https://ibis-project.github.io/ibis-birdbrain/ to the list. Birdbrain is an AI-powered data bot, built on Ibis and Marvin, supporting more than 18 database backends.
See https://github.com/ibis-project/ibis and https://ibis-project.org for more details.
We have recently added support to query data from SingleStore to our agent framework, LLMStack (https://github.com/trypromptly/LLMStack). Out of the box performance performance when prompting with just the table schemas is pretty good with GPT-4.
The more domain specific knowledge needed for queries, the harder it has gotten in general. We've had good success `teaching` the model different concepts in relation to the dataset and giving it example questions and queries greatly improved performance.
https://prql-lang.org/ might be an answer for this. As a cross-database pipelined language, it would allow RAG to be intermixed with the query, and the syntax may(?) be more reliable to generate