Hi HN community. We are excited to open source Dataherald’s natural-language-to-SQL engine today (https://github.com/Dataherald/dataherald). This engine allows you to set up an API from your structured database that can answer questions in plain English.
GPT-4 class LLMs have gotten remarkably good at writing SQL. However, out-of-the-box LLMs and existing frameworks did not work with our own structured data at the necessary quality level. For example, given the question “what was the average rent in Los Angeles in May 2023?” a reasonable human would either assume the question is about Los Angeles, CA or would confirm the state with the question asker in a follow-up. An LLM, however, translates this to:
SELECT price FROM rent_prices WHERE city = 'Los Angeles' AND month = '05' AND year = '2023'
This pulls data for Los Angeles, CA and Los Angeles, TX without getting columns to differentiate between the two. You can read more about the challenges of enterprise-level text-to-SQL in this blog post I wrote on the topic: https://medium.com/dataherald/why-enterprise-natural-languag...
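The ambiguity can be seen concretely in a minimal sketch (using Python's sqlite3 and an invented `rent_prices` schema; the table and values are illustrative, not Dataherald's):

```python
import sqlite3

# Toy table with an invented schema: two different cities named "Los Angeles".
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rent_prices (city TEXT, state TEXT, month TEXT, year TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO rent_prices VALUES (?, ?, ?, ?, ?)",
    [
        ("Los Angeles", "CA", "05", "2023", 2800.0),
        ("Los Angeles", "TX", "05", "2023", 1100.0),
    ],
)

# The naive LLM translation: no state filter, so CA and TX rows are blended.
ambiguous = conn.execute(
    "SELECT price FROM rent_prices WHERE city='Los Angeles' AND month='05' AND year='2023'"
).fetchall()
print(len(ambiguous))  # 2 rows, mixing two different markets

# What a careful human (or a context-aware engine) would produce instead.
disambiguated = conn.execute(
    "SELECT AVG(price) FROM rent_prices WHERE city='Los Angeles' AND state='CA' "
    "AND month='05' AND year='2023'"
).fetchone()[0]
print(disambiguated)  # 2800.0
```

The naive query silently averages across two unrelated housing markets; the second query is only possible once the engine has enough schema and business context to know a state filter is needed.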
Dataherald comes with “batteries included”: best-in-class implementations of the core components, including a state-of-the-art NL-to-SQL agent and an LLM-based SQL-accuracy evaluator. The architecture is modular, so these components can be easily swapped out, and the engine is easy to set up with the major data warehouses.
There is a “Context Store” whose contents (NL-to-SQL examples, schemas, and table descriptions) are injected into the LLM prompts, so the engine gets better with usage. And we even made it fast!
This version allows you to easily connect to PostgreSQL, Databricks, BigQuery, or Snowflake and set up an API for semantic interactions with your structured data. You can then add business and data context that is used for few-shot prompting by the engine.
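A hedged sketch of how few-shot prompting from such a context store might work (the store layout and prompt format here are invented for illustration, not Dataherald's actual internals):

```python
# Illustrative only: a tiny in-memory "context store" holding table
# descriptions and verified NL-to-SQL pairs, assembled into a few-shot prompt.
context_store = {
    "table_descriptions": [
        "rent_prices(city, state, month, year, price): monthly median rents",
    ],
    "golden_examples": [
        (
            "What was the average rent in Austin, TX in May 2023?",
            "SELECT AVG(price) FROM rent_prices WHERE city='Austin' AND state='TX' "
            "AND month='05' AND year='2023'",
        ),
    ],
}

def build_prompt(question: str) -> str:
    """Assemble schema context and golden examples ahead of the new question."""
    lines = ["You translate questions to SQL.", "Schema:"]
    lines += context_store["table_descriptions"]
    for q, sql in context_store["golden_examples"]:
        lines += [f"Q: {q}", f"SQL: {sql}"]
    lines += [f"Q: {question}", "SQL:"]
    return "\n".join(lines)

prompt = build_prompt("What was the average rent in Los Angeles, CA in May 2023?")
print(prompt)
```

The idea is that each verified question/SQL pair added to the store becomes a few-shot example for future prompts, which is how usage improves accuracy over time.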
The NL-to-SQL agent in this open source release was developed by our own Mohammadreza Pourreza, whose DIN-SQL algorithm is currently at the top of the Spider (https://yale-lily.github.io/spider) and Bird (https://bird-bench.github.io/) NL-to-SQL benchmarks. In our own internal benchmarking, this agent outperformed the Langchain SQLAgent by anywhere from 12% to 250% (depending on the provided context) while being only ~15s slower on average.
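Benchmarks like Spider score text-to-SQL systems largely on execution accuracy: run the predicted and gold queries against the same database and compare results. A simplified sketch (the official evaluators are more careful about ordering, duplicates, and partial matches):

```python
import sqlite3

def execution_match(db: sqlite3.Connection, predicted: str, gold: str) -> bool:
    """Simplified execution-accuracy check: do both queries return the same
    multiset of rows? (Order-insensitive; un-executable SQL counts as wrong.)"""
    try:
        pred_rows = sorted(db.execute(predicted).fetchall())
    except sqlite3.Error:
        return False
    gold_rows = sorted(db.execute(gold).fetchall())
    return pred_rows == gold_rows

# Tiny throwaway database for the check.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t (x INTEGER)")
db.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])

# Syntactically different but semantically equivalent queries match...
print(execution_match(db, "SELECT x FROM t WHERE x > 1", "SELECT x FROM t WHERE x >= 2"))
# ...while queries returning different rows do not.
print(execution_match(db, "SELECT x FROM t", "SELECT x FROM t WHERE x > 1"))
```

Execution-based scoring is why two very different SQL strings can both be "correct": only the returned data matters.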
Needless to say, this is an early release and the codebase is under swift development. We would love for you to try it out and give us your feedback! And if you are interested in contributing, we’d love to hear from you!
We were using OpenAI's Codex model with a bunch of few-shot examples for natural-language queries over blockchain data, coupled with a fine-tuned completions model for plotting the results, when we were working on makerdojo.io.
Without the ability to fine-tune Codex, we had to employ a number of techniques to pick the right set of examples to provide in the context to get good SQL. We have come very far since then, and it has been great to see projects like sqlcoder and this one.
Pieces of the pipeline powering makerdojo eventually became LLMStack (https://github.com/trypromptly/LLMStack). We are looking to integrate one of these SQL generators as processors in LLMStack so we can build higher level applications.
There are pretty large NL-to-SQL datasets (e.g. this ~80K-sample dataset: https://huggingface.co/datasets/b-mc2/sql-create-context). An OSS model based on StarCoder was also recently published whose performance is roughly between GPT-3.5 and GPT-4: https://github.com/defog-ai/sqlcoder
I switched to a chat version, named [Ada](https://github.com/BenderV/ada), and IMHO, it's better. The AI explores the database, its connections, the data formats, etc. Plus, it helps with ambiguity and feels more "natural".