Show HN: Dataherald AI – Natural Language to SQL Engine

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • dataherald

    Interact with your SQL database, Natural Language to SQL using LLMs

  • spider

    scripts and baselines for Spider: Yale complex and cross-domain semantic parsing and text-to-SQL challenge

  • Hi HN community. We are excited to open source Dataherald’s natural-language-to-SQL engine today (https://github.com/Dataherald/dataherald). This engine allows you to set up an API from your structured database that can answer questions in plain English.

    GPT-4 class LLMs have gotten remarkably good at writing SQL. However, out-of-the-box LLMs and existing frameworks did not reach the quality level we needed on our own structured data. For example, given the question “what was the average rent in Los Angeles in May 2023?” a reasonable human would either assume the question is about Los Angeles, CA or would confirm the state with the question asker in a follow-up. However, an LLM translates this to:

    select price from rent_prices where city='Los Angeles' AND month='05' AND year='2023'

    This pulls data for both Los Angeles, CA and Los Angeles, TX without selecting any column that distinguishes the two. You can read more about the challenges of enterprise-level text-to-SQL in this blog post I wrote on the topic: https://medium.com/dataherald/why-enterprise-natural-languag...
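To make the ambiguity concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table name and the `city`, `month`, `year`, and `price` columns come from the query above; the `state` column and all sample rows are assumptions made up for illustration, not Dataherald's actual data.

```python
import sqlite3

# In-memory database with a hypothetical rent_prices table.
# The `state` column and the sample rows are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rent_prices "
    "(city TEXT, state TEXT, month TEXT, year TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO rent_prices VALUES (?, ?, ?, ?, ?)",
    [
        ("Los Angeles", "CA", "05", "2023", 2900.0),
        ("Los Angeles", "CA", "05", "2023", 3100.0),
        ("Los Angeles", "TX", "05", "2023", 1000.0),
    ],
)

# The query from the post: it filters on city only, so rows from both
# states come back and nothing in the result tells them apart.
ambiguous_rows = conn.execute(
    "SELECT price FROM rent_prices "
    "WHERE city = 'Los Angeles' AND month = '05' AND year = '2023'"
).fetchall()

# One possible disambiguated form: filter on the state as well and
# aggregate, since the question asked for an average.
average_ca_rent = conn.execute(
    "SELECT AVG(price) FROM rent_prices "
    "WHERE city = ? AND state = ? AND month = ? AND year = ?",
    ("Los Angeles", "CA", "05", "2023"),
).fetchone()[0]

print(len(ambiguous_rows))  # rows returned by the ambiguous query
print(average_ca_rent)      # average for Los Angeles, CA only
```

With these sample rows the first query mixes the Texas row in with the two California rows, while the second restricts the result to one state before averaging.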

    Dataherald comes “batteries included.” It has best-in-class implementations of core components, including a state-of-the-art NL-to-SQL agent and an LLM-based SQL-accuracy evaluator. The architecture is modular, so these components can be easily replaced. It’s easy to set up and use with major data warehouses.

    There is a “Context Store” that holds information (NL2SQL examples, schemas and table descriptions) used in the LLM prompts, so the engine gets better with usage. And we even made it fast!

    This version allows you to easily connect to PG, Databricks, BigQuery or Snowflake and set up an API for semantic interactions with your structured data. You can then add business and data context that is used for few-shot prompting by the engine.
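The general idea of feeding stored context into a few-shot prompt can be sketched as follows. This is not Dataherald's implementation or API: `ContextStore`, `build_prompt`, the prompt wording, and the sample table and example are all hypothetical names and data chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ContextStore:
    """Hypothetical in-memory stand-in for a context store."""

    # Table name -> human-written description of the table.
    table_descriptions: dict = field(default_factory=dict)
    # Verified (question, sql) pairs to use as few-shot examples.
    examples: list = field(default_factory=list)


def build_prompt(store: ContextStore, question: str, max_examples: int = 3) -> str:
    """Assemble a few-shot prompt from table descriptions and stored examples."""
    lines = ["You translate questions into SQL.", "", "Tables:"]
    for table, description in store.table_descriptions.items():
        lines.append(f"- {table}: {description}")
    lines.append("")
    lines.append("Examples:")
    for example_question, example_sql in store.examples[:max_examples]:
        lines.append(f"Q: {example_question}")
        lines.append(f"SQL: {example_sql}")
    lines.append("")
    # The new question goes last, leaving the SQL for the model to complete.
    lines.append(f"Q: {question}")
    lines.append("SQL:")
    return "\n".join(lines)


store = ContextStore(
    table_descriptions={
        "rent_prices": "monthly rent by city and state; state is a two-letter code",
    },
    examples=[
        (
            "What was the average rent in Austin, TX in May 2023?",
            "SELECT AVG(price) FROM rent_prices WHERE city = 'Austin' "
            "AND state = 'TX' AND month = '05' AND year = '2023'",
        ),
    ],
)

prompt = build_prompt(store, "What was the average rent in Los Angeles in May 2023?")
print(prompt)
```

Each verified question and SQL pair added to the store becomes available to later prompts, which is one plausible reading of how such an engine could improve with usage.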

    The NL-to-SQL agent in this open source release was developed by our own Mohammadreza Pourreza, whose DIN-SQL algorithm is currently top of the Spider (https://yale-lily.github.io/spider) and Bird (https://bird-bench.github.io/) NL-to-SQL benchmarks. This agent has outperformed the Langchain SQLAgent by anywhere from 12% to 250% (depending on the provided context) in our own internal benchmarking while being only ~15s slower on average.

    Needless to say, this is an early release and the codebase is under swift development. We would love for you to try it out and give us your feedback! And if you are interested in contributing, we’d love to hear from you!

  • LLMStack

    No-code platform to build LLM Agents, workflows and applications with your data

  • We were using OpenAI's Codex model with a bunch of few-shot examples for natural language queries over blockchain data, coupled with a fine-tuned completions model to plot the results, when we were working on makerdojo.io.

    Without the ability to fine-tune Codex, we had to employ a bunch of techniques to pick the right set of examples to provide in the context to get good SQL. We have come very far since then, and it has been great to see projects like sqlcoder and this.

    Pieces of the pipeline powering makerdojo eventually became LLMStack (https://github.com/trypromptly/LLMStack). We are looking to integrate one of these SQL generators as processors in LLMStack so we can build higher level applications.
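The comment above does not say which techniques were used to pick examples. As one simple stand-in, the sketch below ranks stored examples by word overlap (Jaccard similarity) with the incoming question. The function names, the sample questions, and the SQL strings are all hypothetical.

```python
import re


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def pick_examples(question: str, examples: list, k: int = 2) -> list:
    """Return the k (question, sql) pairs most similar to the question."""
    question_tokens = tokenize(question)
    ranked = sorted(
        examples,
        key=lambda example: jaccard(question_tokens, tokenize(example[0])),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical example pool; table and column names are made up.
examples = [
    (
        "How many transactions were sent yesterday?",
        "SELECT COUNT(*) FROM transactions WHERE block_date = DATE('now', '-1 day')",
    ),
    (
        "What is the average gas price per day?",
        "SELECT block_date, AVG(gas_price) FROM transactions GROUP BY block_date",
    ),
    (
        "Which wallets hold the most tokens?",
        "SELECT wallet, balance FROM balances ORDER BY balance DESC LIMIT 10",
    ),
]

best = pick_examples("How many transactions were sent last week?", examples, k=1)
print(best[0][0])
```

Word overlap is crude; embedding-based similarity is a common alternative, but the selection step has the same shape either way: score every stored example against the question and keep the top few.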

  • sqlcoder

    SoTA LLM for converting natural language questions to SQL queries

  • There are pretty large NL -> SQL datasets (e.g. this ~80K sample dataset: https://huggingface.co/datasets/b-mc2/sql-create-context). An OSS model based on StarCoder was also recently published, with performance roughly between GPT-3.5 and GPT-4: https://github.com/defog-ai/sqlcoder

  • ada

    Accelerate your data analysis with AI. (by BenderV)

  • I switched to a chat version, named [Ada](https://github.com/BenderV/ada), and IMHO, it's better. The AI explores the database, its connections, the data format, and so on. Plus, it helps with ambiguity and feels more "natural".

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • We built a self-hosted low-code platform to build LLM apps locally and open-sourced it

    1 project | /r/OpenAI | 3 Sep 2023
  • LLMStack: self-hosted low-code platform to build LLM apps locally with LocalAI support

    1 project | /r/selfhosted | 3 Sep 2023
  • LLMStack: a self-hosted low-code platform to build LLM apps locally

    1 project | /r/programming | 1 Sep 2023
  • Teaching with AI

    2 projects | news.ycombinator.com | 31 Aug 2023
  • Show HN: LLMStack – Self-Hosted, Low-Code Platform to Build AI Experiences

    1 project | news.ycombinator.com | 31 Aug 2023