An open source DuckDB text to SQL LLM

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • wasm-client

    Examples for the MotherDuck WASM Client library, enabling MotherDuck integration for WebAssembly-powered DuckDB

  • Love these! We do want to deliver more features like FixIt! [0]

    What's really exciting is what you can do with DuckDB, MotherDuck, and WASM. A powerful in-browser storage and execution engine tethered to a central serverless data warehouse using hybrid mode [1] opens the door to unprecedented experiences. Imagine the possibilities if you have metadata, data, query logic, or even LLMs in the client, 0ms away from the user and on the user's own hardware.

    So we're doing this in our UI of course, but we also released a WASM SDK so that developers can take advantage of this new architecture in their own apps! [2]

    [0]https://motherduck.com/blog/introducing-fixit-ai-sql-error-f...

    [1]https://motherduck.com/docs/architecture-and-capabilities

    [2]https://github.com/motherduckdb/wasm-client

  • DuckDB-NSQL

    DuckDB NSQL Model

  • 1. First of all, thanks for outlining how you trained the model here in the repo: https://github.com/NumbersStationAI/DuckDB-NSQL?tab=readme-o...! I did not know about `sqlglot`, that's a pretty cool lib. Which part of the project was the most challenging or time-consuming: generating the training data, the actual training, or testing? How did you iterate, improve, and test the model?

    2. How would you suggest using this model effectively if we have custom data in our DBs? For example, we might have a column called `purpose` that's a custom defined enum (i.e. not a very well-known concept outside of our business). Currently, we've fed it in as context by defining all the possible values it can have. Do you have any other recs on how to tune our prompts so that this model is just as effective with our own custom data?

    3. Similar to the above, do you know whether the same model can work effectively across tens or even hundreds of tables? I've used multiple question-SQL example pairs as context, but I've found that I need 15-20 for it to be effective for even one table, let alone tens of tables.
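
The approach described in questions 2 and 3 (passing custom enum values and question-SQL example pairs as context) can be sketched roughly as below. This is only an illustration under stated assumptions: the prompt layout, the `ColumnNote` class, and the `pick_examples` and `build_prompt` helpers are all hypothetical, and the word-overlap ranking is a stand-in for whatever retrieval method you would actually use. It is not the prompt format documented for DuckDB-NSQL or for any other model.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ColumnNote:
    """Business-specific documentation for one column (hypothetical helper)."""
    table: str
    column: str
    description: str
    allowed_values: list = field(default_factory=list)


def _tokens(text):
    # Lowercased word set used for a crude similarity measure.
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def pick_examples(question, examples, k=3):
    """Return the k (question, sql) pairs sharing the most words with `question`."""
    q = _tokens(question)
    ranked = sorted(examples, key=lambda pair: len(q & _tokens(pair[0])), reverse=True)
    return ranked[:k]


def build_prompt(schema_ddl, notes, examples, question, k=3):
    """Assemble schema, column notes, and the k most similar examples into one prompt."""
    lines = ["### Schema", schema_ddl.strip(), "", "### Column notes"]
    for n in notes:
        line = f"- {n.table}.{n.column}: {n.description}"
        if n.allowed_values:
            values = ", ".join(repr(v) for v in n.allowed_values)
            line += f" Allowed values: {values}."
        lines.append(line)
    lines += ["", "### Examples"]
    for q, sql in pick_examples(question, examples, k):
        lines += [f"Question: {q}", f"SQL: {sql}", ""]
    lines += ["### Task", f"Question: {question}", "SQL:"]
    return "\n".join(lines)


# Made-up schema, enum values, and example pairs, for illustration only.
schema = "CREATE TABLE loans (id INTEGER, purpose VARCHAR, amount DOUBLE);"
notes = [ColumnNote("loans", "purpose", "Internal loan category.",
                    ["refi", "bridge", "equipment"])]
examples = [
    ("How many loans are there per purpose?",
     "SELECT purpose, count(*) FROM loans GROUP BY purpose;"),
    ("What is the total amount of all loans?",
     "SELECT sum(amount) FROM loans;"),
    ("List the ten largest loans",
     "SELECT * FROM loans ORDER BY amount DESC LIMIT 10;"),
]
prompt = build_prompt(schema, notes, examples,
                      "What is the total amount of bridge loans?", k=2)
print(prompt)
```

Selecting a few examples per question, instead of sending all 15-20 pairs for every table, keeps the prompt short as the number of tables grows. Whether that preserves accuracy for a given model is something you would need to measure on your own data.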

  • vanna

    🤖 Chat with your SQL database 📊. Accurate Text-to-SQL Generation via LLMs using RAG 🔄.

  • I'm trying to solve for this with my project and (at least based on what people say in Discord), it's working really well for them:

    https://github.com/vanna-ai/vanna

  • spider

    scripts and baselines for Spider: Yale complex and cross-domain semantic parsing and text-to-SQL challenge
