Friendlier SQL with DuckDB

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

Our great sponsors
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • WorkOS - The modern identity platform for B2B SaaS
  • SaaSHub - Software Alternatives and Reviews
  • duckdb

    DuckDB is an in-process SQL OLAP Database Management System

  • Good questions! You are correct that it depends. We do have indexes to help with point queries, but they are not going to be quite as fast as SQLite because DuckDB stores data in a columnar format. (Soon they will be persistent - see comments above!) That columnar format is really great for scanning many items, but not optimal for grabbing all of a single row.

    With DuckDB, bulk inserts are your friend and are actually super fast.

    Definitely let us know what you find! Just open up a discussion on Github if you'd like to share what you find out: https://github.com/duckdb/duckdb/discussions

  • sqlite_scanner

    DuckDB extension to read and write to SQLite databases

  • Excellent question! I'll jump in - I am a part of the DuckDB team though, so if other users have thoughts it would be great to get other perspectives as well.

    First things first - we really like quite a lot about the SQLite approach. DuckDB is similarly easy to install and is built without dependencies, just like SQLite. It also runs in the same process as your application just like SQLite does. SQLite is excellent as a transactional database - lots of very specific inserts, updates, and deletes (called OLTP workloads). DuckDB can also read directly out of SQLite files as well, so you can mix and match them! (https://github.com/duckdblabs/sqlitescanner)

    DuckDB is much faster than SQLite when doing analytical queries (OLAP) like when calculating summaries or trends over time, or joining large tables together. It can use all of your CPU cores for sometimes ~100x speedup over SQLite.

    DuckDB also has some enhancements with respect to data transfer in and out of it. It can natively read Pandas, R, and Julia dataframes, and can read parquet files directly also (meaning without inserting first!).

    Does that help? Happy to add more details!

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

    InfluxDB logo
  • db-benchmark

    reproducible benchmark of database-like ops

  • Hi, good to hear that you guys care about testing. One thing apart from the Github issues that led me to believe it might not be super stable yet was the benchmark results on https://h2oai.github.io/db-benchmark/ which make it look like it couldn't handle the 50GB case due to a out of memory error. I see that the benchmark and the used versions are about a year old so maybe things changed a lot since then. Can you chime in regarding the current story of running bigger DBs like 1TB on a machine with just 32GB or so RAM? Especially regardung data mutations and DDL queries. Thanks!

  • go-duckdb

    go-duckdb provides a database/sql driver for the DuckDB database engine.

  • Here is a solved Github Issue related to CGO for the Go bindings! If you have another issue, please feel free to post it on their Github page!

    https://github.com/marcboeker/go-duckdb/issues/4

  • budibase

    Budibase is an open-source low code platform that helps you build internal tools in minutes 🚀

  • postgres_scanner

  • Interesting thought! I have not tried this yet so I only have a guess as an answer. Could you export the data as SQL statements and then run those statements on DuckDB? That may be easier to set up, but may take longer to run...

    DuckDB also has the ability to read Postgres data directly, and there is a Postgres FDW that can read from DuckDB!

    https://github.com/duckdblabs/postgresscanner

    https://github.com/alitrack/duckdb_fdw

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts