DuckDB – in-process SQL OLAP database management system

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

Our great sponsors
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • WorkOS - The modern identity platform for B2B SaaS
  • SaaSHub - Software Alternatives and Reviews
  • duckdf

    🦆 SQL for R dataframes, with ducks

  • Quite a while ago, when duckdb was just a duckling, I wrote an R package that supported direct manipulation of R dataframes using SQL.[1] duckdb was the engine for this.

    The approach was never as fast as data.table but did approach the speed of dplyr for more complex queries.

    Life had other things in store for me and I haven’t touched this library for a while now.

    At the time there was no Julia connector for duckdb, but now that there is, I’d like to try this approach in that language.

    [1] https://github.com/phillc73/duckdf

  • postgres_scanner

  • Doesn't postgres have a columnar option? If so, you could prob get better performance for your analytical interactions if you switched some tables to columnar.

    Otherwise check out postgres scanner. https://github.com/duckdblabs/postgres_scanner

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

    InfluxDB logo
  • ibis

    the portable Python dataframe library

  • You might be interested in checking out Ibis (https://ibis-project.org/). It provides a dataframe-like API, abstracting over many common execution engines (duckdb, postgres, bigquery, spark, ...). Ibis wrapping duckdb has pretty much replaced pandas as my tool of choice for local data analysis. All the performance of duckdb with all the ergonomics of a dataframe API. (disclaimer: I contribute to Ibis for work).

    If you're curious, I've written a FOSS record linkage library that executes everything as SQL. It supports multiple SQL backends including DuckDB and Spark for scale, and runs faster than most competitors because it's able to leverage the speed of these backends: https://github.com/moj-analytical-services/splink

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts