Our great sponsors
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
-
splink
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
Quite a while ago, when duckdb was just a duckling, I wrote an R package that supported direct manipulation of R dataframes using SQL.[1] duckdb was the engine for this.
The approach was never as fast as data.table but did approach the speed of dplyr for more complex queries.
Life had other things in store for me and I haven’t touched this library for a while now.
At the time there was no Julia connector for duckdb, but now that there is, I’d like to try this approach in that language.
[1] https://github.com/phillc73/duckdf
Doesn't postgres have a columnar option? If so, you could prob get better performance for your analytical interactions if you switched some tables to columnar.
Otherwise check out postgres scanner. https://github.com/duckdblabs/postgres_scanner
You might be interested in checking out Ibis (https://ibis-project.org/). It provides a dataframe-like API, abstracting over many common execution engines (duckdb, postgres, bigquery, spark, ...). Ibis wrapping duckdb has pretty much replaced pandas as my tool of choice for local data analysis. All the performance of duckdb with all the ergonomics of a dataframe API. (disclaimer: I contribute to Ibis for work).
If you're curious, I've written a FOSS record linkage library that executes everything as SQL. It supports multiple SQL backends including DuckDB and Spark for scale, and runs faster than most competitors because it's able to leverage the speed of these backends: https://github.com/moj-analytical-services/splink
Related posts
- SQLGlot: SQL parser, transpiler, optimizer – translate to Presto, Spark, Hive
- Show HN: Open-source, browser-local data exploration using DuckDB-WASM and PRQL
- Splink: Fast, accurate, scalable probabilistic data linkage
- quack-reduce: duckdb as a stateless query engine over a data lake
- Five Apache projects you probably didn't know about