csvq vs duckdb

                csvq            duckdb
Mentions        14              52
Stars           1,446           16,576
Growth          -               10.7%
Activity        2.7             10.0
Latest Commit   4 months ago    3 days ago
Language        Go              C++
License         MIT License     MIT License

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

csvq

Posts with mentions or reviews of csvq. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-09-19.

Fx – Terminal JSON Viewer
12 projects | news.ycombinator.com | 19 Sep 2023

Sure can do, if you already use that shell [1], but personally I like specific tools for specific jobs, such as jq [2], fx, csvq [3], etc. There's value in decoupling shells from utils (modularity, speed, innovation, etc.).
[1] I don't but tempted to try, like its data-types concept
[2] https://jqlang.github.io/jq/
[3] https://github.com/mithrandie/csvq
Tool to interact with CSV
9 projects | /r/commandline | 27 Feb 2023

csvq
Can SQL be used without an RDBMS?
7 projects | /r/PHP | 27 Feb 2023

There is a way of running SQL-like queries against CSV files.
Yq is a portable yq: command-line YAML, JSON, XML, CSV and properties processor
11 projects | news.ycombinator.com | 4 Feb 2023

Lately I have had to do a lot of flat file analysis and tools along these lines have been a godsend. Will check this out.
My go-to lately has been csvq (https://mithrandie.github.io/csvq/). Really nice to be able to run complicated selects right over a CSV file with no setup at all.
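To illustrate the idea in the comment above (running a SELECT directly over CSV data with no database setup), here is a minimal sketch using only Python's standard library. It is not how csvq works internally; it loads the rows into an in-memory SQLite table as a stand-in. The sample data, the `query_csv` helper, and the table name `data` are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a CSV file on disk.
SAMPLE_CSV = """name,dept,salary
alice,eng,120
bob,eng,100
carol,ops,90
"""


def query_csv(csv_text, sql, table="data"):
    """Load CSV text into an in-memory SQLite table and run one SQL query."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # first row supplies the column names
    conn = sqlite3.connect(":memory:")
    try:
        columns = ", ".join(f'"{name}"' for name in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute(f'CREATE TABLE "{table}" ({columns})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


# Values arrive as text, so cast before aggregating.
rows = query_csv(
    SAMPLE_CSV,
    "SELECT dept, SUM(CAST(salary AS INTEGER)) FROM data GROUP BY dept ORDER BY dept",
)
print(rows)
```

For a real file, the same helper could be fed `open(path, newline="").read()`, though reading the whole file into memory only suits small inputs.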
How do you merge CSV tables? (original title: "Wie fusioniert man CSV tables?")
1 project | /r/de_EDV | 25 Jan 2023

csvq (https://mithrandie.github.io/csvq/)
Tool to explore big data sets
2 projects | /r/commandline | 2 Jan 2023

I usually do this with awk, my largest target files being half a TB in size for a project last year (and far too large to hold entirely in RAM). There are some other utilities like csvq and csvsql both of which let you write SQL-style queries against CSV files, but I'm not sure how they perform on large files. There's a nice list of CSV manipulation tools too if any of those jog your memory.
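The commenter's awk script is not shown. As a rough Python counterpart to that streaming approach, the sketch below reads one row at a time and keeps only per-key running totals, so memory use depends on the number of distinct keys rather than the file size. The `sum_by_key` helper and the `host`/`bytes` column names are assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict


def sum_by_key(lines, key, value):
    """Stream rows one at a time, keeping only per-key running totals in memory."""
    totals = defaultdict(float)
    for row in csv.DictReader(lines):
        totals[row[key]] += float(row[value])
    return dict(totals)


# Hypothetical in-memory sample; for a real file pass open(path, newline="").
sample = io.StringIO("host,bytes\na,10\nb,5\na,2.5\n")
totals = sum_by_key(sample, "host", "bytes")
print(totals)
```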
sqly - execute SQL against CSV / JSON with shell
5 projects | /r/SQL | 10 Nov 2022

Apparently, many people thought the same thing: tools to execute SQL against CSV already existed, such as trdsql, q, csvq, and TextQL. They were highly functional; however, they had many options and no input completion, and I found them a little difficult to use.
One-liner for running queries against CSV files with SQLite
20 projects | news.ycombinator.com | 21 Jun 2022
Most efficient way to query .CSV files for Mac?
1 project | /r/SQL | 6 Jun 2022

Please check out this tool https://github.com/mithrandie/csvq
Looking for: library to turn SQL (or abstracted) to code & execute against custom backend (slice of structs)
5 projects | /r/golang | 18 May 2022

If you are looking to query nondb data with sql statements then you may want to check something like https://github.com/mithrandie/csvq (SQL for csv).

duckdb

Posts with mentions or reviews of duckdb. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-11-06.

🪄 DuckDB sql hack : get things SORTED w/ constraint CHECK
1 project | dev.to | 4 Apr 2024
DuckDB: Move to push-based execution model (2021)
1 project | news.ycombinator.com | 15 Mar 2024
DuckDB performance improvements with the latest release
8 projects | news.ycombinator.com | 6 Nov 2023

I'm not sure if the fix is reassuring or not: https://github.com/duckdb/duckdb/pull/9411/files
Building a Distributed Data Warehouse Without Data Lakes
3 projects | news.ycombinator.com | 2 Nov 2023

It's an interesting question!
The problem is that the data is spread everywhere - no choice about that. So with that in mind, how do you query that data? Today, the idea is that you HAVE to put it into a central location. With tools like Bacalhau[1] and DuckDB [2], you no longer have to - a single query can be sharded amongst all your data - EFFECTIVELY giving you a lot of what you want from a data lake.
It's not a replacement, but if you can do a few of these items WITHOUT moving the data, you will be able to see really significant cost and time savings.
[1] https://github.com/bacalhau-project/bacalhau
[2] https://github.com/duckdb/duckdb
DuckDB 0.9.0
3 projects | news.ycombinator.com | 26 Sep 2023
Push or Pull, is this a question?
2 projects | dev.to | 9 Aug 2023

[4] Switch to Push-Based Execution Model by Mytherin · Pull Request #2393 · duckdb/duckdb (github.com)
Show HN: Hydra 1.0 – open-source column-oriented Postgres
12 projects | news.ycombinator.com | 3 Aug 2023

It depends on your query, obviously.
In general, I did very deep benchmarking of pg, clickhouse and duckdb, and I sure didn't make stupid mistakes like this: https://news.ycombinator.com/item?id=36990831
My dataset has 50B rows and 2 TB of data, and I think columnar DBs are very overhyped. I chose pg because:
- pg performance is acceptable, maybe 2-3x slower than clickhouse and duckdb on some queries, if pg is configured correctly and run on compressed storage
- clickhouse and duckdb start falling apart very fast because they specialize in a very narrow type of query: https://github.com/ClickHouse/ClickHouse/issues/47520 https://github.com/ClickHouse/ClickHouse/issues/47521 https://github.com/duckdb/duckdb/discussions/6696
🦆 Effortless Data Quality w/duckdb on GitHub ♾️
3 projects | dev.to | 25 Jul 2023

This action installs duckdb with the version provided in input.
Using SQL inside Python pipelines with Duckdb, Glaredb (and others?)
6 projects | /r/dataengineering | 30 Jun 2023

Duckdb: https://github.com/duckdb/duckdb - seems pretty popular, been keeping an eye on this for close to a year now.
CSV or Parquet File Format
3 projects | /r/Python | 1 Jun 2023

The Parquet-Go library is very complex, and I have not yet succeeded in using it, so I asked whether DuckDB could provide an API: https://github.com/duckdb/duckdb/issues/7776

What are some alternatives?

When comparing csvq and duckdb you can also consider the following projects:

querycsv - QueryCSV enables you to load CSV files and manipulate them using SQL queries then after you finish you can export the new values to a CSV file

ClickHouse - ClickHouse® is a free analytics DBMS for big data

q - q - Run SQL directly on delimited files and multi-file sqlite databases

sqlite-worker - A simple, and persistent, SQLite database for Web and Workers.

yq - yq is a portable command-line YAML, JSON, XML, CSV, TOML and properties processor

datasette - An open source multi-tool for exploring and publishing data

yq - Command-line YAML, XML, TOML processor - jq wrapper for YAML/XML/TOML documents

octosql - OctoSQL is a query tool that allows you to join, analyse and transform data from multiple databases and file formats using SQL.

miller - Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON

metabase-clickhouse-driver - ClickHouse database driver for the Metabase business intelligence front-end

gsheet - gsheet is a CLI tool (and Golang package) for piping csv data to and from Google Sheets

arrow-datafusion - Apache DataFusion SQL Query Engine
