Q – Run SQL Directly on CSV or TSV Files

Our great sponsors

WorkOS - The modern identity platform for B2B SaaS

InfluxDB - Power Real-Time Data Analytics at Scale

SaaSHub - Software Alternatives and Reviews

Our great sponsors

q

46 10,109 3.6 Python

q - Run SQL directly on delimited files and multi-file sqlite databases (by harelba)

Hi, author of q here.
Regarding the error you got, q currently does not autodetect headers, so you'd need to add -H as a flag in order to use the "country" column name. You're absolutely correct on failing-fast here - It's a bug which i'll fix.
In general regarding speed - q supports automatic caching of the CSV files (through the "-C readwrite" flag). Once it's activated, it will write the data into another file (with a .qsql extension), and will use it automatically in further queries in order to speed things considerably.
Effectively, the .qsql files are regular sqlite3 files (with some metadata), and q can be used to query them directly (or any regular sqlite3 file), including the ability to seamlessly join between multiple sqlite3 files.
http://harelba.github.io/q/#auto-caching-examples

dremio-oss

8 1,298 4.0 Java

Dremio - the missing link in modern data

I have been using Dremio to query large volume of CSV files: https://docs.dremio.com/software/data-sources/files-and-dire...
Although having them in some columnar format is much better for fast responses.
GitHub: https://github.com/dremio/dremio-oss

WorkOS

workos.com sponsored

The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.
sqawk

1 308 10.0 Tcl

Like awk but with SQL and table joins

"Sqawk is an Awk-like program that uses SQL and can combine data from multiple files. It is powered by SQLite."
https://github.com/dbohdan/sqawk

textql

15 9,031 3.7 Go

Execute SQL against structured text like CSV or TSV

Reminds me of the textQL extension that's available in Asciidoc.
Point it to an external CSV file, enable TextQL, and bam, there's your query returned as a table. Handy for parts lists, inventory, that kind of crap.
https://github.com/dinedal/textql
https://gist.github.com/mojavelinux/8856117

lnav

75 6,661 9.5 C++

Log file navigator

When I want to quickly query a csv file (usually log files), I like to use lnav which also supports running SQL queries and supports pretty much any log file format I happened to deal with.
https://lnav.org/

steampipe

146 6,379 9.7 Go

Zero-ETL, infinite possibilities. Live query APIs, code & more with SQL. No DB required.

Another similar tool for querying CSV with SQL is Steampipe (https://steampipe.io). Steampipe is much broader in concept giving you the ability to query and join many cloud services with SQL as well: https://hub.steampipe.io/plugins
There is also an interesting dashboards as code concept where you can codify interactive dashboards with HCL + SQL: https://steampipe.io/blog/dashboards-as-code

zsv

25 170 7.4 C

zsv+lib: world's fastest (simd) CSV parser, bare metal or wasm, with an extensible CLI for SQL querying, format conversion and more

Nice work. I am a fan of tools like this and look forward to giving this a try.
However, in my first attempted query (version 3.1.6 on MacOS), I ran into significant performance limitations and more importantly, it did not give correct output.
In particular, running on a narrow table with 1mm rows (the same one used in the xsv examples) using the command "select country, count() from worldcitiespop_mil.csv group by country" takes 12 seconds just to get an incorrect error 'no such column: country'.
using sqlite3, it takes two seconds or so to load, and less than a second to run, and gives me the correct result.
Using https://github.com/liquidaty/zsv (disclaimer, I'm one of its authors), I get the correct results in 0.95 seconds with the one-liner `zsv sql 'select country, count() from data group by country' worldcitiespop_mil.csv`.
I look forward to trying it again sometime soon

InfluxDB

www.influxdata.com sponsored

Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
duckdb

52 16,576 10.0 C++

DuckDB is an in-process SQL OLAP Database Management System

Q runs SQLite under the hood. If you want real firepower, take a look at DuckDB, the codebase is also a work of art.
https://github.com/duckdb/duckdb

ClickHouse

208 34,054 10.0 C++

ClickHouse® is a free analytics DBMS for big data

The large binary size argument is a bit old.
It seems possible to reduce it below 50MB.
https://github.com/ClickHouse/ClickHouse/issues/29378

arrow-flight-client-examples

1 64 4.1 Java

I will give it a shot!
The web application is just a UI to get started. It acts like a database providing you with jdbc and odbc drivers and arrow flight protocol: https://github.com/dremio-hub/arrow-flight-client-examples

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project