csvq
duckdb
Metric | csvq | duckdb
---|---|---
Mentions | 14 | 52
Stars | 1,446 | 16,576
Growth | - | 10.7%
Activity | 2.7 | 10.0
Last commit | 4 months ago | 3 days ago
Language | Go | C++
License | MIT License | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
csvq
-
Fx – Terminal JSON Viewer
sure can do, if you already use that shell [1], but personally I like specific tools for specific jobs such as jq [2], fx, csvq [3] etc, there's value in decoupling shells from utils (modularity, speed, innovation etc).
[1] I don't but tempted to try, like its data-types concept
[2] https://jqlang.github.io/jq/
[3] https://github.com/mithrandie/csvq
-
Tool to interact with CSV
csvq
-
Can SQL be used without an RDBMS?
There is a way of running SQL-like queries against CSV files.
-
Yq is a portable yq: command-line YAML, JSON, XML, CSV and properties processor
Lately I have had to do a lot of flat file analysis and tools along these lines have been a godsend. Will check this out.
My go-to lately has been csvq (https://mithrandie.github.io/csvq/). Really nice to be able to run complicated selects right over a CSV file with no setup at all.
-
How do you merge CSV tables?
csvq (https://mithrandie.github.io/csvq/)
-
Tool to explore big data sets
I usually do this with awk, my largest target files being half a TB in size for a project last year (and far too large to hold entirely in RAM). There are some other utilities like csvq and csvsql both of which let you write SQL-style queries against CSV files, but I'm not sure how they perform on large files. There's a nice list of CSV manipulation tools too if any of those jog your memory.
-
sqly - execute SQL against CSV / JSON with shell
Apparently, many people had the same idea: tools for executing SQL against CSV already included trdsql, q, csvq, and TextQL. They were highly functional; however, they had many options and no input completion. I found them just a little difficult to use.
- One-liner for running queries against CSV files with SQLite
-
Most efficient way to query .CSV files for Mac?
Please check out this tool https://github.com/mithrandie/csvq
-
Looking for: library to turn SQL (or abstracted) to code & execute against custom backend (slice of structs)
If you are looking to query non-database data with SQL statements, you may want to check out something like https://github.com/mithrandie/csvq (SQL for CSV).
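Several of the posts above describe running SQL directly against a CSV file, including one built around SQLite. The snippet below is a minimal sketch of that general idea using only Python's standard library. It is not how csvq works internally. The `load_csv` helper, the `sales` table name, and the sample data are all hypothetical, chosen purely for illustration.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, csv_file):
    """Load CSV rows (first row is the header) into a new SQLite table."""
    reader = csv.reader(csv_file)
    header = next(reader)
    # Quote identifiers so headers containing spaces still work.
    cols = ", ".join('"{}"'.format(name.replace('"', '""')) for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute('CREATE TABLE "{}" ({})'.format(table, cols))
    # Every value is stored as text; cast in the query where numbers are needed.
    conn.executemany(
        'INSERT INTO "{}" VALUES ({})'.format(table, placeholders), reader
    )


# Hypothetical sample data standing in for a CSV file on disk.
SAMPLE = "city,amount\nOslo,10\nLima,5\nOslo,7\n"

conn = sqlite3.connect(":memory:")
load_csv(conn, "sales", io.StringIO(SAMPLE))

rows = conn.execute(
    "SELECT city, SUM(CAST(amount AS INTEGER)) AS total "
    "FROM sales GROUP BY city ORDER BY total DESC"
).fetchall()
print(rows)
```

For a real file, `open(path, newline="")` would replace the `io.StringIO` object. The sketch assumes every row has the same number of fields as the header, and it loads the whole file into memory, so it is probably not suitable for the very large files mentioned above.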
duckdb
- 🪄 DuckDB sql hack : get things SORTED w/ constraint CHECK
- DuckDB: Move to push-based execution model (2021)
-
DuckDB performance improvements with the latest release
I'm not sure if the fix is reassuring or not: https://github.com/duckdb/duckdb/pull/9411/files
-
Building a Distributed Data Warehouse Without Data Lakes
It's an interesting question!
The problem is that the data is spread everywhere - no choice about that. So with that in mind, how do you query that data? Today, the idea is that you HAVE to put it into a central location. With tools like Bacalhau[1] and DuckDB [2], you no longer have to - a single query can be sharded amongst all your data - EFFECTIVELY giving you a lot of what you want from a data lake.
It's not a replacement, but if you can do a few of these items WITHOUT moving the data, you will be able to see really significant cost and time savings.
[1] https://github.com/bacalhau-project/bacalhau
[2] https://github.com/duckdb/duckdb
- DuckDB 0.9.0
-
Push or Pull, is this a question?
[4] Switch to Push-Based Execution Model by Mytherin · Pull Request #2393 · duckdb/duckdb (github.com)
-
Show HN: Hydra 1.0 – open-source column-oriented Postgres
it depends on your query obviously.
In general, I did very deep benchmarking of pg, clickhouse and duckdb, and I sure didn't make stupid mistakes like this: https://news.ycombinator.com/item?id=36990831
My dataset has 50B rows and 2 TB of data, and I think columnar DBs are very overhyped. I chose pg because:
- pg performance is acceptable, maybe 2-3x slower than clickhouse and duckdb on some queries, if pg is configured correctly and run on compressed storage
- clickhouse and duckdb start falling apart very fast because they specialize in a very narrow type of query: https://github.com/ClickHouse/ClickHouse/issues/47520 https://github.com/ClickHouse/ClickHouse/issues/47521 https://github.com/duckdb/duckdb/discussions/6696
-
🦆 Effortless Data Quality w/duckdb on GitHub ♾️
This action installs duckdb with the version provided in input.
-
Using SQL inside Python pipelines with Duckdb, Glaredb (and others?)
Duckdb: https://github.com/duckdb/duckdb - seems pretty popular, been keeping an eye on this for close to a year now.
-
CSV or Parquet File Format
The Parquet-Go library is very complex, and I have not yet managed to use it successfully, so I asked whether DuckDB can provide an API: https://github.com/duckdb/duckdb/issues/7776
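The "Building a Distributed Data Warehouse Without Data Lakes" post above describes sharding a single query across data that stays where it is. The snippet below is a rough sketch of that pattern under stated assumptions. It uses neither the Bacalhau nor the DuckDB API; Python's built-in `sqlite3` stands in for the local query engine. The `SHARDS` data and the `run_local` and `merge` helpers are hypothetical.

```python
import sqlite3

# Hypothetical shards: each list stands in for data living at a separate site.
SHARDS = [
    [("eu", 3), ("us", 4)],
    [("eu", 5)],
    [("us", 1), ("ap", 2)],
]


def run_local(rows):
    """Run the same aggregate query next to one shard and return partial sums."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (region TEXT, hits INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    partial = conn.execute(
        "SELECT region, SUM(hits) FROM events GROUP BY region"
    ).fetchall()
    conn.close()
    return partial


def merge(partials):
    """Combine per-shard partial sums into global totals."""
    totals = {}
    for partial in partials:
        for region, hits in partial:
            totals[region] = totals.get(region, 0) + hits
    return totals


# Only the small partial results move between sites, not the raw rows.
totals = merge(run_local(shard) for shard in SHARDS)
print(sorted(totals.items()))
```

This only works as written for aggregates that decompose cleanly, such as `SUM` and `COUNT`. Something like a median or a distinct count would need a different merge step.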
What are some alternatives?
querycsv - QueryCSV enables you to load CSV files and manipulate them using SQL queries then after you finish you can export the new values to a CSV file
ClickHouse - ClickHouse® is a free analytics DBMS for big data
q - q - Run SQL directly on delimited files and multi-file sqlite databases
sqlite-worker - A simple, and persistent, SQLite database for Web and Workers.
yq - yq is a portable command-line YAML, JSON, XML, CSV, TOML and properties processor
datasette - An open source multi-tool for exploring and publishing data
yq - Command-line YAML, XML, TOML processor - jq wrapper for YAML/XML/TOML documents
octosql - OctoSQL is a query tool that allows you to join, analyse and transform data from multiple databases and file formats using SQL.
miller - Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
metabase-clickhouse-driver - ClickHouse database driver for the Metabase business intelligence front-end
gsheet - gsheet is a CLI tool (and Golang package) for piping csv data to and from Google Sheets
arrow-datafusion - Apache DataFusion SQL Query Engine