dstkdata vs xsv

dstkdata

The (large) data files needed for the Data Science Toolkit project (by petewarden)

datasciencetoolkit.org

xsv

A fast CSV command line toolkit written in Rust. (by BurntSushi)

Topics: Applications written in Rust, CSV, CLI, Command-line, Rust

Project         dstkdata              xsv
Mentions        1                     64
Stars           219                   10,115
Growth          -                     -
Activity        10.0                  0.0
Latest commit   almost 11 years ago   3 months ago
Language        -                     Rust
License         -                     The Unlicense

Mentions - the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub.
Growth - month-over-month growth in stars.
Activity - a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones. For example, an activity of 9.0 indicates that a project is among the top 10% of the most actively developed projects that we are tracking.
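
The site does not publish the formula behind the activity number, so the following is only a minimal sketch of how a recency-weighted, percentile-style score could be computed. The exponential decay, the 30-day half-life, and both function names are assumptions made for illustration.

```python
from bisect import bisect_left


def recency_weighted_commits(commit_ages_days, half_life_days=30.0):
    """Sum per-commit weights that halve every `half_life_days` (assumed decay)."""
    return sum(0.5 ** (age / half_life_days) for age in commit_ages_days)


def activity_score(raw, all_raw):
    """Map a raw value to a 0-10 scale by the share of tracked projects below it."""
    ranked = sorted(all_raw)
    below = bisect_left(ranked, raw)  # number of projects with a strictly lower raw value
    return round(10.0 * below / len(ranked), 1)
```

With 100 tracked projects whose raw values are 0 through 99, a project with a raw value of 90 has 90 projects below it and scores 9.0, which matches the "top 10%" reading of an activity of 9.0.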

dstkdata

Posts with mentions or reviews of dstkdata. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-05-11.

Show HN: Adding proper CSV support to AWK
4 projects | news.ycombinator.com | 11 May 2022

FYI, I ran this on the worldcities data at https://github.com/petewarden/dstkdata (credit to xsv for choosing that dataset) against https://github.com/BurntSushi/xsv and https://github.com/liquidaty/zsv (full disclosure: I am one of the zsv authors). Here's what I got.
Fastest to slowest: zsv (0.07), xsv (0.16), goawk (0.42), Python (~1.6)
Obviously, this does not tell the whole story, as the test was limited to "count" and an interpreted language is always expected to be slower than a precompiled command, but it might be relevant to a user deciding which tool to use. It might also be instructive as to room for improvement in the Go code (or possibly the Go code could use the C lib). I note that even if the goawk command is '{}', the runtime is still about the same.
full results:
goawk:
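
The "count" test described in the post above can be approximated for the Python case with the standard csv module. This is a minimal sketch rather than the poster's actual benchmark script: the function names, the header handling, and the command-line file argument are assumptions, and timings will differ by machine and dataset.

```python
import csv
import sys
import time


def count_rows(path, has_header=True, encoding="utf-8"):
    """Count CSV records (not physical lines), so quoted newlines are handled."""
    with open(path, newline="", encoding=encoding) as f:
        n = sum(1 for _ in csv.reader(f))
    # Assumption: the first record is a header row and is not counted as data.
    return n - 1 if has_header and n > 0 else n


def timed(fn, *args):
    """Return (result, elapsed_seconds) for a single call of fn(*args)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python count_rows.py path/to/file.csv
    rows, secs = timed(count_rows, sys.argv[1])
    print(f"{rows} rows in {secs:.2f}s")
```

A single timed run like this includes interpreter start-up only if measured from the shell, so it is best compared against other tools using the same outer timing method.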

xsv

Posts with mentions or reviews of xsv. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-04-02.

Show HN: TextQuery – Query and Visualize Your CSV Data in Minutes
3 projects | news.ycombinator.com | 2 Apr 2024

I realize it's not really that comparable since these tools don't support SQL, but a more fully featured CLI tool is xsv: https://github.com/BurntSushi/xsv
They are both fairly good

Qsv: Efficient CSV CLI Toolkit
8 projects | news.ycombinator.com | 22 Dec 2023

Joining CSV Data Without SQL: An IP Geolocation Use Case
3 projects | news.ycombinator.com | 19 Oct 2023

I have done some similar, simpler data wrangling with xsv (https://github.com/BurntSushi/xsv) and jq. It could process my 800M rows in a couple of minutes (plus the time to read it out from the database =)

Qsv: CSVs sliced, diced and analyzed (fork of xsv)
2 projects | news.ycombinator.com | 27 Jun 2023

xsv, which seems to be why qsv was created.
[1] https://github.com/BurntSushi/xsv/issues/267

I wrote this iCalendar (.ics) command-line utility to turn common calendar exports into more broadly compatible CSV files.
6 projects | /r/commandline | 24 Mar 2023

CSV utilities (still haven't picked a favorite one...): https://github.com/harelba/q https://github.com/BurntSushi/xsv https://github.com/wireservice/csvkit https://github.com/johnkerl/miller

Icsp – Command-line iCalendar (.ics) to CSV parser
3 projects | news.ycombinator.com | 24 Mar 2023

ripgrep is faster than {grep, ag, git grep, ucg, pt, sift}
8 projects | /r/programming | 24 Mar 2023

$ git remote -v
origin git@github.com:rust-lang/rust (fetch)
origin git@github.com:rust-lang/rust (push)
$ git rev-parse HEAD
3b0d4813ab461ec81eab8980bb884691c97c5a35
$ time grep -ri burntsushi ./
./src/tools/cargotest/main.rs: repo: "https://github.com/BurntSushi/ripgrep",
./src/tools/cargotest/main.rs: repo: "https://github.com/BurntSushi/xsv",
grep: ./target/debug/incremental/cargotest-2dvu4f2km9e91/s-gactj3ma2j-1b10l4z-2l60ur55ixe6n/query-cache.bin: binary file matches
grep: ./target/debug/incremental/cargotest-38cpmhhbdgdyq/s-gactj3luwq-1o12vgp-t61hd8qdyp7t/query-cache.bin: binary file matches
grep: ./target/debug/incremental/cargotest-17632op6djxne/s-gawuq5468i-1h69nfw-4gm0s8yhhiun/query-cache.bin: binary file matches
grep: ./target/debug/incremental/cargotest-2trm4kt5yom3r/s-gawuq53qqg-bjiezj-lo0gha8ign8w/query-cache.bin: binary file matches
grep: ./target/debug/deps/libregex_automata-c74a6d9fd0abd77b.rmeta: binary file matches
grep: ./target/debug/deps/libsame_file-a0e0363a2985455d.rlib: binary file matches
grep: ./target/debug/deps/libsame_file-a0e0363a2985455d.rmeta: binary file matches
grep: ./target/debug/deps/libsame_file-7251d8d3586a319b.rmeta: binary file matches
grep: ./build/x86_64-unknown-linux-gnu/stage0-sysroot/lib/rustlib/x86_64-unknown-linux-gnu/lib/libaho_corasick-999a08e2b700420d.rlib: binary file matches
grep: ./build/x86_64-unknown-linux-gnu/stage0-sysroot/lib/rustlib/x86_64-unknown-linux-gnu/lib/libregex_automata-0d168be5d25b3ac5.rlib: binary file matches
grep: ./build/x86_64-unknown-linux-gnu/stage0-tools/x86_64-unknown-linux-gnu/release/deps/libregex_automata-7d6bec0156f15da1.rlib: binary file matches
grep: ./build/x86_64-unknown-linux-gnu/stage0-tools/x86_64-unknown-linux-gnu/release/deps/libregex_automata-7d6bec0156f15da1.rmeta: binary file matches
grep: ./build/x86_64-unknown-linux-gnu/stage0-tools/x86_64-unknown-linux-gnu/release/deps/libaho_corasick-07dee4514b87d99b.rmeta: binary file matches
grep: ./build/x86_64-unknown-linux-gnu/stage0-tools/x86_64-unknown-linux-gnu/release/deps/libaho_corasick-07dee4514b87d99b.rlib: binary file matches
grep: ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/release/deps/libaho_corasick-999a08e2b700420d.rlib: binary file matches
grep: ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/release/deps/libaho_corasick-999a08e2b700420d.rmeta: binary file matches
grep: ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/release/deps/libregex_automata-0d168be5d25b3ac5.rlib: binary file matches
grep: ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/release/deps/libregex_automata-0d168be5d25b3ac5.rmeta: binary file matches
grep: ./build/bootstrap/debug/deps/libaho_corasick-992e1ba08ef83436.rmeta: binary file matches
grep: ./build/bootstrap/debug/deps/libignore-54d41239d2761852.rmeta: binary file matches
grep: ./build/bootstrap/debug/deps/libsame_file-9a5e3ddd89cfe599.rlib: binary file matches
grep: ./build/bootstrap/debug/deps/libregex_automata-8e700951c9869a66.rlib: binary file matches
grep: ./build/bootstrap/debug/deps/libignore-54d41239d2761852.rlib: binary file matches
grep: ./build/bootstrap/debug/deps/libaho_corasick-992e1ba08ef83436.rlib: binary file matches
grep: ./build/bootstrap/debug/deps/libregex_automata-8e700951c9869a66.rmeta: binary file matches
grep: ./build/bootstrap/debug/deps/libsame_file-9a5e3ddd89cfe599.rmeta: binary file matches

real 16.683
user 15.793
sys 0.878
maxmem 8 MB
faults 0

Any Linux admins willing to try Pygrep?
6 projects | /r/linuxadmin | 18 Mar 2023

Unrelated, are you the same burntsushi that wrote xsv?

Analyzing multi-gigabyte JSON files locally
14 projects | news.ycombinator.com | 18 Mar 2023

If it could be tabular in nature, maybe convert to sqlite3 so you can make use of indexing, or CSV to make use of high-performance tools like xsv or zsv (the latter of which I'm an author).
https://github.com/BurntSushi/xsv
https://github.com/liquidaty/zsv/blob/main/docs/csv_json_sql...

What monitoring tool do you use or recommend?
5 projects | /r/selfhosted | 6 Mar 2023

Oh and there's rad cli shit out there for CSV files too, like xsv

What are some alternatives?

When comparing dstkdata and xsv you can also consider the following projects:

zsv - zsv+lib: tabular data swiss-army knife CLI + world's fastest (simd) CSV parser

csvtk - A cross-platform, efficient and practical CSV/TSV toolkit in Golang

goawk - A POSIX-compliant AWK interpreter written in Go, with CSV support

miller - Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON

ripgrep - ripgrep recursively searches directories for a regex pattern while respecting your gitignore

Servo - Servo, the embeddable, independent, memory-safe, modular, parallel web rendering engine

Fractalide - Reusable Reproducible Composable Software

svgcleaner - svgcleaner could help you to clean up your SVG files from the unnecessary data.

q - q - Run SQL directly on delimited files and multi-file sqlite databases

iota - A terminal-based text editor written in Rust

tidy-viewer - 📺(tv) Tidy Viewer is a cross-platform CLI csv pretty printer that uses column styling to maximize viewer enjoyment.

habitat - Modern applications with built-in automation
