reddit_mining vs xsv

reddit_mining

By chapmanjacobd

xsv

A fast CSV command line toolkit written in Rust. (by BurntSushi)

Tags: Applications written in Rust, CSV, CLI, Command-line, Rust

Project	reddit_mining	xsv
Mentions	4	64
Stars	11	10,089
Growth	-	-
Activity	2.6	0.0
Latest Commit	10 months ago	2 months ago
Language	HTML	Rust
License	Creative Commons Zero v1.0 Universal	The Unlicense

Mentions - the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
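
The page does not publish the formula behind the activity number. As a rough sketch only, the idea of weighting recent commits more heavily and then ranking projects against each other could look like this in Python. The exponential decay, the 90-day half-life, and the function names are all assumptions made for illustration, not the site's implementation:

```python
from datetime import date, timedelta


def weighted_activity(commit_dates, today, half_life_days=90.0):
    """Sum per-commit weights that halve every `half_life_days` (assumed decay)."""
    return sum(0.5 ** ((today - d).days / half_life_days) for d in commit_dates)


def activity_score(project, weights):
    """Map a project's weight to a 0-10 score: ten times the share of
    tracked projects that are strictly less active (assumed mapping)."""
    own = weights[project]
    less_active = sum(1 for w in weights.values() if w < own)
    return round(10.0 * less_active / len(weights), 1)


if __name__ == "__main__":
    today = date(2024, 1, 1)
    # Hypothetical projects p1..p10; p10 has the most recent commits.
    weights = {
        f"p{i}": weighted_activity(
            [today - timedelta(days=30 * (10 - i))] * i, today
        )
        for i in range(1, 11)
    }
    print(activity_score("p10", weights))
```

Under this assumed mapping, a score of 9.0 means 90% of tracked projects are less active, which matches the "top 10%" reading given above.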

reddit_mining

Posts with mentions or reviews of reddit_mining. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-03-18.

Analyzing multi-gigabyte JSON files locally
14 projects | news.ycombinator.com | 18 Mar 2023

zstd decompression should almost always be very fast. It's faster to decompress than DEFLATE or LZ4 in all the benchmarks that I've seen.
You might be interested in converting the pushshift data to parquet. Using octosql, I'm able to query the submissions data (from the beginning of Reddit to Sept 2022) in about 10 min.
https://github.com/chapmanjacobd/reddit_mining#how-was-this-...
Although if you're sending the data to postgres or BigQuery you can probably get better query performance via indexes or parallelism.
reddit_mining - List of all Subreddits
1 project | /r/CKsTechNews | 18 Jan 2023
Show HN: List of All Subreddits
1 project | news.ycombinator.com | 18 Jan 2023
Top 50k Subreddits
1 project | news.ycombinator.com | 16 Jan 2023

xsv

Posts with mentions or reviews of xsv. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-04-02.

Show HN: TextQuery – Query and Visualize Your CSV Data in Minutes
3 projects | news.ycombinator.com | 2 Apr 2024

I realize it's not really that comparable since these tools don't support SQL, but a more fully featured CLI tool is https://github.com/BurntSushi/xsv
They are both fairly good.
Qsv: Efficient CSV CLI Toolkit
8 projects | news.ycombinator.com | 22 Dec 2023
Joining CSV Data Without SQL: An IP Geolocation Use Case
3 projects | news.ycombinator.com | 19 Oct 2023

I have done some similar, simpler data wrangling with xsv (https://github.com/BurntSushi/xsv) and jq. It could process my 800M rows in a couple of minutes (plus the time to read it out from the database =)
Qsv: CSVs sliced, diced and analyzed (fork of xsv)
2 projects | news.ycombinator.com | 27 Jun 2023

xsv, which seems to be why qsv was created.
[1] https://github.com/BurntSushi/xsv/issues/267
I wrote this iCalendar (.ics) command-line utility to turn common calendar exports into more broadly compatible CSV files.
6 projects | /r/commandline | 24 Mar 2023

CSV utilities (still haven't picked a favorite one...): https://github.com/harelba/q https://github.com/BurntSushi/xsv https://github.com/wireservice/csvkit https://github.com/johnkerl/miller
Icsp – Command-line iCalendar (.ics) to CSV parser
3 projects | news.ycombinator.com | 24 Mar 2023
ripgrep is faster than {grep, ag, git grep, ucg, pt, sift}
8 projects | /r/programming | 24 Mar 2023

$ git remote -v
origin git@github.com:rust-lang/rust (fetch)
origin git@github.com:rust-lang/rust (push)
$ git rev-parse HEAD
3b0d4813ab461ec81eab8980bb884691c97c5a35
$ time grep -ri burntsushi ./
./src/tools/cargotest/main.rs: repo: "https://github.com/BurntSushi/ripgrep",
./src/tools/cargotest/main.rs: repo: "https://github.com/BurntSushi/xsv",
grep: ./target/debug/incremental/cargotest-2dvu4f2km9e91/s-gactj3ma2j-1b10l4z-2l60ur55ixe6n/query-cache.bin: binary file matches
grep: ./target/debug/incremental/cargotest-38cpmhhbdgdyq/s-gactj3luwq-1o12vgp-t61hd8qdyp7t/query-cache.bin: binary file matches
grep: ./target/debug/incremental/cargotest-17632op6djxne/s-gawuq5468i-1h69nfw-4gm0s8yhhiun/query-cache.bin: binary file matches
grep: ./target/debug/incremental/cargotest-2trm4kt5yom3r/s-gawuq53qqg-bjiezj-lo0gha8ign8w/query-cache.bin: binary file matches
grep: ./target/debug/deps/libregex_automata-c74a6d9fd0abd77b.rmeta: binary file matches
grep: ./target/debug/deps/libsame_file-a0e0363a2985455d.rlib: binary file matches
grep: ./target/debug/deps/libsame_file-a0e0363a2985455d.rmeta: binary file matches
grep: ./target/debug/deps/libsame_file-7251d8d3586a319b.rmeta: binary file matches
grep: ./build/x86_64-unknown-linux-gnu/stage0-sysroot/lib/rustlib/x86_64-unknown-linux-gnu/lib/libaho_corasick-999a08e2b700420d.rlib: binary file matches
grep: ./build/x86_64-unknown-linux-gnu/stage0-sysroot/lib/rustlib/x86_64-unknown-linux-gnu/lib/libregex_automata-0d168be5d25b3ac5.rlib: binary file matches
grep: ./build/x86_64-unknown-linux-gnu/stage0-tools/x86_64-unknown-linux-gnu/release/deps/libregex_automata-7d6bec0156f15da1.rlib: binary file matches
grep: ./build/x86_64-unknown-linux-gnu/stage0-tools/x86_64-unknown-linux-gnu/release/deps/libregex_automata-7d6bec0156f15da1.rmeta: binary file matches
grep: ./build/x86_64-unknown-linux-gnu/stage0-tools/x86_64-unknown-linux-gnu/release/deps/libaho_corasick-07dee4514b87d99b.rmeta: binary file matches
grep: ./build/x86_64-unknown-linux-gnu/stage0-tools/x86_64-unknown-linux-gnu/release/deps/libaho_corasick-07dee4514b87d99b.rlib: binary file matches
grep: ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/release/deps/libaho_corasick-999a08e2b700420d.rlib: binary file matches
grep: ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/release/deps/libaho_corasick-999a08e2b700420d.rmeta: binary file matches
grep: ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/release/deps/libregex_automata-0d168be5d25b3ac5.rlib: binary file matches
grep: ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/release/deps/libregex_automata-0d168be5d25b3ac5.rmeta: binary file matches
grep: ./build/bootstrap/debug/deps/libaho_corasick-992e1ba08ef83436.rmeta: binary file matches
grep: ./build/bootstrap/debug/deps/libignore-54d41239d2761852.rmeta: binary file matches
grep: ./build/bootstrap/debug/deps/libsame_file-9a5e3ddd89cfe599.rlib: binary file matches
grep: ./build/bootstrap/debug/deps/libregex_automata-8e700951c9869a66.rlib: binary file matches
grep: ./build/bootstrap/debug/deps/libignore-54d41239d2761852.rlib: binary file matches
grep: ./build/bootstrap/debug/deps/libaho_corasick-992e1ba08ef83436.rlib: binary file matches
grep: ./build/bootstrap/debug/deps/libregex_automata-8e700951c9869a66.rmeta: binary file matches
grep: ./build/bootstrap/debug/deps/libsame_file-9a5e3ddd89cfe599.rmeta: binary file matches
real 16.683
user 15.793
sys 0.878
maxmem 8 MB
faults 0
Any Linux admins willing to try Pygrep?
6 projects | /r/linuxadmin | 18 Mar 2023

Unrelated, are you the same burntsushi that wrote xsv?
Analyzing multi-gigabyte JSON files locally
14 projects | news.ycombinator.com | 18 Mar 2023

If it could be tabular in nature, maybe convert to sqlite3 so you can make use of indexing, or CSV to make use of high-performance tools like xsv or zsv (the latter of which I'm an author).
https://github.com/BurntSushi/xsv
https://github.com/liquidaty/zsv/blob/main/docs/csv_json_sql...
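
The sqlite3 suggestion in the comment above can be sketched with Python's standard library alone. This is a minimal illustration, not the commenter's workflow: the helper name `load_csv_into_sqlite`, the table and column names, and the three sample rows are all made up for the example.

```python
import csv
import io
import sqlite3


def load_csv_into_sqlite(conn, csv_text, table, index_column):
    """Copy CSV rows into a new table, then index one column (hypothetical helper)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    # The index is what lets lookups on this column avoid a full table scan.
    conn.execute(
        f'CREATE INDEX "idx_{table}_{index_column}" ON "{table}" ("{index_column}")'
    )
    conn.commit()


# Toy data, invented for illustration only.
SAMPLE = "name,value\nalpha,1\nbeta,2\ngamma,3\n"

conn = sqlite3.connect(":memory:")
load_csv_into_sqlite(conn, SAMPLE, "items", "name")
row = conn.execute('SELECT value FROM items WHERE name = ?', ("beta",)).fetchone()
print(row)
```

Columns are created without declared types, so values stay as text. For real multi-gigabyte inputs you would read from a file rather than a string and likely declare column types.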
What monitoring tool do you use or recommend?
5 projects | /r/selfhosted | 6 Mar 2023

Oh and there's rad cli shit out there for CSV files too, like xsv

What are some alternatives?

When comparing reddit_mining and xsv you can also consider the following projects:

json-streamer - A fast streaming JSON parser for Python that generates SAX-like events using yajl

csvtk - A cross-platform, efficient and practical CSV/TSV toolkit in Golang

json-buffet

miller - Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON

semi_index - Implementation of the JSON semi-index described in the paper "Semi-Indexing Semi-Structured Data in Tiny Space"

ripgrep - ripgrep recursively searches directories for a regex pattern while respecting your gitignore

jq-zsh-plugin - jq zsh plugin

Servo - Servo, the embeddable, independent, memory-safe, modular, parallel web rendering engine

octosql - OctoSQL is a query tool that allows you to join, analyse and transform data from multiple databases and file formats using SQL.

Fractalide - Reusable Reproducible Composable Software

zsv - zsv+lib: tabular data swiss-army knife CLI + world's fastest (simd) CSV parser

svgcleaner - svgcleaner could help you to clean up your SVG files from the unnecessary data.
