| | reddit_mining | json-toolkit |
|---|---|---|
| Mentions | 4 | 5 |
| Stars | 11 | 67 |
| Growth | - | - |
| Activity | 2.6 | 4.6 |
| Last commit | 10 months ago | about 1 year ago |
| Language | HTML | Python |
| License | Creative Commons Zero v1.0 Universal | GNU General Public License v3.0 only |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
reddit_mining
-
Analyzing multi-gigabyte JSON files locally
zstd decompression should almost always be very fast. It's faster to decompress than DEFLATE in every benchmark I've seen, and in the same ballpark as LZ4.
You might be interested in converting the pushshift data to parquet. Using octosql I'm able to query the submissions data (from the beginning of reddit to Sept 2022) in about 10 min
https://github.com/chapmanjacobd/reddit_mining#how-was-this-...
Although if you're sending the data to postgres or BigQuery you can probably get better query performance via indexes or parallelism.
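Before any parquet conversion, the decompressed pushshift dumps are newline-delimited JSON, so they can also be scanned with nothing but the Python standard library, one line at a time. A minimal sketch (the file path and `subreddit` field are assumptions, not part of the repo):

```python
import json

def count_by_subreddit(path):
    """Stream a newline-delimited JSON dump and tally posts per
    subreddit without loading the whole file into memory."""
    counts = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines in the dump
            post = json.loads(line)
            sub = post.get("subreddit", "<unknown>")
            counts[sub] = counts.get(sub, 0) + 1
    return counts
```

This streams at a steady, small memory footprint, which is the property that makes multi-gigabyte dumps tractable locally; the columnar parquet route wins once you re-query the same data repeatedly.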
- reddit_mining - List of all Subreddits
- Show HN: List of All Subreddits
- Top 50k Subreddits
json-toolkit
-
Show HN: Comma Separated Values (CSV) to Unicode Separated Values (USV)
CSV is great because Excel can import it, but Excel can't import USV, so at that point, why use USV when you can use JSON?
https://github.com/tyleradams/json-toolkit/
-
Analyzing multi-gigabyte JSON files locally
> Also note that this approach generalizes to other text-based formats. If you have 10 gigabyte of CSV, you can use Miller for processing. For binary formats, you could use fq if you can find a workable record separator.
You can also generalize it without learning a new mini-language by using https://github.com/tyleradams/json-toolkit, which converts CSV/binary/whatever to/from JSON
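The CSV↔JSON direction is simple enough to sketch with the standard library alone. This is the general idea, not json-toolkit's actual implementation; once the data is JSON, jq or plain Python can take over:

```python
import csv
import io
import json

def csv_to_json(csv_text):
    """Convert CSV text to a JSON array of objects,
    one object per row, keyed by the header row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows)

def json_to_csv(json_text):
    """Convert a JSON array of flat objects back to CSV text,
    taking the column order from the first object's keys."""
    rows = json.loads(json_text)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Nested JSON doesn't round-trip this way, of course; flattening nested structures is exactly the part where a dedicated tool earns its keep.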
- Fq: Jq for Binary Formats
-
Show HN: Angle Grinder – A terminal app to slice, dice, and aggregate your logs
I really like this tool, but I'm not sure what it gets me beyond jq (plus https://github.com/tyleradams/json-toolkit to convert non-JSON to JSON).
What can angle grinder do better than jq?
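The kind of aggregation angle-grinder runs in its pipeline (e.g. a `count by` stage) can be approximated in a few lines of stdlib Python over JSON log lines, which is roughly the jq-plus-glue workflow the comment describes. The `status` field name here is hypothetical:

```python
import json
from collections import Counter

def count_by(log_lines, field):
    """Group JSON log lines by a field and count occurrences,
    similar in spirit to a `count by <field>` pipeline stage."""
    counts = Counter()
    for line in log_lines:
        record = json.loads(line)
        counts[record.get(field, "<missing>")] += 1
    # Sort descending by count, the way aggregators display results.
    return counts.most_common()
```

The difference in practice is that angle-grinder parses, filters, and aggregates in one streaming command with live output, where the jq route leaves the aggregation step to you.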
- Show HN: Transform a CSV into a JSON and vice versa
What are some alternatives?
json-streamer - A fast streaming JSON parser for Python that generates SAX-like events using yajl
miller - Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
json-buffet
ndjson - Streaming line delimited json parser + serializer
semi_index - Implementation of the JSON semi-index described in the paper "Semi-Indexing Semi-Structured Data in Tiny Space"
angle-grinder - Slice and dice logs on the command line
jq-zsh-plugin - jq zsh plugin
csv2json - Simple tool for converting CSVs to JSON
octosql - OctoSQL is a query tool that allows you to join, analyse and transform data from multiple databases and file formats using SQL.
jq - Command-line JSON processor [Moved to: https://github.com/jqlang/jq]
zsv - zsv+lib: tabular data swiss-army knife CLI + world's fastest (simd) CSV parser
nq - Unix command line queue utility