reddit_mining Alternatives
Similar projects and alternatives to reddit_mining
OctoSQL is a query tool that allows you to join, analyse and transform data from multiple databases and file formats using SQL.
Implementation of the JSON semi-index described in the paper "Semi-Indexing Semi-Structured Data in Tiny Space"
reddit_mining discussion
reddit_mining reviews and mentions
Analyzing multi-gigabyte JSON files locally
zstd decompression should almost always be very fast. It's faster to decompress than DEFLATE or LZ4 in all the benchmarks that I've seen.
You might be interested in converting the pushshift data to Parquet. Using octosql, I'm able to query the submissions data (from the beginning of Reddit to Sept 2022) in about 10 minutes.
Although if you're sending the data to Postgres or BigQuery, you can probably get better query performance via indexes or parallelism.
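For a quick look before converting anything, a streaming pass over the raw dump is often enough. The sketch below is a minimal illustration, not the method used in the thread: it assumes the data is newline-delimited JSON (one object per line) and that each record has a `subreddit` field. The helper name `count_by_field` and the sample records are hypothetical. It reads one line at a time, so memory use stays flat regardless of file size.

```python
import json
from collections import Counter
from typing import Iterable


def count_by_field(lines: Iterable[str], field: str = "subreddit") -> Counter:
    """Count records per value of `field`, reading one JSON object per line.

    `field` defaults to "subreddit", which is an assumption about the data.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines instead of aborting a long scan
        if not isinstance(record, dict):
            continue  # only JSON objects carry named fields
        value = record.get(field)
        if value is not None:
            counts[value] += 1
    return counts


# Hypothetical in-memory sample standing in for a real dump file.
sample = [
    '{"subreddit": "python", "title": "a"}',
    '{"subreddit": "rust", "title": "b"}',
    'not json',
    '{"subreddit": "python", "title": "c"}',
]
counts = count_by_field(sample)
print(counts.most_common(1))  # [('python', 2)]
```

Any iterable of text lines works as input, so an open file object or a decompressing text reader (for example, one built on a third-party zstd package) can be passed in directly. For repeated queries, a columnar format such as Parquet will usually be faster than rescanning the JSON.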
- reddit_mining - List of all Subreddits
- Show HN: List of All Subreddits
- Top 50k Subreddits
chapmanjacobd/reddit_mining is an open source project licensed under Creative Commons Zero v1.0 Universal, which is not an OSI-approved license.
The primary programming language of reddit_mining is HTML.