Pushshift-Importer Alternatives
Similar projects and alternatives to Pushshift-Importer
NOTE:
The number of mentions on this list counts mentions in common posts plus user-suggested alternatives.
Hence, a higher number means a better Pushshift-Importer alternative or higher similarity.
Pushshift-Importer reviews and mentions
Posts with mentions or reviews of Pushshift-Importer.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2023-06-07.
-
What are you using to browse/self host downloaded reddit?
I'm thinking I will have to use a project like redarc or BDFR-to-HTML or, much more likely, Pushshift-Importer, which lets you import pushshift downloads into a SQLite database. From there I would have to hook the database up to a reddit-like frontend.
-
[META] Hey mods, how about an AutoMod config to remove posts asking, "Am I too old?"
Just download the dumps from pushshift and then use Pushshift-Importer.
-
Rust template for parsing ZST files
I wrote my own Rust-based importer. Feel free to use the types and such from that as well.
-
How do I correctly stream data from the dump files when they are in the weird JSON format and convert them to a CSV?
I built a command line tool to import the dumps into SQLite if you want to give it a go. https://github.com/Paul-E/Pushshift-Importer
-
Data dumps
I wrote some code to do just this. Input the locations of the comments and submissions and it will produce an output SQLite file.
-
What are you using to analyze the pushshift dumps?
I created a pushshift importer for comments. You can find it here. It will import the comments into a SQLite database. It is written in Rust and is very fast compared to Python. It can import everything overnight if you have an SSD.
-
Performance of a 2TB comments database
If you stick with SQLite, you could try creating your own sequencer. Funnel all your writes into one thread in one process, and have that thread do the writing. That way there is only ever one possible writer on the DB at a time. Here is an example of what I did when I built a tool to import comments from pushshift into SQLite. When I do this on an NVMe drive, I am CPU bound on decompression and JSON parsing, so the DB isn't even a bottleneck.
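The single-writer idea described in this post can be sketched roughly as follows. This is a minimal illustration in Python, not the code from Pushshift-Importer (which is written in Rust). The table layout, the function names, and the choice of the `id`, `author`, and `body` fields are assumptions made for the example, and the input is assumed to be already-decompressed lines of newline-delimited JSON.

```python
import json
import queue
import sqlite3
import threading

SENTINEL = None  # tells the writer that no more rows are coming


def writer(db_path, rows, batch_size=1000):
    """Sole owner of the SQLite connection: drains the queue and writes in batches."""
    conn = sqlite3.connect(db_path)
    # Hypothetical schema, chosen only for this sketch.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS comment (id TEXT PRIMARY KEY, author TEXT, body TEXT)"
    )
    batch = []
    while True:
        item = rows.get()
        if item is not SENTINEL:
            batch.append(item)
        if batch and (item is SENTINEL or len(batch) >= batch_size):
            with conn:  # commits one transaction per batch
                conn.executemany("INSERT OR IGNORE INTO comment VALUES (?, ?, ?)", batch)
            batch.clear()
        if item is SENTINEL:
            break
    conn.close()


def parse_lines(lines, rows):
    """Producer: parses JSON and hands rows to the writer; never touches the database."""
    for line in lines:
        record = json.loads(line)
        rows.put((record["id"], record["author"], record["body"]))


def import_comments(db_path, chunks):
    """Run one producer thread per chunk of lines, plus exactly one writer thread."""
    rows = queue.Queue(maxsize=10_000)  # bounded, so producers cannot outrun the writer
    writer_thread = threading.Thread(target=writer, args=(db_path, rows))
    writer_thread.start()
    producers = [threading.Thread(target=parse_lines, args=(chunk, rows)) for chunk in chunks]
    for producer in producers:
        producer.start()
    for producer in producers:
        producer.join()
    rows.put(SENTINEL)  # all producers are done, so let the writer finish
    writer_thread.join()
```

Because only the writer thread ever opens the database, writers never contend for SQLite's lock, and batching the inserts into transactions keeps the write side cheap. In CPython the parsing threads share one interpreter lock, so a CPU-bound import would likely move the producers into separate processes; the single-writer structure stays the same.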
Stats
Basic Pushshift-Importer repo stats
Mentions: 7
Stars: 14
Activity: 2.0
Last commit: about 1 year ago
Paul-E/Pushshift-Importer is an open source project licensed under Apache License 2.0 which is an OSI approved license.
The primary programming language of Pushshift-Importer is Rust.