bdfr-html
Converts the output of the bulk downloader for reddit to a set of HTML pages. (by BlipRanger)
 | Pushshift-Importer | bdfr-html
---|---|---
Mentions | 7 | 7
Stars | 14 | 60
Growth | - | -
Activity | 2.0 | 0.0
Last commit | about 1 year ago | over 2 years ago
Language | Rust | Python
License | Apache License 2.0 | GNU General Public License v3.0 only
Mentions - the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Pushshift-Importer
Posts with mentions or reviews of Pushshift-Importer.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2023-06-07.
-
What are you using to browse/self host downloaded reddit?
I'm thinking I will have to get a project like redarc or BDFR-to-HTML, or much more likely Pushshift-Importer, which allows you to import pushshift downloads into a SQLite database. From there I would have to hook up the database to a reddit-like frontend.
-
[META] Hey mods, how about an AutoMod config to remove posts asking, "Am I too old?"
Just download the dumps from pushshift and then use Pushshift-Importer.
-
Rust template for parsing ZST files
I wrote my own rust based importer. Feel free to use types and such from that as well.
-
How do I correctly stream data from the dump files when they are in the weird json format and convert them to a csv.
I built a command line tool to import the dumps into sqlite if you want to give it a go. https://github.com/Paul-E/Pushshift-Importer
-
Data dumps
I wrote some code to do just this. Input the locations of the comments and submissions and it will produce an output sqlite file.
-
What are you using to analyze the pushift dumps ?
I created a pushshift importer for comments. You can find it here. It will import the comments into a sqlite database. It is written in rust and is very fast compared to python. It can import everything overnight if you have an SSD.
-
Performance of a 2TB comments database
If you stick with SQLite, you could try creating your own sequencer. Funnel all your writes into one thread on one process, and have that thread do the writing. That way there is only ever one possible writer on the DB at a time. Here is an example of what I did when I built a tool to import comments from pushshift into SQLite. When I do this on an NVMe drive, I am CPU bound on decompression and JSON parsing, so the DB isn't even a bottleneck.
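The single-writer idea from the last post above can be sketched in a few lines of Python. This is only an illustration of the pattern, not Pushshift-Importer's actual code (that project is written in Rust). The names `writer_loop` and `import_comments`, the `comment` table, and its two columns are all hypothetical.

```python
import queue
import sqlite3
import threading

_STOP = object()  # sentinel that tells the writer thread to finish


def writer_loop(db_path, rows, batch_size=1000):
    """Drain `rows` and insert into SQLite from this one thread only."""
    # The connection is created and used only inside the writer thread.
    conn = sqlite3.connect(db_path)
    # Hypothetical schema, chosen only for this sketch.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS comment (id TEXT PRIMARY KEY, body TEXT)"
    )
    batch = []
    while True:
        item = rows.get()
        if item is not _STOP:
            batch.append(item)
        # Flush when the batch is full or when we are shutting down.
        if batch and (item is _STOP or len(batch) >= batch_size):
            with conn:  # one transaction per batch
                conn.executemany(
                    "INSERT OR IGNORE INTO comment VALUES (?, ?)", batch
                )
            batch.clear()
        if item is _STOP:
            break
    conn.close()


def import_comments(db_path, chunks):
    """Run one producer thread per chunk; all writes go through one writer."""
    rows = queue.Queue(maxsize=10_000)  # bounded, so producers cannot run ahead
    writer = threading.Thread(target=writer_loop, args=(db_path, rows))
    writer.start()

    def produce(chunk):
        # A real importer would decompress and parse JSON here.
        for comment_id, body in chunk:
            rows.put((comment_id, body))

    producers = [threading.Thread(target=produce, args=(c,)) for c in chunks]
    for p in producers:
        p.start()
    for p in producers:
        p.join()
    rows.put(_STOP)  # every producer is done, so the writer can stop
    writer.join()
```

Calling `import_comments("comments.db", chunks)` with a list of iterables of `(id, body)` pairs writes every row through the single writer thread, so SQLite never sees two concurrent writers. Batching many rows into one transaction is a common way to keep insert throughput high.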
bdfr-html
Posts with mentions or reviews of bdfr-html.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2023-06-27.
-
I've seen dozens of posts on how to mass download reddit, what are you actually doing with it? How are you displaying or searching it?
I use this with bdfr. https://github.com/BlipRanger/bdfr-html
-
What are you using to browse/self host downloaded reddit?
I'm thinking I will have to get a project like redarc or BDFR-to-HTML, or much more likely Pushshift-Importer, which allows you to import pushshift downloads into a SQLite database. From there I would have to hook up the database to a reddit-like frontend.
-
How to Automate the saving of the contents of bookmarked Reddit threads?
You may be able to achieve that with bdfr and bdfr-html.
-
Does anybody know a good way to quickly save the Wikis and FAQs from specific subreddits?
Now I don't know if you can find better, but you can use this alongside with this for most of reddit hoarding, see if it helps.
-
Looking For An App That Will Download Whole Webpages Offline (Specifically Reddit Threads)
-
What is a tool to download all my saved posts?
I have not used it myself but this might let you view the bdfr output as a website: https://github.com/BlipRanger/bdfr-html
-
Bulk Downloader for Reddit, tool for archiving reddit, has a major release!
Check out my (really beta) project to make viewing a bit easier - https://github.com/BlipRanger/bdfr-html
What are some alternatives?
When comparing Pushshift-Importer and bdfr-html you can also consider the following projects:
PushshiftDumps - Example scripts for the pushshift dump files
bulk-downloader-for-reddit - Downloads and archives content from reddit
redarc - Reddit archiver
expanse - selfhosted multi-user web app for externally storing Reddit items (saved, created, upvoted, downvoted, hidden) to bypass Reddit's 1000-item listing limits
cloud-to-butt - Chrome extension that replaces occurrences of 'the cloud' with 'my butt'
Reddit-Post-Notifier - Get notified of new Reddit posts matching your search criteria