| | PushshiftDumps | RedditLemmyImporter |
|---|---|---|
| Mentions | 40 | 16 |
| Stars | 240 | 70 |
| Growth | - | - |
| Activity | 8.1 | 1.9 |
| Last commit | 8 days ago | 11 months ago |
| Language | Python | Kotlin |
| License | MIT License | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
PushshiftDumps
-
Pushshift Dumps Help: Only getting submissions that are named comments
I am trying to get comments and submissions from specific subreddits. So far, I've run u/watchful1's script combine_folder_multiprocess.py and have been able to process a few files.
-
Create and Search In Your Own Reddit Database
FYI, you can use my filter_file.py script to directly extract out submissions with a certain title. There's a place you can put in a file with a list of keywords to filter on if you have a lot of them. Or it would be fairly easy to modify to use a regex. There are also steps listed to export the list of submission ids and then filter a comments file to only comments from those submissions. You can also export directly to CSV, though you would want to use zst files for any intermediate steps. Let me know if anything in there doesn't work.
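To illustrate the keyword-filtering idea described above: this is not Watchful1's filter_file.py itself, just a minimal sketch, with names of my own choosing, that composes with any iterator of submission dicts (such as the generator from a dump reader).

```python
def filter_by_title(objects, keywords):
    """Yield submission dicts whose title contains any of the given
    keywords, case-insensitively. `objects` can be any iterable of
    dicts, e.g. the generator returned by a .zst dump reader."""
    lowered = [k.lower() for k in keywords]
    for obj in objects:
        title = obj.get("title", "").lower()
        if any(k in title for k in lowered):
            yield obj
```

Because it is a generator, it streams: nothing is held in memory beyond the current line, which matters for the multi-gigabyte Pushshift dumps.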
-
Reddit starting to bring back deleted comments.
This repo has good examples of scripts to use them, https://github.com/Watchful1/PushshiftDumps
-
Encountered a non-utf8 character
```python
import json
from zstandard import ZstdDecompressor

def read_redditfile(file: str) -> dict:
    """Iterate over the Pushshift JSON lines, yielding them as Python dicts.
    Decompress iteratively if necessary."""
    # Older files in the dataset are uncompressed, while newer ones use
    # zstd compression and have .xz, .bz2, or .zst endings.
    if not file.endswith(('.bz2', '.xz', '.zst')):
        with open(file, 'r', encoding='utf-8') as infile:
            for line in infile:
                yield json.loads(line)
    else:
        # Code by Watchful1, written for the Pushshift offline dataset:
        # https://github.com/Watchful1/PushshiftDumps
        with open(file, 'rb') as fh:
            dctx = ZstdDecompressor(max_window_size=2147483648)
            with dctx.stream_reader(fh) as reader:
                previous_line = ""
                while True:
                    chunk = reader.read(2**24)  # 16 MB chunks
                    if not chunk:
                        break
                    string_data = chunk.decode('utf-8')
                    lines = string_data.split("\n")
                    for i, line in enumerate(lines[:-1]):
                        if i == 0:
                            line = previous_line + line
                        yield json.loads(line)
                    previous_line = lines[-1]
```
-
What to do after decompressing the files from academic torrents?
Just look one folder down in the GitHub repo, https://github.com/Watchful1/PushshiftDumps/tree/master/scripts; the scripts are still there.
-
What are you using to browse/self host downloaded reddit?
I am working with the ZST files downloaded from Pushshift and sorted into subreddits by the lovely u/watchful1 here. ZST is too compressed to browse on its own, but using scripts like this one you can process them into readable NDJSON files. From there I'm not sure what to do. I would like to have a self-hosted Reddit clone that I can import these dumps into and browse freely.
-
Tell HN: My Reddit account was banned after adding my subs to the protest
The whole of Reddit (posts and comments separately) from 2005-06 until 2022-12 is on this [1] torrent link; it's very easy to download, extract, and use the data [2]. I'm writing my thesis on the connection between a Reddit post's type and its comment structure, and I've been working with this data for a few months. It's amazing.
[1] https://academictorrents.com/details/7c0645c94321311bb05bd87...
[2] https://github.com/Watchful1/PushshiftDumps
-
Reddit, API calls, and AI - Who does your knowledge belong to?
Sure! You can download the compressed data from this torrent, then you can use this project if you want to just decompress and process the data.
-
Script to find overlapping users between subreddits from dump files
You can go through the process outlined in that thread to download the subreddits you're interested in, then add them at the top of the new script, run it, and it will output the list of overlapping users. It will actually likely be faster than the old script, even counting download times for the dumps, since the API was so slow. Though you are limited to the 20k subreddits available.
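The script in the linked thread is the real tool; purely to illustrate the overlapping-users idea, a minimal version over two already-decompressed NDJSON dumps might look like this (function names are mine, not the script's):

```python
import json

def users_in_file(ndjson_path):
    """Collect the set of non-deleted authors from an NDJSON dump,
    where each line is one submission or comment object."""
    authors = set()
    with open(ndjson_path, encoding="utf-8") as fh:
        for line in fh:
            author = json.loads(line).get("author")
            if author and author != "[deleted]":
                authors.add(author)
    return authors

def overlapping_users(path_a, path_b):
    """Return users who appear in both subreddit dumps."""
    return users_in_file(path_a) & users_in_file(path_b)
```

Set intersection keeps memory proportional to the number of distinct authors rather than the number of lines, which is what makes this approach faster than querying an API per user.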
-
This Reddit Community Has Been Archived
How do I read the file? First I tried to extract it, and OK, I managed that, but then I'm left with a text file I can't read. I saw a few people saying it was just a JSON file, so I tried a JSON reader, but it says the JSON data is invalid. Then I tried this program, but nothing happens: no new file is created or anything (here's a screenshot). Maybe I'm doing something wrong, but I don't know, because the script doesn't come with any instructions on how to use it!
RedditLemmyImporter
-
We're public again but this subreddit will be soon mothballed
There is https://github.com/rileynull/RedditLemmyImporter if someone wants to help us move to /c/libgen
-
META: we need a complete dump of https://old.reddit.com/r/CollapseScience/ including the Wiki contents preferably -- please discuss how we can achieve that as comments under that post. Thanks.
https://github.com/rileynull/RedditLemmyImporter needs a Lemmy instance, plus an API key; for the latter, see https://support.reddithelp.com/hc/en-us/articles/16160319875092-Reddit-Data-API-Wiki
- kbin & lemmy migration - tools/bots to move content from one platform to another?
-
Post-blackout and Going Forward
Actually there are a few, such as RedditLemmyImporter by rileynull.
-
Migrating subreddits to Lemmy communities
The issue is - how? I've seen that there have been tools created to back up subreddits to Lemmy, but these rely on having access to the instance database in order to effectively "seed" it; that's not possible for me as a moderator/user.
-
Migration Guide / Plan
Currently, RedditLemmyImporter is the best tool for migrating your subreddits away from Reddit. I'm not sure how many moderators are willing to go through the hassle of installing and running this script to migrate their communities, but I'm here to inform you that the tool does exist. If you're facing any technical difficulties with it, I think r/LemmyMigration, r/DataHoarder or the GitHub Issues page will provide the most technical support. I do apologise for the lack of documentation on the GitHub page, though (it's not my script).
-
How do we migrate a subreddit?
For literally migrating posts and comments over to Lemmy: https://github.com/rileynull/RedditLemmyImporter
-
ArchiveTeam has saved over 11.2B Reddit links so far. We need your help
Haven't tried it, but this comment on /r/DataHoarder mentioned these two repos:
https://github.com/rileynull/RedditLemmyImporter
-
Support for Subreddit Migration?
https://github.com/rileynull/RedditLemmyImporter might be what you are looking for?
- ArchiveTeam has saved over 10.8 BILLION Reddit links so far. We need YOUR help running ArchiveTeam Warrior to archive subreddits before they're gone indefinitely after June 12th!
What are some alternatives?
Sketchpad
lemmy-js-client - A javascript / typescript http and websocket client and type system for Lemmy.
Pushshift-Importer
spring-reddit-clone - Reddit clone built using Spring Boot, Spring Security with JWT Authentication, Spring Data JPA with MySQL, Spring MVC. The frontend is built using Angular - You can find the frontend source code here - https://github.com/SaiUpadhyayula/angular-reddit-clone
zreader - Read compressed NDJSON .zst files easily
BotIt - A bot that scrapes posts from a specific subreddit and posts them on a kbin magazine.
7-Zip-zstd - 7-Zip with support for Brotli, Fast-LZMA2, Lizard, LZ4, LZ5 and Zstandard
warrior-dockerfile - A Dockerfile for the ArchiveTeam Warrior
reddit-project-public
export-saved-reddit - Export saved Reddit posts into an HTML file for import into Google Chrome.
Lemmy - 🐀 A link aggregator and forum for the fediverse
PRAW - PRAW, an acronym for "Python Reddit API Wrapper", is a python package that allows for simple access to Reddit's API.