RedditLemmyImporter VS reddit-archivar

Compare RedditLemmyImporter vs reddit-archivar and see how they differ.

reddit-archivar

:book: Archiving Cyber Security related Subreddits (by cookiengineer)
              RedditLemmyImporter   reddit-archivar
Mentions      16                    3
Stars         70                    8
Growth        -                     -
Activity      1.9                   5.1
Last commit   11 months ago         11 months ago
Language      Kotlin                Go
License       Apache License 2.0    -
The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

RedditLemmyImporter

Posts with mentions or reviews of RedditLemmyImporter. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-06-27.

reddit-archivar

Posts with mentions or reviews of reddit-archivar. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-06-06.
  • Ask HN: Is anyone working on a Reddit archive?
    1 project | news.ycombinator.com | 30 Jun 2023
    I was focussing mostly on cyber security related subreddits because the vulnerability and exploit discussions were of great value to me.

    I built a little scraper in golang that stores the JSON data (instead of the HTML which the archive warrior stores) to save hdd storage.

    I managed to discover and scrape around 80GB of reduced JSON data, but have no idea what to do with this now. I wanted to build myself a little minimalistic web interface so I can do a text/keyword search.

    The problem with Reddit's API is that every listing only returns 1000 entries over 10 pages. That means hot/top/new and search results are all limited. If more links match the keyword, you won't discover them.

    So you need a very specific keyword list to be able to discover more posts.

    [1] https://github.com/cookiengineer/reddit-archivar

  • Show HN: Reddit Archiving Tool
    1 project | news.ycombinator.com | 10 Jun 2023
    Inspired by the ongoing call-to-action by the Internet Archive team over at /r/DataHoarder [1], I've decided I want to try to preserve all cybersecurity related subreddits. [2]

    For people who don't know what's going on: the attempt to monetize the Reddit API is likely to lead to a lot of moderators quitting the platform, and a lot of subreddits could be set to private and/or have their threads deleted. At least that's the fear arising from the ongoing moderator strike.

    In my case I learned a LOT from Reddit's discussions about malware, exploits and how they work, and without those I certainly wouldn't be where I am today ... so I'm trying to preserve them.

    As the Archive Warrior only scrapes the HTML directly to the Web Archive, I'm trying to preserve the data itself directly as JSON files, with the intent to store it on IPFS later (having been inspired a couple of days ago by the-eye-team's effort to archive RARBG on IPFS).

    I just wanted to let people know here about the tool, and in case you want to archive your favorite subreddits, feel free to modify it.

    There are some limitations though, because listings (new/hot/top/search) are all limited to 1000 entries, which means that the discovery of old threads is quite limited.

    Keyword search increases the discovery of old threads. In my case I'm searching for a lot of keywords (like CVE, RCE, vulnerability etc) in order to discover more threads.

    Would love to hear feedback. Currently it's just a quick n' dirty prototype, because the threat of my favorite subreddits going dark is quite immediate. I tried to remove as much noise from the schema as possible, and the tool only archives the subreddit threads and comments, with the idea of scraping the linked websites/blog articles at a later point in time.

    [1] https://old.reddit.com/r/DataHoarder/comments/142l1i0/archiveteam_has_saved_over_108_billion_reddit/

    [2] https://github.com/cookiengineer/reddit-archivar

  • ArchiveTeam has saved over 10.8 BILLION Reddit links so far. We need YOUR help running ArchiveTeam Warrior to archive subreddits before they're gone indefinitely after June 12th!
    9 projects | /r/DataHoarder | 6 Jun 2023

What are some alternatives?

When comparing RedditLemmyImporter and reddit-archivar you can also consider the following projects:

lemmy-js-client - A javascript / typescript http and websocket client and type system for Lemmy.

reddit-grab - Grabbing everything from reddit.

PushshiftDumps - Example scripts for the pushshift dump files

reddit-items - Managing items for reddit-grab.

spring-reddit-clone - Reddit clone built using Spring Boot, Spring Security with JWT Authentication, Spring Data JPA with MySQL, Spring MVC. The frontend is built using Angular - You can find the frontend source code here - https://github.com/SaiUpadhyayula/angular-reddit-clone

BotIt - A bot that scrapes posts from a specific subreddit and posts them to a kbin magazine.

warrior-dockerfile - A Dockerfile for the ArchiveTeam Warrior

export-saved-reddit - Export saved Reddit posts into an HTML file for import into Google Chrome.

PRAW - PRAW, an acronym for "Python Reddit API Wrapper", is a python package that allows for simple access to Reddit's API.

Lemmy - 🐀 A link aggregator and forum for the fediverse

jerboa - A native android app for Lemmy