This Reddit Community Has Been Archived

PushshiftDumps

Mentions: 40 | Stars: 240 | Activity: 8.1 | Language: Python

Example scripts for the pushshift dump files

How do I read the file? First I tried to extract the file, and that worked, but I can't read the resulting text file. I saw a few people saying it was just a JSON file, so I tried a JSON reader, but it says the JSON data is invalid. Then I tried this program, but nothing happens: no new file is created. Here is a screenshot. Maybe I'm doing something wrong, but I can't tell, because the script doesn't include any instructions on how to use it!
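A likely cause of the "invalid JSON" error in the comment above is that the extracted dump is newline-delimited JSON: every line is its own JSON object, so a reader that expects a single document for the whole file rejects it. The sketch below is not taken from PushshiftDumps. It is a minimal illustration that assumes the file has already been decompressed to UTF-8 text, and the function name `iter_ndjson` is made up for this example.

```python
import json
from typing import Iterator


def iter_ndjson(path: str) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a text file.

    Assumption: the file is already decompressed and holds one JSON
    object per line (newline-delimited JSON).
    """
    with open(path, "r", encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError as error:
                # Report and skip a damaged line instead of aborting the read.
                print(f"skipping line {line_number}: {error}")
```

Because the function is a generator, it processes one line at a time and never loads the whole file into memory, which matters for multi-gigabyte dumps. Looping over `iter_ndjson("extracted_dump.txt")` (a placeholder file name) gives one record per iteration.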

RedditScrape

Mentions: 2 | Stars: 79 | Activity: 6.6 | Language: Python

Quick and dirty script to suck down the pr0n from Reddit before it's too late

If you only want to back up content from specific subs, you could use RedditScrape. It queries the official Pushshift API to get posts and then downloads them via gallery-dl. I have been running it for almost 14 hours now and have downloaded 187k media files (227 GiB) from a few subs that interest me. I might be getting rate limited by now, though I've been using a VPN, so I could just switch location if really necessary.

redarcs-reader

Mentions: 1 | Stars: 2 | Activity: 4.9 | Language: Python

reddit-html-archiver

Mentions: 12 | Stars: 165 | Activity: 1.8 | Language: Python

archive reddit data as offline friendly web pages

Well done, now you should make it sane. No need to reinvent the wheel here. Just rewrite reddit-html-archiver to use the raw JSON from redarcs rather than the Pushshift API.
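The core of the change suggested in the comment above is to rebuild threads from local JSON records instead of API responses. The sketch below is neither reddit-html-archiver's nor redarcs's code. It assumes Reddit-style records in which a submission has an `id` and a comment has a `link_id` of the form `t3_<submission id>`, and the function name `group_comments_by_submission` is hypothetical.

```python
from typing import Dict, Iterable


def group_comments_by_submission(
    submissions: Iterable[dict], comments: Iterable[dict]
) -> Dict[str, dict]:
    """Attach each comment to its parent submission.

    Assumption: comments carry a Reddit-style "link_id" such as "t3_abc",
    where "abc" is the "id" of the submission they belong to.
    """
    threads = {
        submission["id"]: {"submission": submission, "comments": []}
        for submission in submissions
    }
    for comment in comments:
        link_id = comment.get("link_id", "")
        # Strip the "t3_" type prefix to recover the bare submission id.
        submission_id = link_id[3:] if link_id.startswith("t3_") else link_id
        thread = threads.get(submission_id)
        if thread is not None:
            thread["comments"].append(comment)
        # Comments whose submission is not in the dump are dropped.
    return threads
```

Each value in the returned dictionary holds one submission and its comments, which is roughly the unit an offline HTML page would be rendered from.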

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.


This page summarizes the projects mentioned and recommended in the original post on /r/DataHoarder. Post date: 3 May 2023
