You can also run this on Kubernetes! There's a manifest in the warrior-dockerfile repo. I modified the manifest to drop the NodePort and use an Ingress instead.
https://github.com/ArchiveTeam/reddit-grab <- source code
There are many more items still waiting to be queued into the tracker (approximately 758 million), so 150 million is not an accurate number. They are not queued yet because of Redis limitations: the tracker is a Ruby and Redis monolith that serves multiple projects with hundreds of millions of items. You can see all the Reddit items here.
Last time I tried, I wasn't able to save old.reddit-style comment sections on archive.org, so most comments end up hidden behind "show more replies" and aren't saved. I would recommend https://archive.is/ instead, since it always saves your page with the old.reddit layout automatically, no matter the URL.