wayback-machine-scraper vs waybackpy

wayback-machine-scraper	Project	waybackpy
6	Mentions	6
405	Stars	405
-	Growth	-
0.0	Activity	0.0
2 months ago	Latest Commit	over 1 year ago
Python	Language	Python
ISC License	License	MIT License

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

wayback-machine-scraper

Posts with mentions or reviews of wayback-machine-scraper. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2021-05-20.

wayback-machine-scraper: NEW Data - star count:380.0
1 project | /r/algoprojects | 10 Dec 2023
Anyone have a simple useful guide so I can get this scraper working?
1 project | /r/github | 6 Nov 2022
wayback-machine-scraper: NEW Data - star count:295.0
1 project | /r/algoprojects | 26 Jun 2022
Anyone have a simple useful guide so I can get this scraper working?
1 project | /r/github | 6 Nov 2021
Retrieving images from archived pages?
1 project | /r/webscraping | 26 Jun 2021

You can very easily scrape pages from the web archive with this small package: wayback-machine-scraper. Getting historic snapshots of a webpage becomes a matter of a one-liner like:
How can I get my old blog back (Wordpress)?
2 projects | /r/Wordpress | 20 May 2021

There are tools that allow you to scrape websites archived by the Wayback Machine. Like this one for example: https://github.com/sangaline/wayback-machine-scraper
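Before reaching for a full scraper, it can help to check whether a capture of the old blog exists at all. The following is a minimal sketch, not part of either project, that uses only the standard library and the Wayback Machine's public availability endpoint. The function names are illustrative, and the response layout (`archived_snapshots` / `closest`) is assumed to match what that endpoint documents.

```python
import json
import urllib.parse
import urllib.request

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_url(page_url, timestamp=None):
    """Build a query URL for the Wayback Machine availability endpoint."""
    params = {"url": page_url}
    if timestamp:
        # Optional YYYYMMDD[hhmmss] hint; the capture closest to it is returned.
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode(params)


def closest_snapshot(response):
    """Return the URL of the closest capture in a decoded response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def find_snapshot(page_url, timestamp=None):
    """Query the endpoint over the network and return a capture URL or None."""
    query = build_availability_url(page_url, timestamp)
    with urllib.request.urlopen(query, timeout=10) as resp:
        return closest_snapshot(json.load(resp))
```

Calling `find_snapshot("example.com", "20150101")`, with `example.com` standing in for the blog's old address, would return the capture nearest to that date if one exists.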

waybackpy

Posts with mentions or reviews of waybackpy. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-06-05.

download all captures of a page in archive.org
2 projects | /r/Archiveteam | 5 Jun 2023

I ended up using the waybackpy Python module to retrieve archived URLs, and it worked well. I think the feature you want for this is the "snapshots", but I didn't test this myself.
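For readers who want to see what listing every capture of a page involves, here is a rough sketch that queries the Wayback Machine's public CDX server directly with the standard library. It does not use waybackpy's own API. The helper names are illustrative, and the sketch assumes the JSON output starts with a header row naming the `timestamp` and `original` columns.

```python
import json
import urllib.parse
import urllib.request

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(page_url, match_type="exact"):
    """Build a CDX query; match_type may be exact, prefix, host or domain."""
    params = {"url": page_url, "output": "json", "matchType": match_type}
    return CDX_ENDPOINT + "?" + urllib.parse.urlencode(params)


def snapshot_urls(rows):
    """Turn CDX JSON rows (header row first) into replay URLs."""
    if not rows:
        return []
    header, *captures = rows
    ts, orig = header.index("timestamp"), header.index("original")
    return [f"https://web.archive.org/web/{row[ts]}/{row[orig]}" for row in captures]


def list_captures(page_url, match_type="exact"):
    """Fetch the capture list over the network and return replay URLs."""
    with urllib.request.urlopen(build_cdx_url(page_url, match_type), timeout=30) as resp:
        body = resp.read().decode("utf-8").strip()
    # An empty body is treated as "no captures".
    return snapshot_urls(json.loads(body) if body else [])
```

Each returned URL can then be fetched like any other page. Passing `match_type="domain"` widens the query from a single page to everything captured under a domain.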
Well worth the price
1 project | /r/OSINT | 9 Apr 2023

I have read it as someone told me that it has a section for using waybackpy, a tool/library that I wrote and maintain.
any way to archive all my bookmarks on archive.org?
1 project | /r/DataHoarder | 6 Oct 2022
Comex update 9/27/2022
1 project | /r/wallstreetplatinum | 27 Sep 2022

import requests
from datetime import datetime
from pathlib import Path
from waybackpy import WaybackMachineSaveAPI  # https://github.com/akamhy/waybackpy

ARCHIVE = False
LOCAL_SAVE = True

# https://www.cmegroup.com/clearing/operations-and-deliveries/nymex-delivery-notices.html
urls = [
    # COMEX & NYMEX Metal Delivery Notices
    "https://www.cmegroup.com/delivery_reports/MetalsIssuesAndStopsReport.pdf",
    "https://www.cmegroup.com/delivery_reports/MetalsIssuesAndStopsMTDReport.pdf",
    "https://www.cmegroup.com/delivery_reports/MetalsIssuesAndStopsYTDReport.pdf",
    # NYMEX Energy Delivery Notice
    "https://www.cmegroup.com/delivery_reports/EnergiesIssuesAndStopsReport.pdf",
    "https://www.cmegroup.com/delivery_reports/EnergiesIssuesAndStopsYTDReport.pdf",
    # Warehouse & Depository Stocks
    "https://www.cmegroup.com/delivery_reports/Gold_Stocks.xls",
    "https://www.cmegroup.com/delivery_reports/Gold_Kilo_Stocks.xls",
    "https://www.cmegroup.com/delivery_reports/Silver_stocks.xls",
    "https://www.cmegroup.com/delivery_reports/Copper_Stocks.xls",
    "https://www.cmegroup.com/delivery_reports/PA-PL_Stck_Rprt.xls",
    "https://www.cmegroup.com/delivery_reports/Aluminum_Stocks.xls",
    "https://www.cmegroup.com/delivery_reports/Zinc_Stocks.xls",
    "https://www.cmegroup.com/delivery_reports/Lead_Stocks.xls",
]

# required for both wayback and cmegroup.com
user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
headers = {'User-Agent': user_agent}  # present yourself as an updated Chrome browser

if ARCHIVE:
    for url in urls:
        filename = url.split("/")[-1]
        print(f"Archiving {filename} on Wayback Machine...")
        # limited to 15 requests / minute / IP. My VPN IP was already throttled :(
        # Couldn't even get this to work with normal IP. Returned 429 error....
        save_api = WaybackMachineSaveAPI(url, user_agent)
        res = save_api.save()
        print(f"Res: {res}")

if LOCAL_SAVE:
    datestr = datetime.now().strftime('%m-%d-%Y')
    datedir = Path(datestr)
    datedir.mkdir(exist_ok=True)
    for url in urls:
        filename = url.split("/")[-1]
        print(f"Fetching {filename}...")
        try:
            resp = requests.get(url, timeout=3, allow_redirects=True, headers=headers)
            if resp.ok:
                filepath = datedir / filename
                if not filepath.exists():
                    with open(filepath, mode="wb") as f:
                        f.write(resp.content)
                else:
                    print(f"ERROR: Filepath already exists: {filepath}")
            else:
                print(f"ERROR: response for {filename}: {resp}")
        except requests.ReadTimeout:
            print("timeout")
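The script's comment mentions a limit of 15 save requests per minute per IP and a 429 response once that is exceeded. One way to stay under such a limit is to space the calls out. The sketch below is an assumption-laden illustration, not part of waybackpy: the `Throttle` class is hypothetical, and the 15-per-minute figure comes only from the post above.

```python
import time


class Throttle:
    """Space successive calls at least 60 / per_minute seconds apart."""

    def __init__(self, per_minute, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / per_minute
        self._clock = clock    # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None      # time of the previous call, None before the first one

    def wait(self):
        """Block until at least one interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

In the archiving loop, creating `throttle = Throttle(15)` once and calling `throttle.wait()` before each `save_api.save()` would keep requests about four seconds apart. This does not help if the IP is already throttled, as the post's author found.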
Is there a way to download all the files Internet Archive has captured for a domain? I am trying to recover tweets from a suspended twitter account, but the account as a whole was never captured in the Wayback Machine, just some individual tweets and json files.
1 project | /r/DataHoarder | 27 Jun 2021
Simply run a script that uses the Wayback Machine API to batch-archive censorship-defying answers to Zhihu questions
1 project | /r/CLTV | 29 Apr 2021

A package that wraps the Wayback Machine API. GitHub address: https://github.com/akamhy/waybackpy

What are some alternatives?

When comparing wayback-machine-scraper and waybackpy you can also consider the following projects:

cancel-culture - Tools for fighting abuse on Twitter

TikUp - An auto downloader and uploader for TikTok videos.

autoscraper - A Smart, Automatic, Fast and Lightweight Web Scraper for Python

wayback - A bot for Telegram, Mastodon, Slack, and other messaging platforms that archives webpages.

ArchiveBox - 🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

WordPress - WordPress, Git-ified. This repository is just a mirror of the WordPress subversion repository. Please do not send pull requests. Submit pull requests to https://github.com/WordPress/wordpress-develop and patches to https://core.trac.wordpress.org/ instead.

wayback_archiver - Ruby gem to send URLs to Wayback Machine

ArchiveBox - 🗃 The open source self-hosted web archive. Takes browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more... [Moved to: https://github.com/ArchiveBox/ArchiveBox]

pywb - Core Python Web Archiving Toolkit for replay and recording of web archives
