wayback-machine-downloader vs wayback

wayback-machine-downloader

Download an entire website from the Wayback Machine. (by hartator)

wayback

IA's public Wayback Machine (moved from SourceForge) (by internetarchive)

Project	wayback-machine-downloader	wayback
Mentions	48	11
Stars	5,053	710
Growth	-	0.7%
Activity	0.0	0.0
Latest Commit	3 months ago	2 months ago
Language	Ruby	Java
License	GNU General Public License v3.0 or later	-

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

wayback-machine-downloader

Posts with mentions or reviews of wayback-machine-downloader. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-12-22.

Ask HN: Cool Useful GitHub Repos?
6 projects | news.ycombinator.com | 22 Dec 2023

I just found this https://github.com/hartator/wayback-machine-downloader
anyone have anything similarly interesting/cool/niche-useful ?
ArchiveTeam is saving Blogger from Google deletion
1 project | news.ycombinator.com | 26 Nov 2023

Send ArchiveTeam the link on IRC or here and we can save it to archive.org, then later you can use wayback-machine-downloader to grab it from archive.org.
https://github.com/hartator/wayback-machine-downloader
My TikTok was Hacked & Deleted and I GOT IT BACK!
1 project | /r/Tiktokhelp | 7 Nov 2023

This is where it gets tricky, you need to download the code from the wayback machine and he was able to do that by following these steps: https://github.com/hartator/wayback-machine-downloader
Is there a way to quick download twitter images on the wayback machine?
1 project | /r/DataHoarder | 8 Aug 2023

Not sure if it will work for twitter, but I have used wayback-machine-downloader to batch download stuff.
Forgot to backup my WordPress files before I swapped webhosting provider, am I screwed?
1 project | /r/Wordpress | 25 Jun 2023

Adding to archive.org, there is a github repo to fetch website data. You can give a try too. Here is the repo link
Can I please get help downloading and saving a website for offline use?
1 project | /r/DataHoarder | 3 Jun 2023
Hey guys, looks like we have a potential hacker on our hands. All of our company's files were deleted from our FTP. :( Is there any way we can get a cache of our website and restore everything? Any help or advice would be greatly appreciated. Thanks in advance!
1 project | /r/webhosting | 6 May 2023

Edit: Good news! I found a solution that saved me. I was able to download the full website (including images, JS, and CSS files) using this tool: https://github.com/hartator/wayback-machine-downloader
Hey guys, so a potential hacker managed to delete all of our company's files from our FTP. Yikes! Is there a way to retrieve a cache of our website and restore it? Any advice or tips would be greatly appreciated. Thanks in advance!
1 project | /r/webhosting | 5 May 2023

Edit: Thank you to everyone who suggested the Wayback Machine Downloader! It saved the day and allowed me to download the full website, including images, JS, and CSS files.
Have a lengthy flight: how to seamlessly mirror couple websites
2 projects | /r/linux | 26 Apr 2023

I've used https://github.com/hartator/wayback-machine-downloader but it sometimes messes up CSS badly
what Do YOU Recommend?
2 projects | /r/hacking | 20 Apr 2023

wayback

Posts with mentions or reviews of wayback. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-01-28.

Scraping Data From Past: A Step-by-Step Tutorial
1 project | /r/webscraping | 19 Jul 2023

In this tutorial, we will explore how to scrape data from the past using the Wayback Machine API. We'll be using Python and the requests library to make HTTP requests and retrieve archived versions of web pages. The code provided demonstrates a basic implementation of scraping historical data from a list of URLs within a specified date range.
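
A minimal sketch of the kind of date-range query the tutorial describes. It assumes the CDX endpoint quoted in other posts on this page (`web.archive.org/cdx/search/cdx`) and its `from`, `to`, `output`, and `limit` parameters, and it assumes that `output=json` returns a header row followed by one row per capture. The helper names `build_cdx_query` and `rows_to_captures` are illustrative and do not come from the tutorial. The HTTP request itself (for example via `requests.get`) is left out so the snippet runs offline.

```python
from urllib.parse import urlencode

# Endpoint as quoted in other posts on this page.
CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def build_cdx_query(url, start, end, limit=None):
    """Build a CDX query URL for captures of `url` between `start` and `end`.

    `start` and `end` are timestamp prefixes such as "2015" or "20150131".
    """
    params = {"url": url, "from": start, "to": end, "output": "json"}
    if limit is not None:
        params["limit"] = str(limit)
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def rows_to_captures(rows):
    """Turn a JSON response (header row + data rows) into a list of dicts."""
    if not rows:
        return []
    header, *records = rows
    return [dict(zip(header, record)) for record in records]
```

Passing the result of `build_cdx_query` to an HTTP client and feeding the decoded JSON into `rows_to_captures` would give one dictionary per archived capture in the requested range.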
Subdomain * wildcard search
1 project | /r/WaybackMachine | 8 Feb 2023

A bit of context: I do a lot of archive digging with Apple's website. For the longest time, they hosted large files through Akamai. Most URLs looked something like this: http://a2032.g.akamai.net/5/2032/51/6cafb32dc21f74/1a1a1aaa2198c627970773d80669d84574a8d80d3cb12453c02589f25382f26493036bda4ebd305fd241a71b92f365ca/appleworks62_box.eps.hqx Unfortunately, those files shifted around from subdomain to subdomain (one period of time it was under a2032.g.akamai.net, another might be a1008.g.akamai.net) so finding all copies of a specific file was a pain in the ass. I recently learned that the IA has an API for the Wayback's Server that allows way more filtering than the web UI does. So to find every *.g.akamai.net URL they have archived, I used: http://web.archive.org/cdx/search/cdx?url=*.g.akamai.net/*
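
Once a wildcard query like the one above has returned a list of archived URLs, finding every copy of a file across shifting subdomains comes down to grouping by file name. The sketch below is one possible way to do that; the function name `group_by_filename` is hypothetical, and the shortened paths in the usage example are placeholders rather than real captures.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_filename(urls):
    """Map each file name (last path segment) to the sorted hosts that served it."""
    groups = defaultdict(set)
    for url in urls:
        parts = urlsplit(url)
        name = parts.path.rsplit("/", 1)[-1]
        if name:  # skip URLs that end in a directory
            groups[name].add(parts.hostname)
    return {name: sorted(hosts) for name, hosts in groups.items()}


# Placeholder URLs for illustration only.
sample = [
    "http://a2032.g.akamai.net/5/2032/51/x/appleworks62_box.eps.hqx",
    "http://a1008.g.akamai.net/5/1008/51/y/appleworks62_box.eps.hqx",
]
copies = group_by_filename(sample)
```

Here `copies` maps `appleworks62_box.eps.hqx` to both Akamai hosts, so each host can then be checked for a usable capture.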
Managed to work with the Waybackmachine-API to get a backup of a much loved site
1 project | /r/DataHoarder | 6 Feb 2023

For the following step, you will need the Wayback Machine's CDX API. The documentation is at https://github.com/internetarchive/wayback/tree/master/wayback-cdx-server, but please note that there are errors in the documentation regarding the regex filtering syntax.
Take More Screenshots
9 projects | news.ycombinator.com | 28 Jan 2023

archive.org geocities scrapes go back to 1996, so it is plausible it could have survived:
https://web.archive.org/cdx/search/cdx?url=geocities.com&mat...
If you ever remember any of the details, the CDX API can probably help.
https://github.com/internetarchive/wayback/blob/master/wayba...
Is there any way to go further into results for pages with more than 10,000 captures?
1 project | /r/WaybackMachine | 25 Jan 2023
Ask HN: How do RSS readers handle items missing pubDates?
2 projects | news.ycombinator.com | 8 Jun 2022

Query the Internet Archive’s CDX server for this info.
https://github.com/internetarchive/wayback/blob/master/wayba...
Web scraping from https://web.archive.org/ (wayback machine)
1 project | /r/webscraping | 19 Jan 2022

Archive.org has a cdx server you can quickly request information from: https://github.com/internetarchive/wayback/tree/master/wayback-cdx-server
Possible to download a file from archive.org?
1 project | /r/DataHoarder | 3 Oct 2021

Are the contents of these WARCs available as part of the Wayback Machine itself? If so, you might be able to use the CDX server to discover and download the content.
Wayback Machine Downloader – Download an Entire Website from the Wayback Machine
8 projects | news.ycombinator.com | 11 Jul 2021
easy way to get images off wayback machine?
1 project | /r/OSINT | 23 Jun 2021

Wayback API to get a list of all versions of the page (https://github.com/internetarchive/wayback/tree/master/wayback-cdx-server#basic-usage).
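
As a rough illustration of what can be done with such a list of versions, the sketch below turns CDX rows into Wayback replay links of the form `https://web.archive.org/web/<timestamp>/<original>`. It assumes the rows came from a query with `output=json`, so the first row is a header containing `timestamp` and `original` columns; the function name `replay_urls` is illustrative.

```python
def replay_urls(rows):
    """Build one Wayback replay URL per capture from CDX JSON rows.

    `rows` is expected to be a header row followed by data rows.
    """
    if not rows:
        return []
    header, *records = rows
    ts = header.index("timestamp")
    orig = header.index("original")
    return [f"https://web.archive.org/web/{r[ts]}/{r[orig]}" for r in records]
```

Each returned link points at one archived version of the page, which can then be fetched to pull out images or other assets.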

What are some alternatives?

When comparing wayback-machine-downloader and wayback you can also consider the following projects:

savepagenow - A simple Python wrapper and command-line interface for archive.org’s "Save Page Now" capturing service

wayback-machine-spn-scripts - Bash scripts which interact with Internet Archive Wayback Machine's Save Page Now

warrick - Recover lost websites from the Web Infrastructure

ArchiveBox - 🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

neocities - Neocities.org - the web site. The entire thing. Yep, we're completely open source.

Hexo - A fast, simple & powerful blog framework, powered by Node.js.

waybackpack - Download the entire Wayback Machine archive for a given URL.

gba-remote-play - 📡 Stream Raspberry Pi games to a GBA via Link Cable.

electron-vlog - Take video recordings, screenshots and time-lapses of your Electron app with ease