wayback-machine-scraper
ArchiveBox
Metric | wayback-machine-scraper | ArchiveBox
---|---|---
Mentions | 6 | 2
Stars | 405 | 8,085
Growth | - | -
Activity | 0.0 | 9.7
Last updated | 2 months ago | over 3 years ago
Language | Python | Python
License | ISC License | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
wayback-machine-scraper
- wayback-machine-scraper: NEW Data - star count:380.0
- wayback-machine-scraper: NEW Data - star count:295.0
- Anyone have a simple useful guide so I can get this scraper working?
- Retrieving images from archived pages?
  You can very easily scrape pages from the web archive with this small package: wayback-machine-scraper. Getting historic snapshots of a webpage becomes a matter of a one-liner.
- How can I get my old blog back (Wordpress)?
  There are tools that allow you to scrape websites archived by the Wayback Machine, such as this one: https://github.com/sangaline/wayback-machine-scraper
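
The one-liner mentioned in the post above is not reproduced on this page. As a rough illustration of what "getting historic snapshots of a webpage" involves, here is a minimal sketch that queries the Wayback Machine's public CDX index directly with the Python standard library. It is not wayback-machine-scraper's own interface; the function names (`build_cdx_query`, `snapshot_urls`, `list_snapshots`) are hypothetical and chosen only for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public CDX index endpoint of the Wayback Machine.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(url, from_date=None, to_date=None, limit=None):
    """Build a CDX query URL for all captures of `url` (hypothetical helper).

    Dates are timestamp prefixes such as "2019" or "20190115".
    """
    params = {"url": url, "output": "json"}
    if from_date:
        params["from"] = from_date
    if to_date:
        params["to"] = to_date
    if limit is not None:
        params["limit"] = str(limit)
    return CDX_ENDPOINT + "?" + urlencode(params)


def snapshot_urls(rows):
    """Convert CDX JSON rows into replay URLs.

    With output=json the first row holds the field names and every
    following row describes one capture.
    """
    if not rows:
        return []
    header, *records = rows
    ts = header.index("timestamp")
    original = header.index("original")
    return [
        f"https://web.archive.org/web/{record[ts]}/{record[original]}"
        for record in records
    ]


def list_snapshots(url, **kwargs):
    """Fetch the capture list over the network and return replay URLs."""
    with urlopen(build_cdx_query(url, **kwargs), timeout=30) as response:
        return snapshot_urls(json.load(response))
```

Calling `list_snapshots("example.com", from_date="2019", limit=5)` would return up to five replay URLs, each of which can then be downloaded like any other page. A dedicated scraper adds crawling, concurrency, and storage on top of this lookup step.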
ArchiveBox
- An Emacs wallabag client - the Emacser way to manage web pages!
- Make Your Own Internet Archive with Archive Box
  It doesn't show in the screenshot in the article, but in Aug 2020 ArchiveBox implemented the "readability article text extractor"; see the description in the release notes: https://github.com/pirate/ArchiveBox/releases/tag/v0.4.14 and the module that does the work: https://github.com/pirate/readability-extractor
  By extracting only text and article images, you could go deep into an archive. If you skip images, even more so.
What are some alternatives?
waybackpy - Wayback Machine API interface & a command-line tool
Wallabag - wallabag is a self hostable application for saving web pages: Save and classify articles. Read them later. Freely.
cancel-culture - Tools for fighting abuse on Twitter
youtube-dl-webui - Another webui for youtube-dl powered by Flask.
autoscraper - A Smart, Automatic, Fast and Lightweight Web Scraper for Python
archivy - Archivy is a self-hostable knowledge repository that allows you to learn and retain information in your own personal and extensible wiki.
ArchiveBox - 🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
pinboard-notes-backup - Back up the notes you’ve saved to Pinboard
WordPress - WordPress, Git-ified. This repository is just a mirror of the WordPress subversion repository. Please do not send pull requests. Submit pull requests to https://github.com/WordPress/wordpress-develop and patches to https://core.trac.wordpress.org/ instead.
promnesia - Another piece of your extended mind
grasp - A reliable org-capture browser extension for Chrome/Firefox
wallabag.el - Emacs wallabag client - A Read It Later/Web Archiving Solution in Emacs.