warcprox
brozzler
| | warcprox | brozzler |
|---|---|---|
| Mentions | 7 | 2 |
| Stars | 363 | 630 |
| Growth | 1.1% | 0.3% |
| Activity | 6.4 | 8.3 |
| Last commit | 7 months ago | 16 days ago |
| Language | Python | Python |
| License | - | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
warcprox
- Offpunk 2.0
I've looked into archiving all the pages I visit as well, and warcprox[1] has been bookmarked for a while now.
Hard drive storage space being so cheap, in the ~$15/TB range, makes this more feasible even for video archival.
[1] https://github.com/internetarchive/warcprox
- What is warcprox?
- r18 database of metadata
wget also supports WARC options if you don't need JavaScript etc. If you do, there's also warcprox (https://github.com/internetarchive/warcprox), brozzler (https://github.com/internetarchive/brozzler) (which uses warcprox internally), and others.
- [HELP] I'm looking for some self-hosted solution where users can connect to the website, connect to a page, browse it and save it.
- tofuproxy – web proxy, TLS terminator, X.509 TOFU manager, WARC/gemini browser
Wow, it's really rare these days to see a tool that supports WARC.
Despite being an ISO standard [1] and the default archive format of the Internet Archive, and despite a handful of lovingly crafted tools (such as webrecorder [2], warcprox [3], etc.), it never seems to have caught on in a broader context.
Really a shame - I'm deeply convinced that the ability to archive and replay requests is a technique for defending and strengthening user rights.
Links:
[1] https://www.iso.org/standard/44717.html
[2] https://github.com/webrecorder/webrecorder-desktop
[3] https://github.com/internetarchive/warcprox
- Browser Extension for Saving Images As While Browsing
- How to archive the tweets and replies of my own terminated twitter account(s)
However, if you prefer something open-source, you could accomplish the same thing with a tool like https://archiveweb.page/ or https://github.com/internetarchive/warcprox
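Several of the comments above refer to the WARC format without showing what a record looks like. As a rough illustration only, the sketch below assembles a single WARC/1.0 `resource` record by hand using the Python standard library. The function name `build_warc_resource_record` is hypothetical and is not part of warcprox or any other tool mentioned here; real archiving tools write richer records (requests, responses, metadata) and usually gzip each one.

```python
import uuid
from datetime import datetime, timezone


def build_warc_resource_record(target_uri: str, payload: bytes, content_type: str) -> bytes:
    """Return one uncompressed WARC/1.0 'resource' record as bytes.

    Hypothetical helper for illustration; not an API of warcprox or brozzler.
    """
    headers = [
        ("WARC-Type", "resource"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", content_type),
        ("Content-Length", str(len(payload))),
    ]
    # Version line, named header fields, then a blank line before the block.
    head = "WARC/1.0\r\n" + "".join(f"{name}: {value}\r\n" for name, value in headers) + "\r\n"
    # Every record is terminated by two CRLF pairs.
    return head.encode("utf-8") + payload + b"\r\n\r\n"


record = build_warc_resource_record("https://example.com/", b"hello", "text/plain")
print(record.decode("utf-8"))
```

The record ID and date differ on every run, so the printed header values will vary.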
brozzler
- r18 database of metadata
wget also supports WARC options if you don't need JavaScript etc. If you do, there's also warcprox (https://github.com/internetarchive/warcprox), brozzler (https://github.com/internetarchive/brozzler) (which uses warcprox internally), and others.
- Is there a way to archive groups of webpages similarly to how web archive does it?
Actually, the IA uses Brozzler (https://github.com/internetarchive/brozzler) now if I remember correctly.
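For the simpler case mentioned above, where JavaScript is not needed, the wget route can be sketched as follows. This is a minimal example, assuming a wget build with WARC support; the helper name `wget_warc_command` and the example URL and basename are illustrative only. `--warc-file` and `--page-requisites` are existing wget options.

```python
import shlex


def wget_warc_command(url: str, warc_basename: str) -> list[str]:
    """Build a wget invocation that records the fetch into a WARC file.

    Hypothetical helper for illustration; it only assembles the command.
    """
    return [
        "wget",
        "--page-requisites",             # also fetch images, CSS and scripts the page references
        f"--warc-file={warc_basename}",  # wget writes <basename>.warc.gz by default
        url,
    ]


cmd = wget_warc_command("https://example.com/", "example")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Pages that depend on JavaScript will not be captured faithfully this way, which is where the proxy-based and browser-based tools discussed in the comments come in.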
What are some alternatives?
replayweb.page - Serverless replay of web archives directly in the browser
heritrix3 - Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
TWINT - An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.
archiveweb.page - A High-Fidelity Web Archiving Extension for Chrome and Chromium based browsers!
webrecorder-desktop - Webrecorder Desktop App!
auto-save-html - Firefox extension that automatically dumps HTML when browsing a specified site
conifer - Collect and revisit web pages.
mpiv - A fully reworked fork of Mouseover Popup Image Viewer
webcrystal - An archiving HTTP proxy and on-disk archival format for websites.