Top 3 web-archive Open-Source Projects
- DownloadNet
💾 DownloadNet - All content you browse online available offline. Search through the full-text of all pages in your browser history. ⭐️ Star to support our work!
- browsertrix
Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!
Project mention: ArchiveBox: Open-source self-hosted web archiving | news.ycombinator.com | 2024-01-11
For anyone who uses Chrome and wants to view their archived pages in the browser as if they were still online (URL and everything intact), and also full-text search through their browsing history that was archived (like AB plans to add in future, I think, right nikki?) you can check out DownloadNet: https://github.com/dosyago/DownloadNet
You can have multiple archives, and even use a mode where you only archive pages you bookmark rather than everything.
Project mention: Ask HN: How can I back up an old vBulletin forum without admin access? | news.ycombinator.com | 2024-01-29
You can try https://replayweb.page/ as a test for viewing a WARC file. I do think you'll run into problems though with wanting to browse interconnected links in a forum format, but try this as a first step.
One potential option but definitely a bit more work would be, once you have all the warc files downloaded, you can open them all in python using the warctools module and maybe beautifulsoup and potentially parse/extract all of the data embedded in the WARC archives into your own "fresh" HTML webserver.
https://github.com/internetarchive/warctools
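The extraction idea in the comment above could look roughly like the sketch below. It does not use the warctools or BeautifulSoup APIs: to stay dependency-free it substitutes a small hand-written WARC record reader and Python's built-in `html.parser`. All function and class names here (`open_warc`, `iter_warc_records`, `TextExtractor`, `extract_pages`) are hypothetical, and the sketch assumes UTF-8 pages and HTTP responses that are not chunked or compressed.

```python
import gzip
from html.parser import HTMLParser


def open_warc(path):
    """Open a .warc or .warc.gz file for binary reading (hypothetical helper)."""
    return gzip.open(path, "rb") if path.endswith(".gz") else open(path, "rb")


def iter_warc_records(stream):
    """Yield (headers, payload) for each record in a WARC byte stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.strip():
            continue  # blank lines that separate records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"unexpected line: {line!r}")
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # blank line ends the WARC header block
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        length = int(headers.get("content-length", 0))
        yield headers, stream.read(length)


class TextExtractor(HTMLParser):
    """Collect the non-empty text nodes of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def extract_pages(stream):
    """Map each archived URL to the plain text of its HTTP response body."""
    pages = {}
    for headers, payload in iter_warc_records(stream):
        if headers.get("warc-type") != "response":
            continue  # skip warcinfo, request, metadata records
        # The payload is a raw HTTP response: drop its header block.
        _, _, body = payload.partition(b"\r\n\r\n")
        parser = TextExtractor()
        parser.feed(body.decode("utf-8", "replace"))
        parser.close()
        pages[headers.get("warc-target-uri", "")] = " ".join(parser.chunks)
    return pages
```

Calling `extract_pages(open_warc(path))` on a downloaded archive would give a dictionary of URL to page text that can then be written back out as fresh HTML. Text inside `<script>` and `<style>` tags is not filtered out here, so a real forum export would likely need extra cleanup, which is where a fuller parser such as BeautifulSoup would help.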
web-archive related posts
- You're Gonna Need a Bigger Browser
- Google Chrome pushes browser history-based ad targeting
- Webrecorder: Capture interactive websites and replay them at a later time
- Show HN: DiskerNet – Browse the Internet from Your Disk, Now Open Source
- phpBB3 forum owner dead. Webhost purging soon. Need to quickly archive a site
- Is there such a thing as a "Master Search Engine" for desktops and websites that can search for any keyword on the site and on the PC?
Index
What are some of the best open-source web-archive projects? This list will help you:
# | Project | Stars
---|---|---
1 | DownloadNet | 3,648
2 | replayweb.page | 620
3 | browsertrix | 123