22120
pywb
Metric | 22120 | pywb
---|---|---
Mentions | 13 | 7
Stars | 2,638 | 1,300
Growth | - | 1.8%
Activity | 9.7 | 4.8
Last commit | over 2 years ago | 10 days ago
Language | JavaScript | JavaScript
License | GNU General Public License v3.0 or later | GNU General Public License v3.0 only
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
22120
-
Is there a browser addon which locally archives every website I visit?
Here. An archivist browser controller that caches everything you browse, a library server with full text search to serve your archive.
- Show HN: Irchiver, your full-resolution personal web archive
-
Ask HN: Full text search engine in JavaScript for English and Chinese?
Following your "hilarious" and disrespectful answer here https://github.com/i5ik/22120/issues/63#issuecomment-7275272..., I would prefer that you remove any reference to SingleFile in the description of your project. I could not open an issue because you blocked me. And please don't accuse people without proof.
- 22120: self-host the Internet with an Offline Archive. Similar to ArchiveBox, SingleFile and WebMemex. Works well with WorldBrain/Memex to give you full-text search. Why not WARC? Uses Chrome DevTools protocol to intercept all requests, and caches responses against a key of (method, URL)
-
Request: Proxy caching all visited websites text in DB, making history searchable
https://github.com/i5ik/22120 is a tool that archives as you browse that you can then view offline later
- Is there a way I can cache videos (reddit, 4chan) I watch in browser (Linux)?
-
So you want to write a GUI framework
My solution to this (it's been done before) is to use the browser engine that is already installed (not the system webview). So far I only use Chrome, but since I connect to it over the Chrome DevTools protocol, which is somewhat compatible with the Remote Debugging Protocol[0] that Firefox is implementing, this is a reasonable approach.
So far my "tool" to do this is simply a template repository with some conveniences, providing in essence a skeleton for these types of apps. I hope to flesh this out a little more, and expose a much richer API, as well as convert some of my existing popular apps (like 22120[1]) to the "framework".
The benefit of this is that Graderjs has a built-in 'app builder' that can create a cross-platform binary (ignoring the necessity on macOS, and near-necessity on Windows, to sign your executable somehow) that lets you display your UI in JS/HTML/CSS using the already installed browser engine, as well as run code in NodeJS and use the rich APIs[2] of the browser engine itself. I'm really happy with this project and think that, even though it's small now, it will in time become my most popular and powerful one: even bigger than my remote browser and popular web archiver.
Just give it time! :)
[0]: https://firefox-source-docs.mozilla.org/remote/index.html
[1]: https://github.com/i5ik/22120
[2]: https://chromedevtools.github.io/devtools-protocol/tot/Brows...
The GraderJS: https://github.com/i5ik/graderjs
-
Ask HN: Why saving webpages on hard disk has not got better?
I use this to backup pages automatically
https://github.com/i5ik/22120
-
Saving all browsed websites automatically
Does this potentially help? https://github.com/c9fe/22120
-
Make Your Own Internet Archive with Archive Box
From the blog comments, I think this is what you’re after https://github.com/c9fe/22120
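The 22120 description quoted in this section says it caches responses against a key of (method, URL). A minimal Python sketch of that idea follows. The `ResponseCache` class and its method names are hypothetical illustrations, not 22120's actual code, which is listed as JavaScript and intercepts requests over the Chrome DevTools protocol.

```python
from typing import Dict, Optional, Tuple

# Hypothetical illustration: a response cache keyed by (method, URL).
Key = Tuple[str, str]


class ResponseCache:
    def __init__(self) -> None:
        self._store: Dict[Key, bytes] = {}

    @staticmethod
    def make_key(method: str, url: str) -> Key:
        # Normalize the method's case so "get" and "GET" share one entry.
        return (method.upper(), url)

    def save(self, method: str, url: str, body: bytes) -> None:
        # A later response for the same key overwrites the earlier one.
        self._store[self.make_key(method, url)] = body

    def lookup(self, method: str, url: str) -> Optional[bytes]:
        # Returns None when nothing was archived for this key.
        return self._store.get(self.make_key(method, url))


cache = ResponseCache()
cache.save("GET", "https://example.com/", b"<html>hello</html>")
hit = cache.lookup("get", "https://example.com/")
miss = cache.lookup("POST", "https://example.com/")
```

Keying on both method and URL means a POST to an address is never answered with the archived GET response for the same address.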
pywb
-
Is there any good software for deduping (deduplicating) content in WARC files?
I have thousands of bookmarks on raindrop.io that I've been wanting to archive for a while. However, I've archived ~150 pages so far with Pywb and it ended up being 500MB across two WARCs, even with the dedupe setting specified in my settings file. It dedupes while archiving pages. I want software to get any spots missed and be sure that WARCs are actually deduped.
-
Is there a way to easily and reliably SSH to my laptop no matter what wifi the laptop is connected to? I have no clue.
I don't know if the solution would be related or relevant to this, but I would also want to be able to remotely launch and access a web server, Pywb, from Safari on my iPad, also no matter what wifi I'm on. On a Mac, it would be launched with the command wayback and the server would be accessed in the browser at localhost:8080.
-
I can't install a Python package, pywb, looks like a problem with brotlipy. What can I do?
Check their GitHub site. I would try `git clone https://github.com/webrecorder/pywb`
-
Purevolume archives?
I've been trying to open those large warc files these days. I've tried webrecorder, replayweb, pywb and warcat before but none of these worked well for me.
-
Ran grab-site now have some warc.gz files etc, the site in question was originally hosted in a mixture of html and javascript, what's the best and easiest way to make this accessible as a user for offline personal use?
pywb, but it requires creating a full copy of the data: https://github.com/webrecorder/pywb/issues/408
-
How good is ArchiveWeb.page?
I found it to be good at loading small WARCs quickly, but it can take longer if the WARC is larger. Webarchive player, while it's old and discontinued, I've found to work better than Webrecorder Player and replayweb.page. If you want newer software to replay WARCs, try Pywb. I find it to be the best WARC player.
-
Saving all browsed websites automatically
I use pywb in proxy recording mode.
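The WARC deduplication question earlier in this section comes down to spotting records whose payloads are byte-identical. Below is a rough sketch of digest-based duplicate detection in plain Python, assuming the payloads have already been extracted into (URL, bytes) pairs. The function names are hypothetical, and this is not pywb's own dedupe logic.

```python
import base64
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def payload_digest(payload: bytes) -> str:
    # SHA-1 of the payload, Base32-encoded, written as "sha1:<digest>".
    raw = hashlib.sha1(payload).digest()
    return "sha1:" + base64.b32encode(raw).decode("ascii")


def find_duplicates(records: Iterable[Tuple[str, bytes]]) -> Dict[str, List[str]]:
    # Group URLs by payload digest and keep only digests seen more than once.
    by_digest: Dict[str, List[str]] = defaultdict(list)
    for url, payload in records:
        by_digest[payload_digest(payload)].append(url)
    return {d: urls for d, urls in by_digest.items() if len(urls) > 1}


# Made-up sample records, for illustration only.
records = [
    ("https://example.com/a.css", b"body { margin: 0 }"),
    ("https://example.com/b.css", b"body { margin: 0 }"),
    ("https://example.com/index.html", b"<html></html>"),
]
duplicates = find_duplicates(records)
```

Any digest that maps to more than one URL marks payloads stored more than once, which is the kind of spot a post-hoc dedupe pass would look for.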
What are some alternatives?
ArchiveBox - 🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
conifer - Collect and revisit web pages.
asciidoctor-latex - :triangular_ruler: Add LaTeX features to AsciiDoc & convert AsciiDoc to LaTeX
warcio - Streaming WARC/ARC library for fast web archive IO
SingleFile - Web Extension for saving a faithful copy of a complete web page in a single HTML file
awesome-selfhosted - A list of Free Software network services and web applications which can be hosted on your own servers
notes - A zero dependency shell script that makes it really simple to manage your text notes.
replayweb.page - Serverless replay of web archives directly in the browser
DownloadNet - 💾 DownloadNet - All content you browse online available offline. Search through the full-text of all pages in your browser history. ⭐️ Star to support our work!
SingleFileZ - Web Extension to save a faithful copy of an entire web page in a self-extracting ZIP file
linux-surface - Linux Kernel for Surface Devices
webarchiveplayer - NOTE: This project is no longer being actively developed. Check out Webrecorder Player for the latest player (https://github.com/webrecorder/webrecorderplayer-electron). Legacy: Desktop application for browsing web archives (WARC and ARC)