SingleFile: Save a Complete Web Page into a Single HTML File

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • SingleFile

    Web Extension for saving a faithful copy of a complete web page in a single HTML file

    I confirm that you could use a headless browser for this. This is actually what SingleFile CLI does [1]. Here is an example of JS code showing how to configure and inject SingleFile with puppeteer [2].

    [1] https://github.com/gildas-lormeau/SingleFile/tree/master/cli

    [2] https://github.com/gildas-lormeau/SingleFile/blob/master/cli...
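To illustrate the general idea of driving a headless browser from a script, here is a minimal Python sketch. It is not SingleFile's method and does not use puppeteer: it only asks headless Chromium to print the DOM after scripts have run, and it does not inline images or stylesheets. The binary name `chromium` and the helper names are assumptions.

```python
import subprocess


def build_dump_dom_command(url: str, browser: str = "chromium") -> list[str]:
    """Build a command that asks headless Chromium to print the rendered DOM.

    `browser` is an assumed binary name; adjust it to your installation.
    """
    return [browser, "--headless", "--dump-dom", url]


def dump_dom(url: str, browser: str = "chromium", timeout: float = 60.0) -> str:
    """Run the browser and return the serialized DOM as text."""
    result = subprocess.run(
        build_dump_dom_command(url, browser),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise if the browser exits with an error
    )
    return result.stdout
```

Calling `dump_dom("https://example.com")` would return the page's HTML as the browser sees it; embedding the page's resources into that HTML is the part a tool like SingleFile adds on top.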

  • monolith

    ⬛️ CLI tool for saving complete web pages as a single HTML file

    Nice project! Your project, and a similar project called Monolith[0], were a bit of an inspiration for making my own single HTML file tool called Humble[1] to solve a few edge cases I was having with bundling pages (and since I wanted a TypeScript API for making page bundles).

    [0] https://github.com/Y2Z/monolith

    [1] https://github.com/assemblylanguage/humble

  • obelisk

    Go package and CLI tool for saving a web page as a single HTML file (by go-shiori)

  • SingleFile-MV3

    SingleFile version compatible with Manifest V3. The future, right now!

    It should be noted that Manifest V3 will break this extension for Chromium-based browsers.

    https://github.com/gildas-lormeau/SingleFile-Lite

  • firefox-scrapbook

    ScrapBook X – a legacy Firefox add-on that captures web pages to local device for future retrieval, organization, annotation, and edit.

    Related: I used to keep a collection of locally mirrored web pages a long time ago, with a legendary Firefox extension called ScrapBook [0] (now long retired). The surprise for me is that after all these years I still remembered the name...

    While writing this comment I found that it lived on as a (now "legacy") new extension named ScrapBook X [1], and then yet another one named WebScrapBook [2], which seems to still be alive!

    [0]: www.xuldev.org/scrapbook/

    [1]: https://github.com/danny0838/firefox-scrapbook

    [2]: https://addons.mozilla.org/en-US/firefox/addon/webscrapbook/

  • SingleFileZ

    Web Extension to save a faithful copy of an entire web page in a self-extracting ZIP file

    Images are stored as data URIs [1]. Note that they could also be stored as entries in a ZIP file [2].

    [1] https://en.wikipedia.org/wiki/Data_URI_scheme

    [2] https://github.com/gildas-lormeau/SingleFileZ
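As a rough illustration of what storing an image as a data URI means, the Python sketch below base64-encodes a resource and substitutes it for the original URL in the markup. The function names are hypothetical, and the plain string replacement is a simplification; it is not how SingleFile itself rewrites pages.

```python
import base64


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def inline_resources(html: str, resources: dict[str, tuple[bytes, str]]) -> str:
    """Replace each quoted resource URL in `html` with its data URI.

    `resources` maps a URL to (bytes, MIME type). Plain string replacement
    is used here for brevity; a real tool would parse the HTML.
    """
    for url, (data, mime_type) in resources.items():
        html = html.replace(f'"{url}"', f'"{to_data_uri(data, mime_type)}"')
    return html
```

Because base64 encodes every 3 bytes as 4 characters, inlined resources grow by about a third, which is one reason a ZIP container can be attractive for image-heavy pages.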

  • zotero

    Zotero is a free, easy-to-use tool to help you collect, organize, annotate, cite, and share your research sources.

    That's a nice and simple tool, good work. I'm personally using Zotero to save copies of web pages: https://www.zotero.org/. With the browser extension you can save a snapshot in a few seconds.

  • trailcap

    Strip non-presentational content out of HTML pages

    Thanks for this project. I found SingleFile a year or two ago, and used it to take "HTML Screenshots" of third-party sites that I could embed in guided walkthroughs with modified example data, instead of just PNGs.

    SingleFile was ultra-valuable for this.

    If anyone has a similar use-case, I wrote some pretty rough (and slow) code to post-process SingleFile's output to remove any HTML that wasn't contributing to the presentational render by launching puppeteer and comparing pixels. It's available here: https://github.com/mieko/trailcap
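The pruning idea described above (remove a piece of HTML, re-render, keep the removal only if the pixels are unchanged) can be sketched as a greedy loop. This is a toy Python version under stated assumptions, not trailcap's actual code: `render` stands in for a real screenshot step, and `fake_render` is a hypothetical renderer used only to make the example runnable.

```python
from typing import Callable, Sequence

# A renderer takes the current elements and returns something comparable,
# for example a screenshot's pixel data.
Render = Callable[[Sequence[str]], object]


def prune_invisible(elements: Sequence[str], render: Render) -> list[str]:
    """Greedily drop elements whose removal leaves the render unchanged."""
    baseline = render(elements)
    kept = list(elements)
    index = 0
    while index < len(kept):
        candidate = kept[:index] + kept[index + 1:]
        if render(candidate) == baseline:
            kept = candidate  # removal was invisible; keep it removed
        else:
            index += 1  # element affects the output; move on
    return kept


def fake_render(elements: Sequence[str]) -> tuple[str, ...]:
    """Toy renderer: pretends <script> and <meta> elements draw nothing."""
    return tuple(e for e in elements if not e.startswith(("<script", "<meta")))
```

Each candidate removal costs one render, so the loop needs one render per element plus the baseline; with a real browser screenshot per step, that is consistent with the comment's remark that the approach is slow.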

  • webextensions

    Charter and administrivia for the WebExtensions Community Group (WECG)

    AFAIK, for the moment Mozilla is aware of the regressions that Manifest V3 causes and shows goodwill in trying to reduce them as much as possible. You can find some information about this here: https://github.com/w3c/webextensions/tree/main/_minutes

  • WebKit

    Home of the WebKit project, the browser engine used by Safari, Mail, App Store and many other applications on macOS, iOS and Linux.

    I know that WebKit relies on either libsoup [1] (on Linux/Unices) or curl [2] (legacy Windows and maybe WPE(?)) as a network adapter, so the header handling and parsing mechanisms would have to be implemented in there.

    Don't know about chromium (my knowledge is ~2012ish about their architecture, and pre-Blink).

    [1] https://github.com/WebKit/WebKit/tree/main/Source/WebKit/Net...

    [2] https://github.com/WebKit/WebKit/tree/main/Source/WebKit/Net...

  • ekeko

    Ekeko is a tool that helps you save all of your favorited memes, videos and other online resources.

    I'm building a tool for people to have a personal archive of their digital life so that 30 years from now they can revisit content they enjoyed in their younger years.

    https://github.com/sergiotapia/ekeko

    This is awesome! I would love to integrate this somehow into my project to "singlefile" bookmarks as people make them.

    @gildas do you have any recommendation on how to approach this with your extension? Could I run a headless chrome and trigger this extension?

  • ScreenToGif

    🎬 ScreenToGif allows you to record a selected area of your screen, edit and save it as a gif or video.

    I used:

    - ScreenToGif to record video sequences and produce the final GIF: https://www.screentogif.com/

  • awesome-web-archiving

    An Awesome List for getting started with web archiving

  • wombat

    Wombat.js client-side rewriting library (by webrecorder)

    The way that sites like Wayback Machine handle this is by using the web-replay library Wombat https://github.com/webrecorder/wombat that also uses JS to insert those elements.

    But what the hell! I was working on a similar html-downloading/reproducing tool and this bug really bothers me. I'd either like the HTML reading standard to be updated to accept

  • ArchiveBox

    🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

  • wabac.js-1.0

    WARC is also used by the Webrecorder project. They made an app called Wabac which does entirely client-side WARC or HAR replays using service workers and it seems to have pretty good browser support, but I haven't really dug into the specifics.

    https://github.com/webrecorder/wabac.js-1.0
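To give a feel for what replaying a HAR capture involves, the Python sketch below builds a lookup table from request URL to recorded response, which is roughly the data a replay layer would consult when answering a request. It is not Wabac's implementation and involves no service worker; the function name is hypothetical, and only standard HAR fields (`log.entries`, `request.url`, `response.status`, `response.content`) are read.

```python
import base64
import json


def build_replay_index(har_text: str) -> dict[str, tuple[int, str, bytes]]:
    """Map each request URL in a HAR document to (status, MIME type, body)."""
    har = json.loads(har_text)
    index = {}
    for entry in har["log"]["entries"]:
        response = entry["response"]
        content = response.get("content", {})
        text = content.get("text", "")
        # HAR marks binary bodies with encoding == "base64".
        if content.get("encoding") == "base64":
            body = base64.b64decode(text)
        else:
            body = text.encode("utf-8")
        index[entry["request"]["url"]] = (
            response["status"],
            content.get("mimeType", "application/octet-stream"),
            body,
        )
    return index
```

If a URL appears more than once in the capture, the last entry wins in this sketch; a real replay tool would also need to consider request method, headers, and timestamps.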

  • cairn

    NPM package and CLI tool for saving a web page as a single HTML file

  • wayback

    A bot for Telegram, Mastodon, Slack, and other messaging platforms that archives webpages.

    Similar approaches were proposed at https://github.com/wabarc/wayback

  • screenshot

    Capture webpage and save as image using chromedp (by wabarc)

    There is a project that uses a headless browser to implement HAR.

    https://github.com/wabarc/screenshot

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
