Ask HN: Full-text browser history search forever?

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • ArchiveBox

    🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

  • You can use ArchiveBox. Set it to grab URLs from your browser history database and it will archive them all to disk in whatever formats you want. You can then use whatever tools you want on those local files.

    https://archivebox.io/
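
    A rough sketch of that workflow on Linux (the Chrome profile path, and piping URLs into archivebox add on stdin, are assumptions -- adjust for your setup):

    # one-time setup of the archive directory
    mkdir -p ~/web-archive && cd ~/web-archive && archivebox init

    # Chrome keeps the history database locked while running, so work on a copy
    cp ~/.config/google-chrome/Default/History /tmp/History

    # pull every URL out of the history and hand the list to ArchiveBox
    sqlite3 /tmp/History "SELECT url FROM urls;" | archivebox add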

  • SingleFile

    Web Extension for saving a faithful copy of a complete web page in a single HTML file

  • Not exactly what you're asking for, but you can set up SingleFile[1] to automatically save each page you visit. Then there's also ArchiveBox[2], which can convert your browser history into various formats.

    [1] https://github.com/gildas-lormeau/SingleFile

    [2] https://archivebox.io/

  • falcon

    Firefox extension for full text history search! (by CennoxX)

  • In Chrome, you can install it from source:

    https://github.com/CennoxX/falcon#transparent-installation

    As for Firefox, you can right-click the "Add to Firefox" button, save the extension file, and inspect it.
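
    For the inspection step: a saved .xpi is an ordinary ZIP archive, so something like this works (the filename is assumed):

    # unpack the saved extension and read its source
    unzip falcon.xpi -d falcon-src
    less falcon-src/manifest.json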

  • DownloadNet

    💾 DownloadNet - All content you browse online available offline. Search through the full-text of all pages in your browser history. ⭐️ Star to support our work!

  • Hey. My project Diskernet does this: full-text search over browser history.

    Put it in "save" mode when using Chrome (Linux is fine) and it automatically saves every page you browse (so you can read it offline) and also indexes it for full-text search. It's a work in progress and there are bugs, so my advice is to initialize a Git repo in your archive directory and sync it regularly to a remote in case of failure -- that also gives you a nice snapshotted archive (a sketch of that setup is below).

    Anyway, best of luck to you! :)

    Diskernet: https://github.com/crisdosyago/Diskernet
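
    A minimal sketch of that backup setup, assuming the archive lives in ~/diskernet-archive and you have an empty remote repository to push to (the remote URL is a placeholder):

    cd ~/diskernet-archive        # wherever Diskernet writes its archive
    git init
    git remote add origin git@example.com:you/diskernet-backup.git
    git add -A && git commit -m "initial archive snapshot"
    git push -u origin HEAD

    # re-sync whenever you like, e.g. from a cron job
    git add -A && git commit -m "snapshot $(date -I)" && git push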

  • activitywatch

    The best free and open-source automated time tracker. Cross-platform, extensible, privacy-focused.

  • If it's just the metadata that you want, you can use ActivityWatch [1]. They have a browser plug-in.

    [1] https://activitywatch.net/

  • min

    A fast, minimal browser that protects your privacy

  • nyxt

    Nyxt - the hacker's browser.

  • Nyxt browser is doing this pretty well! https://nyxt.atlas.engineer

  • dotfiles

    My dotfiles, used on archlinux, osx and debian (by BarbUk)

  • Chromium and Firefox store all of your history in an SQLite database.

    I have a script to extract the last visited website from Chrome, for example: https://github.com/BarbUk/dotfiles/blob/master/bin/chrome_hi...

    For Firefox, you can use something like:

    sqlite3 ~/.mozilla/firefox/*.[dD]efault*/places.sqlite "SELECT strftime('%d.%m.%Y %H:%M:%S', visit_date/1000000, 'unixepoch', 'localtime'), url FROM moz_places, moz_historyvisits WHERE moz_places.id = moz_historyvisits.place_id ORDER BY visit_date;"
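
    The Chrome equivalent looks roughly like this (profile path assumed; Chrome stores timestamps as microseconds since 1601-01-01, hence the epoch offset, and it locks the database while running, hence the copy):

    cp ~/.config/google-chrome/Default/History /tmp/History
    sqlite3 /tmp/History "SELECT datetime(last_visit_time/1000000 - 11644473600, 'unixepoch', 'localtime'), url FROM urls ORDER BY last_visit_time;"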

  • ZAP

    The ZAP core project

  • That's the whole purpose behind ZAP, and I use it for archiving pages all the time (it uses HSQLDB as the file format); it works fantastically for that purpose, but does -- as you correctly pointed out -- require MITM-ing the browser to trust its locally generated CA: https://github.com/zaproxy/zaproxy#readme

  • readability

    A standalone version of the readability lib

    I've had a lot of success by running HTML pages through Mozilla's readability[0] library (actually the Go port of it[1]) before indexing them.

    [0]: https://github.com/mozilla/readability

    [1]: https://github.com/go-shiori/go-readability

  • go-readability

    Go package that cleans a HTML page for better readability.

  • monolith

    ⬛️ CLI tool for saving complete web pages as a single HTML file

  • You can pipe the URLs through something like monolith[1].

    https://github.com/Y2Z/monolith
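
    For example, assuming the URLs extracted from the history database have been dumped into a urls.txt file, one per line:

    # save each page as a single self-contained HTML file, named by a hash of its URL
    while read -r url; do
        monolith "$url" -o "$(echo -n "$url" | md5sum | cut -d' ' -f1).html"
    done < urls.txt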

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
