You can use ArchiveBox. Set it to grab URLs from your browser history database, and it will archive them all to disk in whatever formats you want. You can then use whatever tools you like on those local files.
https://archivebox.io/
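A minimal sketch of that flow, hedged: the `init` and `add` subcommands come from ArchiveBox's README, but the directory layout, the function name, and the idea of feeding it a pre-exported URL list are assumptions for illustration.

```shell
# Hypothetical helper: initialize an ArchiveBox data folder and feed it a
# list of URLs (one per line) exported from your browser history.
# `archivebox init` / `archivebox add` are from the ArchiveBox README;
# everything else here is an assumption.
archive_history() {
  archive_dir="$1"   # where the ArchiveBox data folder should live
  url_list="$2"      # text file with one URL per line
  mkdir -p "$archive_dir" && cd "$archive_dir" || return 1
  archivebox init
  archivebox add < "$url_list"
}
```

Usage would look like `archive_history ~/archive urls.txt`, after exporting `urls.txt` from your history database.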
Not exactly what you're asking for, but you can set up SingleFile [1] to automatically save each page you visit. There's also ArchiveBox [2], which can convert your browser history into various formats.
[1] https://github.com/gildas-lormeau/SingleFile
[2] https://archivebox.io/
In Chrome, you can install Falcon from source:
https://github.com/CennoxX/falcon#transparent-installation
As for Firefox, you can right-click the "Add to Firefox" button, save the extension file, and inspect it.
Hey. My project Diskernet does this: full-text search over browser history.
Put it in "save" mode when using Chrome (Linux is fine) and it automatically saves every page you browse (so you can read it offline) and also indexes it for full-text search. It's a work in progress and there are bugs, so my advice is to initialize a git repo in your archive directory and sync regularly to a remote in case of failure; that also gives you a nice snapshotted archive.
Anyway, best of luck to you! :)
Diskernet: https://github.com/crisdosyago/Diskernet
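The git-snapshot safety net suggested above can be sketched like this; the archive path is whatever directory your tool writes to, and the "origin" remote is an assumption you would set up yourself.

```shell
# Minimal sketch: version an archive directory with git and commit a snapshot.
# The directory argument and the remote name are assumptions for illustration.
snapshot_archive() {
  dir="$1"                       # e.g. your Diskernet archive directory
  cd "$dir" || return 1
  git init -q                    # no-op if the repo already exists
  git add -A
  git commit -q -m "archive snapshot $(date +%F)"
  # git push origin main         # uncomment once a remote is configured
}
```

Run on a schedule (cron, systemd timer), this gives you both an off-machine backup and a browsable history of snapshots.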
If it's just the metadata you want, you can use ActivityWatch [1]. They have a browser plug-in.
[1] https://activitywatch.net/
Nyxt browser is doing this pretty well! https://nyxt.atlas.engineer
Chromium and Firefox both store all your history in a SQLite database.
I have a script that extracts the last visited website from Chrome, for example: https://github.com/BarbUk/dotfiles/blob/master/bin/chrome_hi...
For Firefox, you can use something like:
sqlite3 ~/.mozilla/firefox/.[dD]efault/places.sqlite \
  "SELECT strftime('%d.%m.%Y %H:%M:%S', visit_date/1000000, 'unixepoch', 'localtime'), url
   FROM moz_places, moz_historyvisits
   WHERE moz_places.id = moz_historyvisits.place_id
   ORDER BY visit_date;"
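A hedged Chrome counterpart to the Firefox query above: the assumption is that Chrome's History database has a `urls` table whose `last_visit_time` counts microseconds since 1601-01-01 (the WebKit epoch), hence the 11644473600-second offset; Chrome also locks the live file while running, so query a copy.

```shell
# Assumed Chrome schema: urls(id, url, ..., last_visit_time), with
# last_visit_time in microseconds since the WebKit epoch (1601-01-01).
dump_chrome_history() {
  db="$1"   # e.g. a copy of ~/.config/chromium/Default/History
  sqlite3 "$db" "SELECT datetime(last_visit_time/1000000 - 11644473600, 'unixepoch', 'localtime'), url FROM urls ORDER BY last_visit_time;"
}
```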
That's the whole purpose behind ZAP, and I use it for archiving pages all the time (it uses HSQLDB as the file format). It works fantastically for that purpose, but it does, as you correctly pointed out, require MITM-ing the browser to trust its locally generated CA: https://github.com/zaproxy/zaproxy#readme
I've had a lot of success running HTML pages through Mozilla's Readability [0] tool (actually the Go port of it [1]) before indexing them.
[0]: https://github.com/mozilla/readability
[1]: https://github.com/go-shiori/go-readability
You can pipe the URLs through something like monolith[1].
[1] https://github.com/Y2Z/monolith
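That pipeline could look roughly like this; `-o` is monolith's output-file option, while the `urls.txt` input file and the numbered-filename scheme are assumptions for illustration.

```shell
# Sketch: save each URL in urls.txt as a single self-contained HTML file
# with monolith. Filenames (page-1.html, page-2.html, ...) are an
# illustrative choice, not anything monolith prescribes.
save_all() {
  n=0
  while IFS= read -r url; do
    n=$((n + 1))
    monolith "$url" -o "page-$n.html"
  done < urls.txt
}
```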