-
ArchiveBox
🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
-
sonic
🦔 Fast, lightweight & schema-less search backend. An alternative to Elasticsearch that runs on a few MBs of RAM.
-
DownloadNet
💾 DownloadNet - All content you browse online available offline. Search through the full-text of all pages in your browser history. ⭐️ Star to support our work!
-
keeper
Application to keep your personal info using custom formats described in YAML templates (by khromalabs)
Direct link: https://3xn.nl/projects/2022/02/17/archivebox-root-issue-in-...
Note that you no longer need to create a user manually, so this shouldn't be an issue anymore. Just set the ADMIN_USERNAME and ADMIN_PASSWORD env vars and it'll auto-create the user and collection on first run.
https://github.com/ArchiveBox/ArchiveBox/wiki/Configuration#...
I looked at ArchiveBox and several similar projects a while ago, but realised I didn't want anything so complex. I just wanted bookmarks, with free-text content search so I could find something again based on more than just a title.
So I wrote my own: https://github.com/tardisx/linkwallet
Emphasis on tiny system requirements and dependencies (single binary, no service dependencies). As a consequence the text indexing is very basic (basic HTML scrape). But it's working for me :-)
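To make "basic HTML scrape" plus free-text search concrete, here is a minimal sketch in Python using only the standard library. It is not linkwallet's actual code (linkwallet is a separate single-binary tool); the names `TextScraper`, `scrape_text`, and `search`, and the shape of the `bookmarks` dict, are hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser


class TextScraper(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that is not inside a skipped element.
        if not self._skip_depth:
            self.parts.append(data)


def scrape_text(html):
    """Return the page's text with runs of whitespace collapsed."""
    parser = TextScraper()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def search(bookmarks, query):
    """Return URLs whose title or scraped text contains every query word.

    `bookmarks` is a hypothetical mapping: url -> (title, raw_html).
    Matching is a plain case-insensitive substring check, no ranking.
    """
    words = query.lower().split()
    hits = []
    for url, (title, html) in bookmarks.items():
        haystack = (title + " " + scrape_text(html)).lower()
        if all(word in haystack for word in words):
            hits.append(url)
    return hits
```

A scan like this re-reads every page on each query, which is fine for a few hundred bookmarks; a dedicated backend such as sonic becomes worthwhile only at a much larger scale.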
This is uncanny, I just discovered ArchiveBox earlier today and set up a self-hosted instance on some home hardware for a collection of bookmarks of useful guides, tutorials, and references I've collected over the years.
Setting it up on K8s with sonic [1] as the search backend and importing a few hundred URLs took only about an hour, and the cached pages look great for the most part.
[1] https://github.com/valeriansaliou/sonic
For anyone who uses Chrome and wants to view their archived pages in the browser as if they were still online (URL and everything intact), and who also wants full-text search over the browsing history they archived (like AB plans to add in the future, I think, right nikki?), check out DownloadNet: https://github.com/dosyago/DownloadNet
You can have multiple archives, and even use a mode where you only archive pages you bookmark rather than everything.
Over the last year I've been working on an open source Go tool with a more modest approach (command line only) but a similar goal (keeping personal info). In my tool, formats are described using simple YAML templates and the data is stored in a SQLite db file: https://github.com/khromalabs/keeper