ArchiveBox: Open-source self-hosted web archiving

ArchiveBox

248 19,737 9.7 Python

🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

Direct link: https://3xn.nl/projects/2022/02/17/archivebox-root-issue-in-...
Note that you no longer need to create a user manually, so this shouldn't be an issue anymore. Just set the ADMIN_USERNAME and ADMIN_PASSWORD env vars and it'll autocreate the user and collection on first run.
https://github.com/ArchiveBox/ArchiveBox/wiki/Configuration#...
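
As a rough illustration of the env-var approach described above, here is a minimal Python sketch that prepares an environment containing ADMIN_USERNAME and ADMIN_PASSWORD before ArchiveBox is launched. The helper name `archivebox_admin_env` and the sample credentials are hypothetical; only the two variable names come from the comment.

```python
import os


def archivebox_admin_env(username, password, base_env=None):
    """Return a copy of base_env (default: os.environ) with admin credentials set.

    Hypothetical helper: only the two variable names are taken from the comment.
    """
    if not username or not password:
        raise ValueError("both username and password must be non-empty")
    env = dict(os.environ if base_env is None else base_env)
    env["ADMIN_USERNAME"] = username
    env["ADMIN_PASSWORD"] = password
    return env


# Sample values for illustration only; use real secrets in practice.
base = {"PATH": "/usr/bin"}
env = archivebox_admin_env("admin", "change-me", base_env=base)
print(sorted(env))
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run` when starting ArchiveBox; the launch command itself depends on how it was installed, so it is left out here.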

linkwallet

4 75 2.6 Go

A self-hosted bookmark database with full-text page content search

I looked at ArchiveBox and several similar projects a while ago, but realised I didn't want anything so complex. I just wanted bookmarks, with free-text content search so I could find something again based on more than just a title.
So I wrote my own: https://github.com/tardisx/linkwallet
Emphasis on tiny system requirements and dependencies (single binary, no service dependencies). As a consequence the text indexing is very basic (basic HTML scrape). But it's working for me :-)
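
To make "basic HTML scrape" plus free-text search concrete, here is a small Python sketch using only the standard library. It is not linkwallet's implementation (linkwallet is written in Go); the class, the function names, and the example.com pages are all made up for illustration, and the index is a plain in-memory dictionary.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def page_words(html):
    """Return the set of lowercase alphanumeric words found in a page."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return set(re.findall(r"[a-z0-9]+", " ".join(parser.parts).lower()))


def build_index(pages):
    """Map each word to the set of URLs whose page text contains it."""
    index = {}
    for url, html in pages.items():
        for word in page_words(html):
            index.setdefault(word, set()).add(url)
    return index


def search(index, query):
    """Return the URLs whose text contains every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Made-up bookmarks for illustration.
pages = {
    "https://example.com/a": "<html><head><title>Rust tips</title>"
    "<script>var x = 'hidden';</script></head>"
    "<body><p>Borrow checker notes</p></body></html>",
    "https://example.com/b": "<html><body><h1>Go tips</h1>"
    "<p>Goroutine notes</p></body></html>",
}
index = build_index(pages)
print(search(index, "borrow notes"))
```

A query matches a bookmark only when every query word appears somewhere in the page text, which is enough to find a page again by something other than its title.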

sonic

48 19,431 7.0 Rust

🦔 Fast, lightweight & schema-less search backend. An alternative to Elasticsearch that runs on a few MBs of RAM.

This is uncanny, I just discovered ArchiveBox earlier today and set up a self-hosted instance on some home hardware for a collection of bookmarks of useful guides, tutorials, and references I've collected over the years.
Setting it up on K8s with sonic [1] as the search backend and importing a few hundred URLs only took ~an hour or so, and the cached pages look great for the most part.
[1] https://github.com/valeriansaliou/sonic

DownloadNet

20 3,643 6.4 JavaScript

💾 DownloadNet - All content you browse online available offline. Search through the full-text of all pages in your browser history. ⭐️ Star to support our work!

For anyone who uses Chrome and wants to view archived pages in the browser as if they were still online (URL and everything intact), with full-text search over the archived browsing history (which I think ArchiveBox plans to add in the future, right nikki?), you can check out DownloadNet: https://github.com/dosyago/DownloadNet
You can have multiple archives, and even use a mode where you only archive pages you bookmark rather than everything.

keeper

2 7 5.9 Go

Application to keep your personal info using custom formats described in YAML templates (by khromalabs)

Over the last year I've been working on an open-source Go tool with a more modest approach (command line only) but a similar goal (keeping personal info). In my tool, formats are described using simple YAML templates and stored in a SQLite db file: https://github.com/khromalabs/keeper

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
Tags: Search, search-engine, self-hosted, Archiving and Digital Preservation (DP), Rust
Post date: 11 Jan 2024
