ArchiveBox: Open-source self-hosted web archiving

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • ArchiveBox

    🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

  • Direct link: https://3xn.nl/projects/2022/02/17/archivebox-root-issue-in-...

    Note that you no longer need to create a user manually, so this shouldn't be an issue anymore. Just set the ADMIN_USERNAME and ADMIN_PASSWORD env vars and it'll auto-create the user and collection on first run.

    https://github.com/ArchiveBox/ArchiveBox/wiki/Configuration#...

  • linkwallet

    A self-hosted bookmark database with full-text page content search

  • I looked at ArchiveBox and several similar projects a while ago, but realised I didn't want anything so complex. I just wanted bookmarks, with free-text content search so I could find something again based on more than just a title.

    So I wrote my own: https://github.com/tardisx/linkwallet

    Emphasis on tiny system requirements and dependencies (single binary, no service dependencies). As a consequence, the text indexing is very basic (a basic HTML scrape), but it's working for me :-)
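The linkwallet comment describes its text indexing as a basic HTML scrape feeding full-text search. As a rough illustration of that general idea only, and not linkwallet's actual implementation, here is a minimal Python sketch using just the standard library: `html.parser` collects the visible text (skipping `<script>` and `<style>` contents), and a simple in-memory inverted index maps each lowercase word to the bookmark URLs containing it, so a query matches bookmarks that contain every term. The names `extract_text` and `BookmarkIndex`, and the example URLs, are hypothetical.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, ignoring anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script>/<style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def extract_text(html: str) -> str:
    """Very basic HTML scrape: strip tags and return the visible text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return TOKEN_RE.findall(text.lower())


class BookmarkIndex:
    """Hypothetical in-memory inverted index: term -> set of bookmark URLs."""

    def __init__(self):
        self._postings = defaultdict(set)
        self.titles = {}

    def add(self, url: str, html: str, title: str = "") -> None:
        # Index both the title and the scraped page text.
        self.titles[url] = title
        for term in tokenize(title + " " + extract_text(html)):
            self._postings[term].add(url)

    def search(self, query: str) -> set[str]:
        # AND semantics: a URL must contain every query term.
        terms = tokenize(query)
        if not terms:
            return set()
        results = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            results &= self._postings.get(term, set())
        return results


if __name__ == "__main__":
    index = BookmarkIndex()
    index.add(
        "https://example.com/go-tips",
        "<h1>Go tips</h1><p>Use goroutines wisely.</p><script>var x = 'hidden';</script>",
        title="Go tips",
    )
    index.add("https://example.com/rust", "<p>Ownership and borrowing</p>", title="Rust notes")
    print(index.search("goroutines"))  # {'https://example.com/go-tips'}
    print(index.search("hidden"))      # set()
```

A real bookmark tool would also need persistence, stemming, and ranking; this sketch deliberately stays as simple as the scrape the comment describes.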

  • sonic

    🦔 Fast, lightweight & schema-less search backend. An alternative to Elasticsearch that runs on a few MBs of RAM.

  • This is uncanny: I just discovered ArchiveBox earlier today and set up a self-hosted instance on some home hardware for a collection of bookmarks to useful guides, tutorials, and references I've collected over the years.

    Setting it up on K8s with sonic [1] as the search backend and importing a few hundred URLs took only about an hour, and the cached pages look great for the most part.

    [1] https://github.com/valeriansaliou/sonic

  • DownloadNet

    💾 DownloadNet - All content you browse online available offline. Search through the full-text of all pages in your browser history. ⭐️ Star to support our work!

  • For anyone who uses Chrome and wants to view their archived pages in the browser as if they were still online (URL and everything intact), and also to full-text search through their archived browsing history (like ArchiveBox plans to add in the future, I think, right nikki?), you can check out DownloadNet: https://github.com/dosyago/DownloadNet

    You can have multiple archives, and even use a mode where you only archive pages you bookmark rather than everything.

  • keeper

    Application to keep your personal info using custom formats described in YAML templates (by khromalabs)

  • Last year I was working on an open-source Golang tool with a more modest approach (just a command-line CLI) but a similar goal (keeping personal info). In my tool, formats are described using simple YAML templates and stored in a SQLite db file: https://github.com/khromalabs/keeper
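The keeper comment describes record formats declared in YAML templates and stored in a SQLite file. The Python sketch below illustrates that general pattern under stated assumptions; it is not keeper's code (keeper is a Go CLI), and its template schema is hypothetical. To stay within the standard library, a plain dict stands in for a parsed YAML template; each record is checked against the template's declared fields and then stored as JSON in a hypothetical `records` table.

```python
import json
import sqlite3

# Hypothetical template, shaped like what a simple YAML file might contain once parsed.
CONTACT_TEMPLATE = {
    "name": "contact",
    "fields": {"name": "str", "email": "str", "age": "int"},
    "required": ["name"],
}

# Maps template type names to Python types used for validation.
TYPES = {"str": str, "int": int}


def validate(template: dict, record: dict) -> None:
    """Reject records that miss required fields, add unknown ones, or use wrong types."""
    for field in template["required"]:
        if field not in record:
            raise ValueError(f"missing required field: {field}")
    for field, value in record.items():
        if field not in template["fields"]:
            raise ValueError(f"unknown field: {field}")
        type_name = template["fields"][field]
        if not isinstance(value, TYPES[type_name]):
            raise TypeError(f"{field} should be of type {type_name}")


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the SQLite file holding all records."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records ("
        " id INTEGER PRIMARY KEY,"
        " template TEXT NOT NULL,"
        " data TEXT NOT NULL)"
    )
    return conn


def save(conn: sqlite3.Connection, template: dict, record: dict) -> int:
    """Validate a record against its template, then store it as JSON."""
    validate(template, record)
    cur = conn.execute(
        "INSERT INTO records (template, data) VALUES (?, ?)",
        (template["name"], json.dumps(record)),
    )
    conn.commit()
    return cur.lastrowid


def load_all(conn: sqlite3.Connection, template_name: str) -> list[dict]:
    """Return every stored record for one template, oldest first."""
    rows = conn.execute(
        "SELECT data FROM records WHERE template = ? ORDER BY id", (template_name,)
    )
    return [json.loads(data) for (data,) in rows]


if __name__ == "__main__":
    store = open_store()
    save(store, CONTACT_TEMPLATE, {"name": "Ada", "email": "ada@example.com"})
    print(load_all(store, "contact"))
```

Storing each record as a JSON blob keyed by template name keeps the database schema fixed while the templates evolve; a tool that needs to query individual fields might instead generate one table per template.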
