Alternative to HTTrack (website copier) as of 2023?

This page summarizes the projects mentioned and recommended in the original post on /r/DataHoarder.

  • wget-lua

    Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.

  • You're using it wrong; RTFM. wget is still the standard. It's also extensible beyond the base feature set: take, for example, wget-lua, ArchiveTeam's well-maintained go-to for nearly all of the group's scraping projects.

  • os

    Discontinued Tiny Linux distro that runs the entire OS as Docker containers

  • I wonder if that's a job for RancherOS (https://rancher.com/docs/os/v1.x/en/), since everything in RancherOS is a Docker container. Or is there some better compact OS?

  • ArchiveBox

    🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

  • ArchiveBox is a no-go for my needs because I often want to crawl entire domains and, as far as I can tell, it doesn’t support that: https://github.com/ArchiveBox/ArchiveBox/issues/191

  • browsertrix-crawler

    Run a high-fidelity browser-based crawler in a single Docker container

  • I have started using the tools from https://webrecorder.net, like Browsertrix Crawler, and they have been working great. The web archive format is open source and very portable. The crawler even crawls and saves YouTube videos embedded on pages, which is awesome.
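
For anyone wanting to try the two crawler suggestions above (plain wget and Browsertrix Crawler), here is a minimal sketch of what the command lines might look like, built in Python so they can be inspected before running. The function names, output directories, the one-second wait and the collection name are all illustrative assumptions, not something from the thread. The wget flags are standard GNU wget options; the Browsertrix invocation follows the project's README-style usage and is worth checking against the current docs.

```python
import shlex
from pathlib import Path
from urllib.parse import urlparse


def _require_http_url(url):
    """Reject anything that is not an absolute http(s) URL."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"expected an absolute http(s) URL, got {url!r}")
    return url


def wget_mirror_command(url, out_dir="mirror", wait_seconds=1):
    """Build an HTTrack-style offline mirror command for GNU wget.

    out_dir and wait_seconds are illustrative defaults (assumptions).
    """
    return [
        "wget",
        "--mirror",                       # recursive download with timestamping
        "--page-requisites",              # also fetch CSS, images and scripts
        "--convert-links",                # rewrite links so the copy works offline
        "--adjust-extension",             # save pages with matching file extensions
        "--no-parent",                    # do not climb above the start directory
        f"--wait={wait_seconds}",         # pause between requests
        f"--directory-prefix={out_dir}",  # where the mirror is written
        _require_http_url(url),
    ]


def browsertrix_command(url, collection="site", crawls_dir="crawls"):
    """Build a docker command that runs Browsertrix Crawler on one seed URL.

    collection and crawls_dir are hypothetical names chosen for this sketch.
    """
    host_dir = Path(crawls_dir).resolve()  # bind mounts want an absolute path
    return [
        "docker", "run", "--rm",
        "-v", f"{host_dir}:/crawls/",      # crawl output lands in host_dir
        "webrecorder/browsertrix-crawler", "crawl",
        "--url", _require_http_url(url),
        "--generateWACZ",                  # package the result as a WACZ file
        "--collection", collection,
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so nothing is crawled by accident.
    print(shlex.join(wget_mirror_command("https://example.com/")))
    print(shlex.join(browsertrix_command("https://example.com/")))
```

Printing the commands rather than executing them keeps the sketch safe to run; pass either list to `subprocess.run` once the options look right for the site in question.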
