webarchiving

Open-source projects categorized as webarchiving

Top 4 webarchiving Open-Source Projects

  • awesome-web-archiving

    An Awesome List for getting started with web archiving

  • Project mention: Show HN: OpenAPI DevTools – Chrome ext. that generates an API spec as you browse | news.ycombinator.com | 2023-10-25

    https://github.com/iipc/awesome-web-archiving/blob/main/READ...

  • waybackpy

    Wayback Machine API interface & a command-line tool

  • Project mention: download all captures of a page in archive.org | /r/Archiveteam | 2023-06-05

    I ended up using the waybackpy Python module to retrieve archived URLs, and it worked well. I think the feature you want for this is "snapshots", but I didn't test this myself.

  • wget-lua

    Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.

  • cc-notebooks

    Various Jupyter notebooks about Common Crawl data
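The waybackpy mention above is about listing every capture of a page. As a rough illustration of that idea, the sketch below does not use waybackpy at all; it assumes the public Wayback Machine CDX endpoint with its `url` and `output=json` parameters, and uses only the Python standard library. The function names (`build_cdx_query`, `capture_urls`, `fetch_capture_urls`) are hypothetical helpers made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the public CDX search endpoint of the Wayback Machine.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(page_url):
    """Build a CDX query URL asking for all captures of page_url as JSON."""
    return CDX_ENDPOINT + "?" + urlencode({"url": page_url, "output": "json"})


def capture_urls(cdx_rows):
    """Turn a parsed CDX JSON response into a list of archive URLs.

    With output=json the first row is a header naming the fields;
    each later row describes one capture.
    """
    if not cdx_rows:
        return []
    header, *rows = cdx_rows
    ts = header.index("timestamp")
    orig = header.index("original")
    return [f"https://web.archive.org/web/{row[ts]}/{row[orig]}" for row in rows]


def fetch_capture_urls(page_url):
    """Query the CDX endpoint over the network and list archive URLs."""
    with urlopen(build_cdx_query(page_url), timeout=30) as response:
        return capture_urls(json.load(response))
```

Calling `fetch_capture_urls("example.com")` needs network access and can return a long list for popular pages. waybackpy wraps this kind of lookup behind its own interface, which is likely the more convenient route for real use.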

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

webarchiving related posts

  • DPReview.com is going down effective April 10.

    2 projects | /r/DataHoarder | 22 Mar 2023
  • DPReview.com to close on April 10 after 25 years of operation

    2 projects | /r/DataHoarder | 21 Mar 2023
  • This Layoff Does Not Exist: tech layoff announcements but weird

    1 project | /r/programming | 13 Feb 2023
  • Alternative to HTTrack (website copier) as of 2023?

    4 projects | /r/DataHoarder | 10 Feb 2023
  • Software to keep Website pages "alive"?

    2 projects | /r/software | 17 Nov 2022
  • How to Download All of Wikipedia onto a USB Flash Drive

    7 projects | news.ycombinator.com | 6 Oct 2022
  • [HELP] Starting Out for a Beginner

    1 project | /r/Archiveteam | 9 Aug 2021

Index

What are some of the best open-source webarchiving projects? This list will help you:

#  Project                Stars
1  awesome-web-archiving  1,811
2  waybackpy              405
3  wget-lua               81
4  cc-notebooks           37
