Python Archiving

Open-source Python projects categorized as Archiving

Top 9 Python Archiving Projects

  • wal-e

    Continuous Archiving for Postgres

    Project mention: Run PostgreSQL. The Kubernetes Way | news.ycombinator.com | 2023-09-22

    See the GitHub: https://github.com/wal-e/wal-e

    “Unmaintained” would’ve made more sense, but the maintainer chose the word “obsolete”, so I used that. :)

    Seems to be obsolete due to a lack of interest and contributions.

  • grab-site

    The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns

    Project mention: Ask HN: How can I back up an old vBulletin forum without admin access? | news.ycombinator.com | 2024-01-29

    The format you want is WARC. Even the Library of Congress uses it. There are many many WARC scrapers. I'd look at what the Internet Archive recommends. A quick search turned up this from the Archive Team and Jason Scott https://github.com/ArchiveTeam/grab-site (https://wiki.archiveteam.org/index.php/Who_We_Are) but I found that in less than 15 seconds of searching so do your own diligence.

  • URS

    Universal Reddit Scraper - A comprehensive Reddit scraping/archival command-line tool.

    Project mention: Nitter Shutting Down | news.ycombinator.com | 2024-01-27

    If they don't want you to use their API just respect their wishes and scrape Reddit. https://github.com/JosephLai241/URS it's the only moral thing we can do.

  • archiveis

    A simple Python wrapper for the archive.is capturing service

    Project mention: Stockman: The Destruction Of The American Middle Class | /r/economy | 2023-12-11

    To post Zerohedge articles you would have to use a "masking" site like https://archive.is/ or some such thing to hide the Zerohedge link.

  • savepagenow

    A simple Python wrapper and command-line interface for archive.org’s "Save Page Now" capturing service

  • Archivist-Tools

    These Python scripts help you manage a large number of files.

    Project mention: CoolRune - An easy way to setup Artix Linux automatically | /r/coolgithubprojects | 2023-06-22

  • wattpad-archiver

    Downloads your wattpad library to disk

  • archive-to-images

    Python CLI to transform archives into images and reverse.

    Project mention: Use Amazon Photos as personal cloud | /r/madeinpython | 2023-04-01

  • VideoCheck

    Automated tool to check consistency of your video library.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-01-29.
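
The grab-site mention above recommends WARC as the archival format. As a rough illustration of what one WARC record looks like on disk, here is a minimal sketch that serializes a single WARC/1.0 "resource" record using only the Python standard library. The function name `build_warc_record` and its parameters are hypothetical and chosen for this example; this is not how grab-site or any other project on this list writes its output, and real crawls should rely on an established WARC tool.

```python
import uuid
from datetime import datetime, timezone


def build_warc_record(target_uri: str, payload: bytes,
                      content_type: str = "text/html") -> bytes:
    """Serialize one WARC/1.0 'resource' record (hypothetical helper)."""
    # Timestamp in UTC, in the ISO 8601 form used by WARC-Date.
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",                                    # version line
        "WARC-Type: resource",                         # record type
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",  # unique record id
        f"WARC-Date: {now}",
        f"WARC-Target-URI: {target_uri}",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(payload)}",             # payload size in bytes
    ]
    # Header lines end with CRLF; a blank line separates headers from payload.
    header_block = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    # Each record is terminated by two CRLF pairs.
    return header_block + payload + b"\r\n\r\n"


if __name__ == "__main__":
    record = build_warc_record("http://example.com/", b"<html>hello</html>")
    print(record.decode("utf-8"))
```

Appending several such records to one file gives an uncompressed `.warc`. Archival tools usually also write request and response records and compress each record with gzip, which this sketch leaves out.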

Index

What are some of the best open-source Archiving projects in Python? This list will help you:

#  Project            Stars
1  wal-e              3,417
2  grab-site          1,174
3  URS                  698
4  archiveis            170
5  savepagenow          160
6  Archivist-Tools        6
7  wattpad-archiver       6
8  archive-to-images      5
9  VideoCheck             1