Python Archiving

Open-source Python projects categorized as Archiving

Top 10 Python Archiving Projects

  • paperless-ngx

    A community-supported supercharged version of paperless: scan, index and archive all your physical documents

  • Project mention: I accidentally built a meme search engine | news.ycombinator.com | 2024-04-13

    I steered a friend towards Paperless (and away from an LLM solution) as a way of searching/accessing GBs of architectural PDFs recently - so far, it’s apparently working well for them.

    https://github.com/paperless-ngx/paperless-ngx

  • wal-e

    Continuous Archiving for Postgres

  • Project mention: Run PostgreSQL. The Kubernetes Way | news.ycombinator.com | 2023-09-22

    See the GitHub: https://github.com/wal-e/wal-e

    “Unmaintained” would’ve made more sense to say, but the maintainer chose the word “obsolete”, so I took that. :)

    Seems to be obsolete due to a lack of interest and contributions.

  • grab-site

    The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns

  • Project mention: Ask HN: How can I back up an old vBulletin forum without admin access? | news.ycombinator.com | 2024-01-29

    The format you want is WARC. Even the Library of Congress uses it. There are many, many WARC scrapers. I'd look at what the Internet Archive recommends. A quick search turned up this from the Archive Team and Jason Scott: https://github.com/ArchiveTeam/grab-site (https://wiki.archiveteam.org/index.php/Who_We_Are), but I found that in less than 15 seconds of searching, so do your own diligence.

  • URS

    Universal Reddit Scraper - A comprehensive Reddit scraping/archival command-line tool.

  • Project mention: Nitter Shutting Down | news.ycombinator.com | 2024-01-27

    If they don't want you to use their API, just respect their wishes and scrape Reddit: https://github.com/JosephLai241/URS. It's the only moral thing we can do.

  • archiveis

    A simple Python wrapper for the archive.is capturing service

  • Project mention: Ask HN: Comments requesting paywall bypass links | news.ycombinator.com | 2024-04-18

    I frequently see comments from people explicitly or implicitly asking for links to bypass the paywall on submitted articles. I'm confused by this, since it takes about the same amount of effort to generate your own paywall bypassing link as it does to post a comment asking for someone else to do it. Going further and posting this link for others to use does add a step, but doesn't seem like a lot to ask.

    What's happening here?

    Do these posters think some special magic is required? Are they not aware that creating such a link just involves going to the top level domain of one of the services (e.g., http://archive.is) and pasting the URL into a form?

    Are they opposed to the idea of creating such a link themselves, either due to moral qualms or legal fears, but willing to follow a link that someone else has created?

    Are they using a handheld device that makes it so hard to copy a URL and open a new page that they don't know how to start, whereas they know how to write a comment?

    Or are they just so entitled that they think someone else should provide for them at all times, and don't want to demean themselves helping others?

    Can anyone who has posted such requests tell me what they were thinking? Can others who post bypass links tell me other explanations? General discussion on what the HN etiquette on paywall bypass links should be is welcomed as well.

  • savepagenow

    A simple Python wrapper and command-line interface for archive.org’s "Save Page Now" capturing service

  • wattpad-archiver

    Downloads your Wattpad library to disk

  • Archivist-Tools

    These Python scripts help you manage a large number of files.

  • Project mention: CoolRune - An easy way to setup Artix Linux automatically | /r/coolgithubprojects | 2023-06-22
  • archive-to-images

    Python CLI to transform archives into images and reverse.

  • VideoCheck

    Automated tool to check consistency of your video library.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source Archiving projects in Python? This list will help you:

#   Project            Stars
1   paperless-ngx      16,754
2   wal-e              3,423
3   grab-site          1,260
4   URS                724
5   archiveis          170
6   savepagenow        164
7   wattpad-archiver   7
8   Archivist-Tools    6
9   archive-to-images  6
10  VideoCheck         2
