internetarchive
internetarchive-downloader
| | internetarchive | internetarchive-downloader |
|---|---|---|
| Mentions | 17 | 7 |
| Stars | 1,513 | 121 |
| Growth | - | - |
| Activity | 8.3 | 3.6 |
| Last commit | 10 days ago | 4 months ago |
| Language | Python | Python |
| License | GNU Affero General Public License v3.0 | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
internetarchive
-
Official CLI Tool for the Internet Archive
https://github.com/jjjake/internetarchive/commit/952ace47e0e...
Me too, first commit was a bit more than 11 years ago.
-
What do you use to verify the hashes provided by Archive.org?
The --checksum switch of ia verifies the hashes.
- Mass downloading from Archive.org...how?
-
Using Python for Internet Archive Bulk Upload
First, I tried Python and the internetarchive scripts on XP/Vista only, with the corresponding versions for those OSes, without success, so I moved to Linux instead. While I have a Raspberry Pi (RPi), I tried first on a virtual machine under Windows. I chose Debian (that's what I run on the RPi) but also had a go at FreeBSD. Both have packages (binaries) ready to go and worked flawlessly. From your post, you have enough skills to set up a virtual machine and install a mainstream Linux distro, which is basically downloading an ISO, mounting it on the VM, and clicking next, next, next, OK, done. You would then boot into the desktop and open the CLI (command-line interface). Installing internetarchive and Python is just a matter of copy-pasting a couple of commands. On Linux, the internetarchive package is https://packages.debian.org/stable/utils/internetarchive and I find it easier than grabbing the binaries through cURL, setting up permissions and whatnot. Same for python3. It'll do its thing (grab all the files it needs, install, clean up, all automated), and when it's done you're back at the prompt, ready to throw commands. (You asked what the $ operator means in Python, but I think you mean where it shows up in the documentation; it's just a command prompt, like the one in Windows cmd, where for example c:\archives\uploads> waits for a command.) You first need to set up your credentials: just run ia configure, it'll ask for all it needs, and you're ready to upload stuff. Mass uploading different items is basically entering the same command as many times as needed. ia does this for you using a CSV file; this involves a bit of pre-processing, but once it's set up it'll save you a lot of time and waiting.
-
I'm using 'screen' for some background tasks on a headless RPi server and it doesn't show progress info. Works fine outside it.
More specifically, I'm using ia (internetarchive) and PuTTY 0.75 to log into the Pi. Everything is up to date, and outside a screen session it works fine: when transferring files I get a progress bar, percentage, speed and timestamps. But inside a screen session all I get is the name of the file being uploaded and nothing else. It only changes when one file finishes and it moves to the next, or when everything is uploaded. No other progress info.
-
Top Python Coding Repos
requests - A simple, yet elegant, HTTP library; sanic - Next generation Python web server/framework | Build fast. Run fast.; click - Python composable command line interface toolkit; elasticsearch-dsl-py - High level Python client for Elasticsearch; panel - A high-level app and dashboarding solution for Python; internetarchive - A Python and Command-Line Interface to Archive.org; coconut - Simple, elegant, Pythonic functional programming
- It finally happened. Something I archived was erased from the Internet.
- Looking for some help in downloading a few thousand files from archive.org on ubuntu. wget is estimated to take 2 months... I figured I should ask the fellow data-hoarders!
-
How can I mirror big folder from Archive.org
You can do that with the Internet Archive's Python client by jjjake: https://github.com/jjjake/internetarchive
-
Wii WBFS games?
If you're comfortable with command line, you can use the internet archive python script to download stuff from archive.org ( https://github.com/jjjake/internetarchive )
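The bulk-upload post above ("Using Python for Internet Archive Bulk Upload") notes that ia can drive a mass upload from a CSV file after a bit of pre-processing. The following is a minimal sketch of that pre-processing step using only the Python standard library. It assumes the spreadsheet needs an `identifier` column and a `file` column followed by optional metadata columns, which is my reading of the ia documentation and not something the post itself confirms. The function names, the `title` column and the rule for deriving identifiers from file names are all hypothetical.

```python
import csv
import io
from pathlib import PurePath
from typing import Dict, Iterable, List


def rows_for_files(paths: Iterable[str], title_prefix: str = "") -> List[Dict[str, str]]:
    """Build one spreadsheet row per file to upload."""
    rows = []
    for path in paths:
        stem = PurePath(path).stem
        rows.append({
            # Hypothetical naming rule: lower-cased file stem with dashes.
            "identifier": stem.lower().replace(" ", "-"),
            "file": path,
            "title": f"{title_prefix}{stem}",
        })
    return rows


def build_upload_csv(rows: Iterable[Dict[str, str]]) -> str:
    """Return CSV text with 'identifier' and 'file' first, then metadata columns."""
    rows = list(rows)
    fieldnames = ["identifier", "file"]
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Saving the returned text as, say, `metadata.csv` and then running `ia upload --spreadsheet=metadata.csv` should start the batch, assuming that option behaves as documented for your installed version. Check the identifiers by hand first, since each one has to be unique on archive.org.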
internetarchive-downloader
- Does anyone know how to download the images from borrow-only Internet Archive books?
-
Is there a way to download all files in the URLs list for an archived site?
this tool works well for what you're asking for. https://github.com/john-corcoran/internetarchive-downloader
- Looking for some help in downloading a few thousand files from archive.org on ubuntu. wget is estimated to take 2 months... I figured I should ask the fellow data-hoarders!
-
How to view more than 25 results in an archive collection?
Another option for getting all items in a collection, which I used in a script I put together for Internet Archive downloads, is the Internet Archive Python library. Official documentation on the relevant function is at https://archive.org/services/docs/api/internetarchive/quickstart.html#searching and an example of using it in code is around line 839 of https://github.com/john-corcoran/internetarchive-downloader/blob/61395ae4fbc826d9578678ed3299ada45d5ec3fd/ia_downloader.py
-
Pause Downloading of Collection From the Internet Archive?
Using the ‘-r’ flag with my Python script will allow resuming in-progress files, and if you run the script with the same command line arguments each time, you can pick up a collection where you left off - it’s at https://github.com/john-corcoran/internetarchive-downloader
-
Extracting all links from a webpage without html?
You may want to try this Python script I’ve finished recently for Internet Archive downloads: https://github.com/john-corcoran/internetarchive-downloader - collections should work fine if you pass it with the prefix ‘collection:’, e.g. ‘collection:nasa’ - if you want to give it a try, let me know if any questions!
-
What are the odds of the Internet Archive getting shut in the next 5 years and what will we do after it is shut?
I’ve made a Python script for this at https://github.com/john-corcoran/internetarchive-downloader which may assist?
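Several of the posts above list every item in a collection by passing a `collection:` query (for example `collection:nasa`) to the Internet Archive Python library. A minimal sketch of that idea follows. It is not taken from ia_downloader.py: the helper names are hypothetical, and it assumes the `internetarchive` package is installed and that `search_items` yields dictionaries with an `identifier` key, as shown in the library's quickstart.

```python
from typing import Dict, Iterable, Iterator, List


def collection_query(collection: str) -> str:
    """Build the search query that matches every item in a collection."""
    return f"collection:{collection}"


def identifiers(results: Iterable[Dict[str, str]]) -> Iterator[str]:
    """Yield the 'identifier' field of each search result."""
    for result in results:
        yield result["identifier"]


def list_collection(collection: str) -> List[str]:
    """Return all item identifiers in a collection (needs network access)."""
    # Imported lazily so the helpers above work without the package installed.
    from internetarchive import search_items

    return list(identifiers(search_items(collection_query(collection))))
```

Calling `list_collection("nasa")` returns the full list of identifiers in one go instead of paging through the website; for a large collection this can take a while, so iterating over `identifiers(...)` lazily may be the better fit.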
What are some alternatives?
archiveOrgImageDownloader - A python script that will download pages from a borrowed book from the Internet Archive archive.org library and save them as images.
distributed-wikipedia-mirror - Putting Wikipedia Snapshots on IPFS
rfsh - RFSH: Run shell scripts in batch, concurrently, fully customized with variable .
archive-downloader - A downloader for archive.org
wrolpi - Create your own off-grid library
BaseCase-3 - This is a Python Application that can be used to gather all files of a certain type from any archive.com repository
WinPython - A free Python-distribution for Windows platform, including prebuilt packages for Scientific Python.
GGet - Multithreaded download accelerator written in Go
SCrawler - 🏳️‍🌈 Media downloader from any sites, including Twitter, Reddit, Instagram, Threads, Facebook, OnlyFans, YouTube, Pinterest, PornHub, XHamster, XVIDEOS, ThisVid etc.
pup - Parsing HTML at the command line
instaloader - Download pictures (or videos) along with their captions and other metadata from Instagram.
ipfs - Peer-to-peer hypermedia protocol