archivenow vs tubeup

| | archivenow | tubeup |
|---|---|---|
| Mentions | 4 | 15 |
| Stars | 391 | 384 |
| Growth | 1.0% | 2.6% |
| Activity | 3.3 | 6.4 |
| Latest commit | 3 months ago | 17 days ago |
| Language | Python | Python |
| License | MIT License | GNU General Public License v3.0 only |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
archivenow
- Best way to feed Wayback Machine a list of URLs?
I crawled a website I want to make sure is completely captured by Wayback Machine but now I need to figure out how to efficiently "feed" all the URLs into Wayback. I found archivenow but I'm terrible at Python so I'm not sure the best way to direct the program at the txt file and preferably create another txt/csv file listing the original url with the new archived url. Any help would be greatly appreciated!
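One way to do this is a small shell loop around archivenow's CLI. This is a sketch, not a definitive recipe: it assumes `archivenow --ia <url>` prints the resulting Wayback URL on stdout (which is what the project's README shows; verify against your installed version), and `ARCHIVE_DELAY` is just a knob added here for pacing.

```shell
# archive_list FILE OUT.csv
# Reads one URL per line from FILE, pushes each one to the Wayback Machine
# with archivenow, and appends "original_url,archived_url" to OUT.csv.
# Assumption: `archivenow --ia <url>` prints the new Wayback URL on stdout.
archive_list() {
    local infile=$1 outfile=$2
    while IFS= read -r url; do
        [ -z "$url" ] && continue                  # skip blank lines
        archived=$(archivenow --ia "$url")
        printf '%s,%s\n' "$url" "$archived" >> "$outfile"
        sleep "${ARCHIVE_DELAY:-4}"                # ~15 requests/min, to respect the rate limit
    done < "$infile"
}
```

Run it as `archive_list urls.txt mapping.csv`; `mapping.csv` then holds one `original,archived` pair per line, which you can open as a CSV.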
- Match Thread: West Brom vs Liverpool | Premier League
```bash
#!/bin/bash
function __longnow(){
    # Use: takes a txt file with one link on each line and pushes all the links to the Internet Archive
    # References:
    #   https://unix.stackexchange.com/questions/181254/how-to-use-grep-and-cut-in-script-to-obtain-website-urls-from-an-html-file
    #   https://github.com/oduwsdl/archivenow
    #   For the double underscore, see: https://stackoverflow.com/questions/13797087/bash-why-double-underline-for-private-functions-why-for-bash-complet/15181999
    input=$1
    counter=1
    while IFS= read -r line
    do
        if [ $((counter % 15)) -eq 0 ]
        then
            printf "\nArchive.org doesn't accept more than 15 links per min; sleeping for 1min...\n"
            sleep 1m
        fi
        echo "Url: $line"
        archivenow --ia "$line"
        ## alternatively, archivenow --all "$line" to use all archive services rather than just the Internet Archive
        counter=$((counter+1))
    done < "$input"
}

## Fetch news about Gaza from the Google News RSS endpoint
## ("|" is used as the sed delimiter so the URL's slashes don't need escaping)
echo 'Gaza' | sed 's/^.*: //' | sed 's/ /%20/g' | sed 's|^|https://news.google.com/rss/search?q=|' | xargs wget --quiet > /dev/null 2>&1 &
wait

## Parse the XML and append the title, date, and link of each article to listofnews.txt,
## with a blank line after every third line (GNU sed's 0~3 address) to separate articles
echo "Gaza" | sed 's/^/search?q=/' | sed 's/^/"/;s/$/"/' | xargs xmllint --format 2>/dev/null | grep -E "title|pubDate|link" | sed -E 's/.*>(.*)<.*/\1/' | sed '0~3 G' >> listofnews.txt

## Extract just the links into a file to be fed to the archiver service
echo "Gaza" | sed 's/^/search?q=/' | sed 's/^/"/;s/$/"/' | xargs xmllint --format 2>/dev/null | grep "link" | sed -E 's/.*>(.*)<.*/\1/' > tempforarchiver.txt

__longnow tempforarchiver.txt

rm 'search?q=Gaza'
rm tempforarchiver.txt

## Add this to cron with something like:
##   $ crontab -e
##   30 22 * * * /the/location/of/this/file    (without the leading "##")
## This might give you some grief if bash or the archivenow utility can't be found from within the cron instance.
```
- Archiving the Gaza conflict
- How to easily save web pages to the Internet Archive's Wayback Machine
tubeup
- How can I archive a YouTube video?
- Advanced and easy YouTube archiver now stable
If you want to provide an option to upload artifacts to the Internet Archive, you could crib off of https://github.com/bibanon/tubeup
- Swedish amateur archivist “Rosa Mannen” shut down from YouTube, again
- 52% of YouTube videos live in 2010 have been deleted
- Notice regarding Termination of Our Contract with “Uruha Rushia”
ArchiveTeam recommends TubeUp.py, a wrapper around yt-dl for comprehensive archival that also automatically uploads the content to the Internet Archive.
- Downloading YouTube Private Videos
What I do is use a tool called tubeup. It will download a YouTube link you give it and automatically upload it to archive.org. It can take single videos or even whole playlists. That way, if it gets removed from YouTube, I know where to find it again, and I don't have to manage local copies or use up my own cloud storage.
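For reference, the workflow described above looks roughly like the following session. This is a sketch based on tubeup's README; exact commands may differ between versions, `VIDEO_ID` and `PLAYLIST_ID` are placeholders, and you need free archive.org credentials before uploading.

```shell
# One-time setup: install tubeup plus the internetarchive CLI it uses for uploads
pip install tubeup internetarchive

# Store your archive.org credentials (prompts for email/password once)
ia configure

# Archive a single video: tubeup downloads it, then uploads it to archive.org
tubeup 'https://www.youtube.com/watch?v=VIDEO_ID'

# Playlist URLs work the same way; each video becomes its own archive.org item
tubeup 'https://www.youtube.com/playlist?list=PLAYLIST_ID'
```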
- Am I actually wasting time & effort by archiving a still existing Youtube channel?
This can automate a LOT of the manual work: https://github.com/bibanon/tubeup
- Is There A Youtube Liked Videos Saver?
You could pass a playlist of your liked videos to https://github.com/bibanon/tubeup
- Backing up a YouTube channel with rare live recordings
- How to archive YouTube videos on the Wayback machine
I have tried the tubeup project, but I spent most of the day trying to make it work, to no avail (I am on macOS Big Sur). I have also seen this post from what seems to be one of the staff, but it seems like a very long and complicated process just to archive YouTube videos.
What are some alternatives?
wayback-machine-spn-scripts - Bash scripts which interact with Internet Archive Wayback Machine's Save Page Now
yt-dlp - A feature-rich command-line audio/video downloader
videoduplicatefinder - Video Duplicate Finder - Crossplatform
heritrix3 - Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
FireDM - python open source (Internet Download Manager) with multi-connections, high speed engine, based on python, LibCurl, and youtube_dl https://github.com/firedm/FireDM
docker-tubeup - Docker container for Tubeup
upvotocracy-ui-ssr - Free speech Reddit clone
youtube-dlc - Command-line program to download various media from YouTube.com and other sites
UCALC - UCALC (Ultimate Calculator) is an advanced Python-based calculator that was my first major Python project.
yark - YouTube archiving made simple.
squid-dl - a massively parallel yt-dlp-based YouTube downloader
yt-dlc - media downloader and library for various sites.