Archivenow Alternatives
Similar projects and alternatives to archivenow based on common topics and language
- ArchiveBox: 🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
- wayback-machine-spn-scripts: Bash scripts that interact with the Internet Archive Wayback Machine's Save Page Now.
archivenow reviews and mentions
- Best way to feed Wayback Machine a list of URLs?
I crawled a website that I want to make sure is completely captured by the Wayback Machine, and now I need to figure out how to efficiently "feed" all the URLs into it. I found archivenow, but I'm terrible at Python, so I'm not sure of the best way to point the program at the txt file and, ideally, create another txt/csv file listing each original URL alongside its new archived URL. Any help would be greatly appreciated!
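Since archivenow also ships a command-line interface, one way to answer this without writing any Python is to loop over the file in bash and record each original/archived pair. A minimal sketch, assuming archivenow is installed (pip install archivenow) and on PATH, that urls.txt holds one URL per line, and that the archived URI is the last line archivenow prints:

#!/bin/bash
# Feed every URL in urls.txt to archivenow's Internet Archive handler
# and record "original_url,archived_url" pairs in results.csv.
while IFS= read -r url; do
    archived=$(archivenow --ia "$url" | tail -n 1)
    printf '%s,%s\n' "$url" "$archived" >> results.csv
    sleep 5   # stay well under the submission rate the IA endpoint tolerates
done < urls.txt

results.csv then pairs every submitted URL with its snapshot URI, which covers the txt/csv bookkeeping the question asks about.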
- Match Thread: West Brom vs Liverpool | Premier League
#!/bin/bash
function __longnow(){
    # Use: takes a txt file with one link on each line and pushes all the links to the Internet Archive
    # References:
    # https://unix.stackexchange.com/questions/181254/how-to-use-grep-and-cut-in-script-to-obtain-website-urls-from-an-html-file
    # https://github.com/oduwsdl/archivenow
    # For the double underscore, see: https://stackoverflow.com/questions/13797087/bash-why-double-underline-for-private-functions-why-for-bash-complet/15181999
    input=$1
    counter=1
    while IFS= read -r line
    do
        if [ $((counter % 15)) -eq 0 ]
        then
            printf "\nArchive.org doesn't accept more than 15 links per min; sleeping for 1 min...\n"
            sleep 1m
        fi
        echo "Url: $line"
        archivenow --ia "$line"
        ## alternatively, archivenow --all "$line" if you want to use all archive services rather than just the Internet Archive
        counter=$((counter+1))
    done < "$input"
}

## Fetch news about Gaza from the Google News RSS endpoint
## (wget names the output file after the last path component: "search?q=Gaza")
echo 'Gaza' | sed 's/^.*: //' | sed 's/ /%20/g' | sed 's|^|https://news.google.com/rss/search?q=|' | xargs wget --quiet > /dev/null 2>&1 &
wait

## Parse the XML and append the title, date, and link of each article to listofnews.txt,
## with a blank line after every third line to separate articles
echo "Gaza" | sed 's/^/search?q=/' | sed 's/^/"/;s/$/"/' | xargs xmllint --format 2>/dev/null | grep -E "title|pubDate|link" | sed -E 's/.*>(.*)<.*/\1/' | sed '0~3G' >> listofnews.txt

## Extract just the links into a temp file to feed to the archiver
echo "Gaza" | sed 's/^/search?q=/' | sed 's/^/"/;s/$/"/' | xargs xmllint --format 2>/dev/null | grep "link" | sed -E 's/.*>(.*)<.*/\1/' > tempforarchiver.txt

__longnow tempforarchiver.txt
rm 'search?q=Gaza'
rm tempforarchiver.txt

## Add this to cron with something like
## $ crontab -e
## 30 22 * * * /the/location/of/this/file   ### without the leading "#"
## This might give you some grief if bash or the archivenow utility can't be found from within the cron instance.
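If the cron job runs but you are not sure the captures actually landed, one way to spot-check is the public Wayback Machine availability API, which reports the closest stored snapshot for a given URL. A small sketch, assuming curl and jq are installed (jq is not used anywhere else in the script):

# Ask the Wayback availability API for the closest snapshot of a URL;
# prints the snapshot URI, or a fallback message if nothing is stored.
curl -s "https://archive.org/wayback/available?url=example.com" \
    | jq -r '.archived_snapshots.closest.url // "no snapshot found"'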
- Archiving the Gaza conflict
- How to easily save web pages to the Internet Archive's Wayback Machine
Stats
oduwsdl/archivenow is an open source project licensed under the MIT License, an OSI-approved license.
The primary programming language of archivenow is Python.