Match Thread: West Brom vs Liverpool | Premier League

archivenow

4 391 3.3 Python

A Tool To Push Web Resources Into Web Archives

    #!/bin/bash

    function __longnow(){
      # Use: Takes a txt file with one link on each line and pushes all the links to the internet archive
      # References:
      # https://unix.stackexchange.com/questions/181254/how-to-use-grep-and-cut-in-script-to-obtain-website-urls-from-an-html-file
      # https://github.com/oduwsdl/archivenow
      # For the double underscore, see: https://stackoverflow.com/questions/13797087/bash-why-double-underline-for-private-functions-why-for-bash-complet/15181999
      input=$1
      counter=1
      while IFS= read -r line
      do
        wait
        if [ $((counter % 15)) -eq 0 ]
        then
          printf "\nArchive.org doesn't accept more than 15 links per min; sleeping for 1min...\n"
          sleep 1m
        fi
        echo "Url: $line"
        archivenow --ia "$line" >& 1
        ## alternatively, archivenow --all "$line" >& 1 if you want to use all archive services rather than just the internet archive
        counter=$((counter+1))
      done < "$input"
    }

    ## This gets news about Gaza from the Google News API/XML endpoint.
    ## wget saves the feed to a file named "search?q=Gaza" in the current directory.
    ## The sed delimiter is "|" because the URL itself contains slashes.
    echo 'Gaza' | sed 's/^.*: //' | sed 's/ /%20/g' | sed 's|^|https://news.google.com/rss/search?q=|' | xargs wget --quiet > /dev/null 2>&1 &
    wait

    ## This parses the xml and appends data about each article to a file called "listofnews.txt".
    ## The final sed (GNU syntax) adds a blank line after every third line.
    echo "Gaza" | sed 's/^/search?q=/' | sed 's/^/"/;s/$/"/' | xargs xmllint --format 2>/dev/null | grep -E "title|pubDate|link" | sed -E 's/.*>(.*)<.*/\1/' | sed '0~3G' >> listofnews.txt

    ## This just gets the links and creates something to be fed to an archiver service.
    echo "Gaza" | sed 's/^/search?q=/' | sed 's/^/"/;s/$/"/' | xargs xmllint --format 2>/dev/null | grep "link" | sed -E 's/.*>(.*)<.*/\1/' > tempforarchiver.txt

    __longnow tempforarchiver.txt
    rm "search?q=Gaza"
    rm tempforarchiver.txt

    ## Add this to cron with something like
    ## $ crontab -e
    ## 30 22 * * * /the/location/of/this/file ### Without the "#"
    ## This might give you some grief if bash or the archivenow utility can't be found from within the cron instance.
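The script hardcodes the search term in several places. As a minimal sketch (not part of the original post), the URL-building step could be pulled into a small helper. The function name news_rss_url is hypothetical, and like the original sed step it only encodes spaces, so terms containing other reserved characters (&, #, +) would need proper percent-encoding.

```shell
#!/bin/bash
# Hypothetical helper, not from the original script: build the Google News
# RSS search URL for a given search term.
news_rss_url() {
  # Encode spaces only, mirroring the sed 's/ /%20/g' step in the script.
  encoded=$(printf '%s' "$1" | sed 's/ /%20/g')
  # The term is passed as an argument, so its "%20" is not read as a format.
  printf 'https://news.google.com/rss/search?q=%s\n' "$encoded"
}

# Example: print the feed URL for a two-word search term.
news_rss_url "Gaza ceasefire"
```

The printed URL could then be passed to wget in place of the echo and sed chain, assuming the rest of the pipeline is adjusted to the resulting file name.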

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts

Best way to feed Wayback Machine a list of URLs?
3 projects | /r/Archiveteam | 14 Dec 2021

Archiving the Gaza conflict
4 projects | /r/DataHoarder | 15 May 2021

How to easily save web pages to the Internet Archive's Wayback Machine
2 projects | /r/DataHoarder | 22 Apr 2021

Internet Archive: Open Library
1 project | news.ycombinator.com | 30 Apr 2024

Ask HN: Anyone looking for contributors for their open source projects
13 projects | news.ycombinator.com | 21 Mar 2024

This page summarizes the projects mentioned and recommended in the original post on /r/test
web-archiving internet-archive
Post date: 16 May 2021
