-
grab-site
The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
-
ArchiveBox
🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
You might also be interested in this list. The alternatives listed there are really great, and better than mine; some support the WARC format (which my program doesn't).
I created Collect a few years ago and still use it today.
I use grab-site to crawl a website and pack it into a WARC archive, and then feed that archive into pywb.
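A rough sketch of that grab-site to pywb workflow, written as a small Python helper that only builds the command lines. The URL, the collection name, and the WARC path are placeholders, not values from the original post; grab-site picks its own output directory name, so the real WARC path has to be looked up after the crawl finishes.

```python
import shlex


def archive_commands(url, collection, warc_path):
    """Return the commands for a crawl-then-replay workflow.

    All three arguments are illustrative placeholders. The WARC path in
    particular depends on the directory grab-site creates for the crawl.
    """
    return [
        ["grab-site", url],                            # crawl the site into a WARC
        ["wb-manager", "init", collection],            # create a pywb collection
        ["wb-manager", "add", collection, warc_path],  # load the WARC into it
        ["wayback"],                                   # start pywb's replay server
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so nothing is crawled by accident.
    for cmd in archive_commands("https://example.com/", "my-archive", "example.warc.gz"):
        print(shlex.join(cmd))
```

Run the printed commands one at a time in a shell; the crawl has to finish before the WARC can be added to the collection.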
I landed on an open-source project called ArchiveBox. It's pretty amazing (basically like a locally hosted Wayback Machine and crawler): https://github.com/ArchiveBox/ArchiveBox. It also captures pages in different ways to ensure data integrity, and it can schedule captures! Thanks, everyone, for your input and for the apps to research!
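For anyone who wants to try the ArchiveBox setup described above, here is a minimal sketch of the commands involved, again built as argument lists in Python rather than executed. The URL and the daily interval are assumptions chosen for illustration; the commands are meant to be run from inside the directory where ArchiveBox keeps its data.

```python
import shlex


def archivebox_commands(url, every="day"):
    """Return ArchiveBox CLI invocations for a one-off and a recurring capture.

    `url` and `every` are placeholder values, not settings from the post.
    """
    return [
        ["archivebox", "init"],                               # set up the data folder
        ["archivebox", "add", url],                           # capture the URL once
        ["archivebox", "schedule", f"--every={every}", url],  # re-capture on a schedule
    ]


if __name__ == "__main__":
    # Print only; copy the lines into a shell to actually run them.
    for cmd in archivebox_commands("https://example.com/"):
        print(shlex.join(cmd))
```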