|3 days ago||6 days ago|
|MIT License||MIT License|
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
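As a rough illustration of how such a score could be computed, here is a minimal sketch. The site does not publish its formula, so everything in it is an assumption: the exponential decay, the 30-day half-life, and the function names `raw_activity` and `activity_scores` are hypothetical. The only properties taken from the description above are that recent commits weigh more than older ones and that the score is relative to the other tracked projects.

```python
from datetime import date


def raw_activity(commit_dates, today, half_life_days=30.0):
    """Sum of commit weights; a commit's weight halves every half_life_days.

    The decay shape and the half-life are assumptions for illustration only.
    """
    return sum(0.5 ** ((today - d).days / half_life_days) for d in commit_dates)


def activity_scores(raw_by_project):
    """Map raw activity to a relative 0-10 score.

    A project's score is 10 times the fraction of tracked projects with
    strictly lower raw activity, so 9.0 means 90% of projects rank below it.
    """
    values = list(raw_by_project.values())
    total = len(values)
    scores = {}
    for name, raw in raw_by_project.items():
        below = sum(1 for v in values if v < raw)
        scores[name] = round(10.0 * below / total, 1)
    return scores


if __name__ == "__main__":
    today = date(2023, 2, 3)
    # Hypothetical commit histories for two made-up projects.
    raws = {
        "project-a": raw_activity([date(2023, 2, 3), date(2023, 1, 4)], today),
        "project-b": raw_activity([date(2022, 6, 1)], today),
    }
    print(activity_scores(raws))
```

Under these assumptions, a project that ranks highest among ten tracked projects would score 9.0, which matches the "top 10%" reading given above.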
Looking for open source software to scrape webpages but also make them searchable with a webui. (locally hosted)
4 projects | reddit.com/r/DataHoarder | 16 Jan 2021
I created Collect a few years ago and still use it today.
Did Mozilla Ever Open Source Pocket?
5 projects | reddit.com/r/fossdroid | 3 Feb 2023
I also found https://floccus.org/ and https://archivebox.io/ on Alternativeto, for self-hosters.
Best way to back up entire website on a schedule
2 projects | reddit.com/r/DataHoarder | 29 Jan 2023
You could also look into something like archivebox.io, but it doesn't mirror sites very well. fetchurls can make a URL list, though, which could in turn be fed into ArchiveBox. ArchiveBox might be handy if you wanted the wget download along with a PDF print, plus maybe sending to the Wayback Machine.
Alternative to Wallabag with better web clipper
5 projects | reddit.com/r/selfhosted | 15 Jan 2023
Best (simple) tool for personal Wiki
4 projects | reddit.com/r/DataHoarder | 10 Jan 2023
https://archivebox.io/ is what I use for that.
Offline Internet Archive
5 projects | news.ycombinator.com | 7 Jan 2023
Self-hosted web scraper?
4 projects | reddit.com/r/selfhosted | 3 Jan 2023
You didn't say what features are important or what about changedetection.io didn't work for you, but maybe ArchiveBox or Huginn.
Wiki/Offline Website Options
2 projects | reddit.com/r/selfhosted | 3 Jan 2023
Something like ArchiveBox is what you want?
Reasons for why data hoarding is important and why you should start
4 projects | reddit.com/r/DataHoarder | 1 Jan 2023
One option is ArchiveBox – built with Python, available via Docker, actively developed.
Most used selfhosted services in 2022?
103 projects | reddit.com/r/selfhosted | 27 Dec 2022
ArchiveBox: Wayback Machine but selfhosted
I've started down the path of Data Hoarding 😇 Downloaded my data from services, backing up photos and vids from my phone, and archiving my YouTube playlists!
5 projects | reddit.com/r/DataHoarder | 15 Dec 2022
Servers I'm currently running: Photoprism, TubeArchivist, Gramps-web, paperless-ng, ArchiveBox, Stash, Jellyfin/Plex.
What are some alternatives?
paimon-moe - Your best Genshin Impact companion! Helps you plan what to farm with an ascension calculator and database. Also tracks your progress with a todo list and wish counter.
Wallabag - wallabag is a self hostable application for saving web pages: Save and classify articles. Read them later. Freely.
SingleFile - Web Extension and CLI tool for saving a faithful copy of an entire web page in a single HTML file
ArchivesSpace - The ArchivesSpace archives management tool
logseq - A local-first, non-linear, outliner notebook for organizing and sharing your personal knowledge base. Use it to organize your todo list, to write your journals, or to record your unique life.
Archivematica - Free and open-source digital preservation system designed to maintain standards-based, long-term access to collections of digital objects.
CKAN - CKAN is an open-source DMS (data management system) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data. It powers catalog.data.gov, open.canada.ca/data, data.humdata.org among many other sites.
grab-site - The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
knowledge - Everything I know
Access to Memory (AtoM) - Open-source, web application for archival description and public access.
awesome-selfhosted - A list of Free Software network services and web applications which can be hosted on your own servers
Shiori - Simple bookmark manager built with Go