|2 days ago||7 days ago|
|GNU General Public License v3.0 or later||MIT License|
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Metadata Store - Which one to Choose ? OpenMetadata vs Datahub ?
5 projects | reddit.com/r/dataengineering | 17 Nov 2022
We use Kubernetes as our deployment platform. Any feedback on one of these open source data catalogs?
- https://atlas.apache.org/#/
- https://opendatadiscovery.org/
- https://open-metadata.org/
- https://marquezproject.github.io/marquez/
- https://datahubproject.io/
- https://www.amundsen.io/
- https://ckan.org/
- https://magda.io/
How to start Data Science and Machine Learning Career?
2 projects | reddit.com/r/ReviewNPrep | 24 Nov 2021
We are digitisers at the Natural History Museum in London, on a mission to digitise 80 million specimens and free their data to the world. Ask us anything!
4 projects | reddit.com/r/datasets | 8 Mar 2021
We publish all our data on the [Data Portal](https://data.nhm.ac.uk), a Museum project that's been running since 2014. Instead of MediaWiki it runs on an open-source Python framework called [CKAN](https://ckan.org), which is designed for hosting datasets - though we've had to adapt it in various ways so that it can handle such large amounts of data.
If we lose the Internet Archive, we’re screwed
2 projects | reddit.com/r/opensource | 14 May 2023
I wish there was an alternative to the Internet Archive with collaborative curation. You share files, and people who tag and sort them into albums can download them. And if it was federated, it could be just as extensive as the Internet Archive by searching files on many instances at the same time. Sadly, the closest things are ArchiveBox and Wayback, which won't replace the Internet Archive.
End-of-Availability notice for legacy DSM, Surveillance Station, SRM, and more
2 projects | reddit.com/r/synology | 10 May 2023
Useful browser extensions and their associated selfhosted services.
4 projects | reddit.com/r/selfhosted | 8 May 2023
Basically the ArchiveBox bookmarklet? It has a collection method for git. Seems they also have an extension, https://github.com/tjhorner/archivebox-exporter
Looking for recommendations (Bookmarks/Links)
8 projects | reddit.com/r/selfhosted | 23 Apr 2023
Any options available to organize and save (may be) reddit saved posts?
4 projects | reddit.com/r/selfhosted | 20 Apr 2023
Any bookmarking software/app/extension rcm?
6 projects | reddit.com/r/software | 19 Apr 2023
A self-hosted archiving service integrated with Internet Archive, archive.today, IPFS and beyond.
2 projects | reddit.com/r/DataHoarder | 15 Apr 2023
there's also https://github.com/ArchiveBox/ArchiveBox
Wayback: Self-hosted archiving service integrated with Internet Archive
7 projects | news.ycombinator.com | 15 Apr 2023
ArchiveBox also saves a Readability version:
> Article Text: article.html/json Article text extraction using Readability & Mercury
7 projects | news.ycombinator.com | 15 Apr 2023
I love all these kind of projects as I tend to be paranoid of losing good online content.
It’s also unclear to me how Wayback works. It seems more like an API than a self-hosted service.
I’m currently using ArchiveBox, which provides a complete API + UI.
-  https://archivebox.io/
Can't run "docker-compose run archivebox init --setup"
2 projects | reddit.com/r/docker | 2 Apr 2023
Now I wish to install ArchiveBox https://github.com/ArchiveBox/ArchiveBox on the same Pi with the following instructions:
What are some alternatives?
Wallabag - wallabag is a self hostable application for saving web pages: Save and classify articles. Read them later. Freely.
paimon-moe - Your best Genshin Impact companion! Help you plan what to farm with ascension calculator and database. Also track your progress with todo and wish counter.
SingleFile - Web Extension and CLI tool for saving a faithful copy of an entire web page in a single HTML file
ArchivesSpace - The ArchivesSpace archives management tool
Archivematica - Free and open-source digital preservation system designed to maintain standards-based, long-term access to collections of digital objects.
logseq - A local-first, non-linear, outliner notebook for organizing and sharing your personal knowledge base. Use it to organize your todo list, to write your journals, or to record your unique life.
grab-site - The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
knowledge - Everything I know
Shiori - Simple bookmark manager built with Go
Access to Memory (AtoM) - Open-source, web application for archival description and public access.