ArchiveBox
promnesia
Metric | ArchiveBox | promnesia
---|---|---
Mentions | 248 | 33
Stars | 19,672 | 1,686
Growth | 2.8% | -
Activity | 9.7 | 7.8
Last commit | 5 days ago | 17 days ago
Language | Python | Python
License | MIT | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
ArchiveBox
-
Ask HN: What Underrated Open Source Project Deserves More Recognition?
Two projects I greatly appreciate, allowing me to easily archive my bandcamp and GOG purchases (after the initial setup anyways):
https://github.com/easlice/bandcamp-downloader
https://github.com/Kalanyr/gogrepoc
And I recently learned about archivebox, which I think is going to be a fast favorite and finally let me clear out my mess of tabs/bookmarks: https://github.com/ArchiveBox/ArchiveBox
- YaCy, a distributed Web Search Engine, based on a peer-to-peer network
-
An Introduction to the WARC File
API is coming soon (relatively, it's still a one-man project)! Stay tuned https://github.com/ArchiveBox/ArchiveBox/issues/496
I have an event-sourcing refactor in progress now to allow us to pluginize functionality like the API (similar to Home Assistant with a plugin app store); it will take a month or two. Next up is the REST API using the new plugin system.
The ArchiveBox project (which gets reposted on the regular, e.g. https://news.ycombinator.com/item?id=38954189 ) also saves in WARC https://github.com/ArchiveBox/ArchiveBox#output-formats although I've not personally used it, so I can't comment further
-
Ask HN: How can I back up an old vBulletin forum without admin access?
I guess your best chance is to use something like https://archivebox.io/.
-
ArchiveBox – open-source self-hosted web archiving
Yeah this is a cool project but it was discussed 2 days ago.
As mentioned by the maintainer there, they even maintain a list of alternatives, very classy:
https://github.com/ArchiveBox/ArchiveBox/wiki/Web-Archiving-...
-
ArchiveBox: Open-source self-hosted web archiving
Actually closer to 7 years ago :)
You can learn about the origin story / motivation here:
https://github.com/ArchiveBox/ArchiveBox#background--motivat...
https://2020.pycon.co/en/talks/5/ (a conference talk I gave about it)
Direct link: https://3xn.nl/projects/2022/02/17/archivebox-root-issue-in-...
Note that you no longer need to create a user manually though, so this shouldn't be an issue anymore. Just set the ADMIN_USERNAME and ADMIN_PASSWORD env vars and it'll autocreate the user and collection on first run.
https://github.com/ArchiveBox/ArchiveBox/wiki/Configuration#...
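As a rough illustration of the comment above, here is a minimal Python sketch that prepares an environment with the two variables it names (ADMIN_USERNAME and ADMIN_PASSWORD). The helper name `archivebox_env` is hypothetical, and which ArchiveBox command counts as the "first run" is an assumption, so the subprocess call is left commented out.

```python
import os


def archivebox_env(username, password, base_env=None):
    """Return a copy of the environment with the admin credentials set.

    ADMIN_USERNAME and ADMIN_PASSWORD are the variable names given in the
    comment above; everything else here is illustrative.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["ADMIN_USERNAME"] = username
    env["ADMIN_PASSWORD"] = password
    return env


env = archivebox_env("admin", "change-me", base_env={"PATH": "/usr/bin"})

# Assumption: the archivebox CLI is installed and this is the first run
# inside a fresh data directory. Uncomment to actually launch it.
# import subprocess
# subprocess.run(["archivebox", "init"], env=env, check=True)
```

Passing a copy of the environment to the child process keeps the password out of the parent shell's own environment.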
I may add an opt-in federation option at some point in the far future; it would be great to figure out a way to link willing donors' ArchiveBox instances together for public benefit.
Follow here for progress: https://github.com/ArchiveBox/ArchiveBox/issues/50
promnesia
-
Mozilla "MemoryCache" Local AI
In terms of automatically saving everything, there is heyday.xyz, polished but quite expensive, or https://github.com/karlicoss/promnesia, a more experimental take.
-
Update 4: RedReader granted non-commercial accessibility exemption
Promnesia & theconversation.social were on similar themes/solutions.
-
Ask HN: How do you save and browse external interesting URLs?
1. You often don't know which resources you will really "value" in the future, so there is no more "to save or not to save, that is the question".
2. Tagging, to be effective, requires discipline (thinking up an agile system, then sticking to it). So we just replace it with search, preferably NLP/AI (so you don't have to remember the exact keywords).
Apps do exist, from the expansive [1] to the experimental [2].
Personally I invested time in my filing system, and over-saving does not cause me much angst, so I'm OK with it. I also use maintenance as an occasion for renewed discovery.
- Ask HN: Search what you've seen on the web before
- The coolest Python projects you've ever seen?
- Ask HN: Does anybody still use bookmarking services?
-
Git.io: GitHub will maintain active links in a read-only state
It's kind of tricky to do in the general case, e.g. even Hacker News keeps meaningful semantic information in the id= query parameter.
Because of that it ultimately needs to be a site-specific database/algorithm, perhaps with a fallback to default behaviour such as simply cleaning up the most common garbage (_encoding/usg/etc). I suspect it's possible to use some sort of machine learning to guess the meaningful parts of the URL path/query/fragments, but even for that we need some human curation for the training set. I wish we could collaborate on a shared database/library for that; I have sketched some ideas/applications/prior art here: https://beepb00p.xyz/exobrain/projects/cannon.html
I started thinking about it since I have a similar problem in Promnesia (https://github.com/karlicoss/promnesia#readme), a knowledge management tool I'm working on. Ideally I want to normalise URLs so they address the exact bit of information, and nothing more.
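The approach described in this comment (site-specific rules, with a fallback that strips common garbage parameters) could look roughly like the sketch below. It is not Promnesia's actual implementation: the rule tables, the `utm_` prefix, and the function name `normalise_url` are all assumptions made for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical site-specific rules: query parameters that carry meaning.
SITE_KEEP_PARAMS = {
    "news.ycombinator.com": {"id"},
}

# Hypothetical fallback rules: parameters treated as garbage everywhere.
GARBAGE_PARAMS = {"_encoding", "usg"}
GARBAGE_PREFIXES = ("utm_",)


def normalise_url(url):
    """Drop query parameters that do not identify the content."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    pairs = parse_qsl(parts.query, keep_blank_values=True)

    keep = SITE_KEEP_PARAMS.get(host)
    if keep is not None:
        # Known site: keep only the parameters listed as meaningful.
        pairs = [(k, v) for k, v in pairs if k in keep]
    else:
        # Unknown site: fall back to removing the most common garbage.
        pairs = [
            (k, v)
            for k, v in pairs
            if k not in GARBAGE_PARAMS and not k.startswith(GARBAGE_PREFIXES)
        ]

    pairs.sort()  # stable parameter order, so equivalent URLs compare equal
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(pairs), parts.fragment)
    )
```

For instance, `normalise_url("https://news.ycombinator.com/item?id=38954189&utm_source=x")` keeps only the `id` parameter. The fragment is left untouched here because, as the comment notes, deciding whether it is meaningful is itself site-specific.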
- Discover-It-Later App, and Why it's Superior to Read-It-Later
-
How do you curate your knowledge while browsing the web?
My observations over the last few years here seem to indicate that not many Emacs users are necessarily into Org mode and this kind of data curation, or at least that few have very elaborate setups that they have shared. Here's some serious inspiration: https://beepb00p.xyz/myinfra.html, which is AFAIK the most comprehensive example out there. For example, there's the Promnesia package by the same author (https://github.com/karlicoss/promnesia), which I used for a while, and it's cool! There's also Karl Voit's Memacs https://github.com/novoid/Memacs/ (which is mentioned in the previous link).
-
Gains I'm Seeing from My Second Brain Tool
This is my approach!
I'm using HPI [0] as a sort of universal API for almost all of my data (manual notes, bookmarks, instant messages, internet comments, etc)
Then I use it in tools like Orger [1] and Promnesia [2] which function as my second brain
[0] https://github.com/karlicoss/HPI
What are some alternatives?
Wallabag - wallabag is a self hostable application for saving web pages: Save and classify articles. Read them later. Freely.
paimon-moe - Your best Genshin Impact companion! Help you plan what to farm with ascension calculator and database. Also track your progress with todo and wish counter.
SingleFile - Web Extension for saving a faithful copy of a complete web page in a single HTML file
ArchivesSpace - The ArchivesSpace archives management tool
grab-site - The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
Archivematica - Free and open-source digital preservation system designed to maintain standards-based, long-term access to collections of digital objects.
knowledge - Everything I know
logseq - A local-first, non-linear, outliner notebook for organizing and sharing your personal knowledge base. Use it to organize your todo list, to write your journals, or to record your unique life.
CKAN - CKAN is an open-source DMS (data management system) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data. It powers catalog.data.gov, open.canada.ca/data, data.humdata.org among many other sites.
Access to Memory (AtoM) - Open-source, web application for archival description and public access.
Shiori - Simple bookmark manager built with Go
LinkAce - LinkAce is a self-hosted archive to collect links of your favorite websites.