grab-site vs Joplin
Metric | grab-site | Joplin
---|---|---
Mentions | 30 | 771
Stars | 1,260 | 42,770
Growth | 3.5% | -
Activity | 3.8 | 9.9
Latest commit | about 1 month ago | 5 days ago
Language | Python | TypeScript
License | GNU General Public License v3.0 or later | GNU General Public License v3.0 or later
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
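The page does not give the activity formula. The sketch below shows one way a "recency-weighted, relative" score could work, and every detail is an assumption for illustration only. Commits are weighted with a hypothetical 30-day half-life, and the score is then mapped to a 0-10 scale by percentile among tracked projects. Under that mapping, 9.0 means roughly "top 10%".

```python
from bisect import bisect_left
from typing import Sequence


def recency_score(commit_ages_days: Sequence[float], half_life_days: float = 30.0) -> float:
    """Weight each commit by 0.5 ** (age / half-life), so recent commits count more.

    The half-life is a hypothetical parameter, not a published value.
    """
    return sum(0.5 ** (age / half_life_days) for age in commit_ages_days)


def activity(score: float, all_scores: Sequence[float]) -> float:
    """Map a score to 0-10: 10 x the fraction of tracked projects scoring strictly lower."""
    ranked = sorted(all_scores)
    lower = bisect_left(ranked, score)  # number of scores strictly below `score`
    return round(10 * lower / len(ranked), 1)
```

With ten tracked projects, the most active one has nine projects scoring below it. It therefore gets 9.0, which matches the "top 10%" reading in the text above.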
grab-site
- Ask HN: How can I back up an old vBulletin forum without admin access?
The format you want is WARC. Even the Library of Congress uses it. There are many, many WARC scrapers. I'd look at what the Internet Archive recommends. A quick search turned up this from the Archive Team and Jason Scott: https://github.com/ArchiveTeam/grab-site (https://wiki.archiveteam.org/index.php/Who_We_Are), but I found that in less than 15 seconds of searching, so do your own due diligence.
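A WARC file is a sequence of records. Each record has a version line such as WARC/1.0, a set of "Name: value" header fields, and a blank line. After that comes a content block exactly Content-Length bytes long, followed by two CRLFs. The minimal sketch below lists the record headers from an uncompressed WARC stream. It assumes well-formed input, ignores header continuation lines, and skips the content blocks without parsing them. The function name iter_warc_headers is hypothetical.

```python
import io
from typing import BinaryIO, Dict, Iterator, Tuple


def iter_warc_headers(stream: BinaryIO) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (version, header fields) for each record in an uncompressed WARC stream.

    Sketch: assumes well-formed records, ignores header continuation lines,
    and skips over each record's content block without parsing it.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # clean end of stream
        if line == b"\r\n":
            continue  # tolerate extra blank lines between records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")
        version = line.strip().decode("ascii")
        fields: Dict[str, str] = {}
        # Named fields ("Name: value") run until an empty CRLF line.
        while (header := stream.readline()) != b"\r\n":
            if not header:
                raise ValueError("truncated record header")
            name, _, value = header.decode("utf-8").partition(":")
            fields[name.strip()] = value.strip()
        # The content block is exactly Content-Length bytes, then CRLF CRLF.
        stream.read(int(fields["Content-Length"]))
        stream.read(4)
        yield version, fields


if __name__ == "__main__":
    sample = (
        b"WARC/1.0\r\nWARC-Type: warcinfo\r\nContent-Length: 5\r\n\r\nhello\r\n\r\n"
        b"WARC/1.0\r\nWARC-Type: response\r\n"
        b"WARC-Target-URI: http://example.com/\r\nContent-Length: 0\r\n\r\n\r\n\r\n"
    )
    for version, fields in iter_warc_headers(io.BytesIO(sample)):
        print(version, fields["WARC-Type"], fields.get("WARC-Target-URI", "-"))
```

Compressed .warc.gz files should also work if you pass gzip.open(path, "rb"), because Python's gzip module reads concatenated gzip members as one stream. For real archives, an established library such as warcio is the safer choice.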
- struggling to download websites
You can use grab-site with --no-offsite-links and --igsets=mediawiki.
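The suggestion above can be wrapped in a small helper that builds the grab-site command line. Only the two flags named in the comment are used. The helper names are hypothetical, and the sketch assumes grab-site is installed and on PATH.

```python
import shlex
import subprocess
from typing import List


def grab_site_command(url: str, mediawiki: bool = True) -> List[str]:
    """Build the argv for a grab-site crawl that does not follow offsite links."""
    cmd = ["grab-site", url, "--no-offsite-links"]
    if mediawiki:
        # Enable grab-site's built-in "mediawiki" ignore set.
        cmd.append("--igsets=mediawiki")
    return cmd


def run_crawl(url: str) -> None:
    """Start the crawl; assumes grab-site is installed and on PATH."""
    argv = grab_site_command(url)
    print("running:", shlex.join(argv))
    subprocess.run(argv, check=True)
```

Passing a list to subprocess.run, rather than one shell string, avoids quoting problems with URLs that contain characters like & or ?.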
- Internet Archive Down, will be up and running soon (I hope).
- best tool for downloading forum posts in real-time?
Does the forum provide real-time notification for new posts? Like maybe an RSS feed, or a 'New' section? If so, some scripting around grab-site or httrack could grab them quickly.
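Here is one possible shape for the "scripting around grab-site" idea. The sketch assumes the forum exposes an RSS 2.0 feed whose item link elements point at new posts. It polls the feed, keeps an in-memory set of links already seen, and starts a grab-site crawl for each new one. The function names and the 5-minute interval are hypothetical choices. On the first poll, every post currently in the feed counts as new.

```python
import subprocess
import time
import urllib.request
import xml.etree.ElementTree as ET
from typing import List, Set, Union


def new_post_links(rss: Union[str, bytes], seen: Set[str]) -> List[str]:
    """Return links of RSS 2.0 <item>s not already in `seen`, adding them to it."""
    root = ET.fromstring(rss)
    fresh: List[str] = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if link and link not in seen:
            seen.add(link)
            fresh.append(link)
    return fresh


def poll(feed_url: str, interval_s: float = 300.0) -> None:
    """Check the feed every `interval_s` seconds and start a crawl per new post."""
    seen: Set[str] = set()
    while True:
        with urllib.request.urlopen(feed_url) as resp:
            body = resp.read()  # bytes, so the XML declaration's encoding is honored
        for link in new_post_links(body, seen):
            # Launch without waiting, so a slow crawl doesn't delay the next poll.
            subprocess.Popen(["grab-site", link, "--no-offsite-links"])
        time.sleep(interval_s)
```

The seen set is lost when the script restarts. For a long-running setup, you would persist it, for example to a text file.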
- How are you archiving websites you visit?
After a lot of searching for a similar topic, this is a tool I found which works pretty well: https://github.com/ArchiveTeam/grab-site
- Help building or mirroring docs.microsoft.com
Crawling is of course the other option. I've seen https://github.com/ArchiveTeam/grab-site in the wiki, but I'm unsure how to host the resulting .warc archives.
- grab-site: The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
- Data hoarders, start backing up government websites and news articles as well
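The "dynamic ignore patterns" feature means URL filters can change while a crawl is running. The class below is a hypothetical model of that idea, not grab-site's code. It assumes the patterns live in a text file with one Python regular expression per line, matched anywhere in a URL with re.search. A crawler loop would call refresh_from before deciding on each URL, so edits to the file take effect mid-crawl.

```python
import re
from typing import List, Optional, Pattern


class IgnoreList:
    """Regex ignore patterns that can be swapped while a crawl is running (sketch).

    Assumes one pattern per line, matched anywhere in the URL with re.search.
    """

    def __init__(self) -> None:
        self._source: Optional[str] = None
        self._patterns: List[Pattern[str]] = []

    def load_text(self, text: str) -> None:
        # Recompile only when the pattern text actually changed.
        if text != self._source:
            self._patterns = [re.compile(ln.strip()) for ln in text.splitlines() if ln.strip()]
            self._source = text

    def refresh_from(self, path: str) -> None:
        """Re-read the pattern file; a missing file means no ignores."""
        try:
            with open(path, encoding="utf-8") as f:
                self.load_text(f.read())
        except FileNotFoundError:
            self.load_text("")

    def ignored(self, url: str) -> bool:
        return any(p.search(url) for p in self._patterns)
```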
- How to mirror multiple websites correctly?
It's a completely different tool, but I like using grab-site (https://github.com/archiveteam/grab-site). Try --wpull-args=--span-hosts='' or something to make it mirror all subdomains. It outputs in WARC format, which can be read with a site like https://replayweb.page.
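Mirroring "all subdomains" comes down to a host-matching rule. The sketch below shows that rule on its own. It is an illustration, not how grab-site or wpull implement host spanning, and the function name in_scope is hypothetical. The check compares on a dot boundary, so that a lookalike domain is not treated as a subdomain.

```python
from urllib.parse import urlsplit


def in_scope(url: str, root_domain: str) -> bool:
    """True if the URL's host is root_domain itself or any subdomain of it."""
    host = (urlsplit(url).hostname or "").rstrip(".")
    root = root_domain.lower().rstrip(".")
    # Compare on a label boundary so "notexample.com" doesn't match "example.com".
    return host == root or host.endswith("." + root)
```

A naive host.endswith(root) would wrongly accept notexample.com. Prefixing the dot avoids that.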
- Stack Overflow Developer Story Data Dump (10 whole MB!)
Thusly, as a bit of a statement, here's your "I will do it myself even if I have to bash my head against the wall" collection of the Developer Story for 10-20 top users. I know there are some blogs on old web design; perhaps it would be worth their while as a memento of a bygone era. As for myself, I am looking into setting up a dedicated server for either grab-site or ArchiveBox. Possibly both!
Joplin
- Ask HN: What is your approach for managing personal digital assets?
- Joplin is an open source note-taking app
- My productivity app is a never-ending .txt file
I've had great success with using Joplin for this, with Syncthing as a sync backend. Works well across OSes; I use it on Linux, macOS, Windows and Android.
https://joplinapp.org/
- Why I Like Obsidian
The tools to manipulate SQL aren't that bad, no.
But rather than having a self-explanatory Markdown & flat file, now I have to start learning about the schema & making specific tools (in my preferred language) for manipulating Joplin's schema.
Suddenly I'm digging through 20 different technical specs to decode what data is where, how it works, and what I can do to it. Want to edit history? This is the best help you'll get; pray it's technical enough to get you where you need to go: https://github.com/laurent22/joplin/blob/dev/readme/dev/spec...
As I began with, I struggle to imagine anything that generates anywhere near as much user agency as flat files and markdown. Having boring common data & systems lets me apply portable skills I already have, rather than having to skill up in some particular product's own ecosystem.
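The complaint above is that a SQL-backed store needs schema knowledge before you can turn notes back into flat files. The sketch below illustrates that step with Python's sqlite3 module. It assumes a table named notes with id, title and body columns. That is an assumption about Joplin's schema, so check it against the spec linked above before relying on it. The function names and the "title-id.md" naming scheme are hypothetical. The database is opened read-only so the app's data is never modified.

```python
import re
import sqlite3
from pathlib import Path
from typing import Dict


def render_notes(conn: sqlite3.Connection) -> Dict[str, str]:
    """Map "<title>-<id>.md" filenames to Markdown text for each row in `notes`.

    Assumes a table notes(id, title, body); check the real schema before relying on it.
    """
    files: Dict[str, str] = {}
    for note_id, title, body in conn.execute("SELECT id, title, body FROM notes ORDER BY id"):
        heading = title or "untitled"
        # Portable filename; the id suffix keeps notes with equal titles apart.
        safe = re.sub(r"[^\w.-]+", "_", heading).strip("_") or "untitled"
        files[f"{safe}-{note_id}.md"] = f"# {heading}\n\n{body}\n"
    return files


def export(db_path: str, out_dir: str) -> None:
    # Open read-only (SQLite URI mode=ro) so the app's database is never modified.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        out = Path(out_dir)
        out.mkdir(parents=True, exist_ok=True)
        for name, text in render_notes(conn).items():
            (out / name).write_text(text, encoding="utf-8")
    finally:
        conn.close()
```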
- IAC sold 17 apps to Bending Spoons. $100M deal, all 330 employees fired
Joplin is a good open source option too, feels more like the original Evernote in terms of UI/UX https://github.com/laurent22/joplin/
- Ask HN: What do you use for note-taking or as knowledge base?
Joplin, an open source, extendable, Markdown-based hierarchical note-taking app: https://joplinapp.org/
It lets you choose a synchronization backend and offers applications for every major desktop and mobile OS (there is also a terminal version). You can create notebooks and sub-notebooks to organize your notes, and add tags for a better search experience. I created notebooks for specific domains (work-related, home improvement, etc.) and also keep a "temp" notebook for quick notes and W.I.P. snippets.
Its only con is that it uses Electron on desktop, which makes the application relatively slow to start.
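The organization described in this comment combines nested notebooks with tags on notes. The sketch below is a hypothetical model of that structure, not Joplin's actual data model. A tag search walks the notebook tree depth-first and reports the notebook path of each matching note.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Set, Tuple


@dataclass
class Note:
    title: str
    tags: Set[str] = field(default_factory=set)


@dataclass
class Notebook:
    name: str
    notes: List[Note] = field(default_factory=list)
    children: List["Notebook"] = field(default_factory=list)


def find_by_tag(book: Notebook, tag: str, path: Tuple[str, ...] = ()) -> Iterator[Tuple[str, str]]:
    """Yield (notebook path, note title) for every note carrying `tag`, depth-first."""
    here = path + (book.name,)
    for note in book.notes:
        if tag in note.tags:
            yield "/".join(here), note.title
    for child in book.children:
        yield from find_by_tag(child, tag, here)
```

Tags cut across the hierarchy. A "todo" tag can surface notes from several notebooks in one search, while the notebook path still shows where each note lives.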
-
Joplin vs Einwurf - a user-suggested alternative
2 projects | 20 Dec 2023
- PSA to Evernote Free users: 2 similar FREE apps to migrate to (I hope this post can end these questions so we can leave this sub's users in peace!)
- Evernote alternatives?
- Evernote Pre Mortem
done
What are some alternatives?
ArchiveBox - 🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
Trilium Notes - Build your personal knowledge base with Trilium Notes
browsertrix-crawler - Run a high-fidelity browser-based crawler in a single Docker container
obsidian - GraphQL, built for Deno - a native GraphQL caching client and server module
docker-swag - Nginx webserver and reverse proxy with php support and a built-in Certbot (Let's Encrypt) client. It also contains fail2ban for intrusion prevention.
notesnook - A fully open source & end-to-end encrypted note taking alternative to Evernote.
awesome-datahoarding - List of data-hoarding related tools
Boostnote - This repository is outdated and new Boost Note app is available! We've launched a new Boost Note app which supports real-time collaborative writing. https://github.com/BoostIO/BoostNote-App
wpull - Wget-compatible web downloader and crawler.
logseq - A local-first, non-linear, outliner notebook for organizing and sharing your personal knowledge base. Use it to organize your todo list, to write your journals, or to record your unique life.
replayweb.page - Serverless replay of web archives directly in the browser
QOwnNotes - QOwnNotes is a plain-text file notepad and todo-list manager with Markdown support and Nextcloud / ownCloud integration.