-
Here goes: https://github.com/dainiusjocas/lucene-grep/issues/84
I realize a relatively obscure Finnish stemmer plus Lucene on GraalVM isn't exactly a common use case. I did some testing and described my use case in the issue. I also have plenty of English-language content to search with lucene-grep. So, thank you for making it!
-
-
Neat. This is similar to a tool I have been working on (but still need to finish off), since I ran into the same issue.
Except rather than building an index, I brute-force the search on each query. For most repositories it's fast enough, even with ranking.
For those interested: https://github.com/boyter/cs. It's still very much a WIP, with noticeable issues in TUI mode.
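The index-free approach described above can be sketched in a few lines. This is a generic illustration, not how boyter/cs is actually implemented: on every query, walk the whole tree, count matches per file, and rank by match count.

```python
import os
import re

def brute_force_search(root, term, limit=10):
    """Scan every file under root on each query (no index) and rank
    results by match count -- a toy version of index-free code search."""
    results = []
    for dirpath, _, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, errors="ignore") as f:
                    text = f.read()
            except OSError:
                continue  # unreadable file: skip it
            hits = len(re.findall(re.escape(term), text))
            if hits:
                results.append((hits, path))
    # Most matches first; a crude but serviceable ranking.
    results.sort(reverse=True)
    return results[:limit]
```

Because there is no index to build or keep fresh, results are always up to date; the trade-off is that query time grows with repository size, which is why this stays fast only for small-to-medium repositories.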
-
There is DXR from Mozilla but I'm not sure how generalised it is.
https://github.com/mozilla/dxr
There is also Sourcegraph.
-
Not OP, so I can't speak for them. There are a number of ways to do this, ranging from turnkey solutions to collections of scripts and extensions. On the turnkey side, there are programs like ArchiveBox[1], which take links and store them as WARC files; you can import your browsing history into ArchiveBox and set up a script to do so automatically. If you'd rather build something yourself, you can extract your browsing history (e.g., Firefox stores its history in a SQLite database) and manually wget those URLs. As a reference for the more "bootstrapped" approach, I'll link to Gwern's post on their archiving setup[2]. It's fairly long, so I'd advise skipping to the parts you're interested in first.
1: https://github.com/ArchiveBox/ArchiveBox
2: https://www.gwern.net/Archiving-URLs
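The do-it-yourself route above (read Firefox's history database, then fetch each URL) can be sketched roughly like this. The profile path is a placeholder you'd fill in for your machine; the `moz_places` table and `wget` flags reflect current Firefox and GNU wget behavior, but treat them as assumptions to verify:

```python
import os
import shutil
import sqlite3
import subprocess
import tempfile

# Hypothetical path: the actual profile directory name varies per machine.
PLACES_DB = os.path.expanduser("~/.mozilla/firefox/<profile>/places.sqlite")

def recent_urls(db_path, limit=100):
    """Pull the most recently visited http(s) URLs out of places.sqlite."""
    # Work on a copy: Firefox keeps the live database locked while running.
    with tempfile.NamedTemporaryFile(suffix=".sqlite") as tmp:
        shutil.copy(db_path, tmp.name)
        con = sqlite3.connect(tmp.name)
        rows = con.execute(
            "SELECT url FROM moz_places "
            "WHERE url LIKE 'http%' "          # skip place: and about: entries
            "ORDER BY last_visit_date DESC LIMIT ?",
            (limit,),
        )
        urls = [r[0] for r in rows]
        con.close()
    return urls

def archive(urls, dest="archive"):
    """Mirror each page (with images/CSS it needs) into dest/ using wget."""
    for url in urls:
        subprocess.run(
            ["wget", "--page-requisites", "--convert-links",
             "--directory-prefix", dest, url],
            check=False,  # a dead link shouldn't abort the whole run
        )
```

Running `archive(recent_urls(PLACES_DB))` from a cron job would give you a crude, continuously updated mirror of recently visited pages; ArchiveBox does essentially this with more formats (WARC, PDF, screenshots) and a UI on top.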