| | lucene-grep | ArchiveBox |
|---|---|---|
| | 9 | 255 |
| Stars | 189 | 20,990 |
| Growth | - | 1.6% |
| Activity | 0.5 | 9.8 |
| Last commit | about 1 month ago | 7 days ago |
| Language | Clojure | Python |
| License | Apache License 2.0 | MIT |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
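The exact formula behind the activity number is not given here. As a rough illustration only, the sketch below assumes an exponential decay for commit weights (a hypothetical `half_life_days` parameter) and maps a project's raw value onto a 0-10 scale by the share of tracked projects it beats, so that 9.0 corresponds to the top 10%. The function names are made up for this example.

```python
from datetime import date


def weighted_activity(commit_dates, today, half_life_days=30.0):
    """Raw activity: each commit counts less the older it is.

    Assumption: a commit's weight halves every `half_life_days` days.
    """
    return sum(0.5 ** ((today - d).days / half_life_days) for d in commit_dates)


def activity_score(raw, all_raw):
    """Scale a raw value to 0-10 by the fraction of tracked projects strictly below it."""
    below = sum(1 for r in all_raw if r < raw)
    return round(10.0 * below / len(all_raw), 1)


if __name__ == "__main__":
    today = date(2024, 1, 31)
    # One commit today and one commit 30 days ago (exactly one half-life).
    raw = weighted_activity([date(2024, 1, 31), date(2024, 1, 1)], today)
    print(raw)
    # A project whose raw value beats 90 of 100 tracked projects.
    print(activity_score(90, list(range(100))))
```

Under these assumptions, a project that beats 90 out of 100 tracked projects gets a score of 9.0, which matches the "top 10%" reading above.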
lucene-grep
- FLaNK Stack Weekly for 20 June 2023
- Using Java's Project Loom to build more reliable distributed systems
- Graal native images are real. These boast a far lower startup overhead and much lower steady state memory usage for simpler applications.
Probably my counterexample of choice is this: https://github.com/dainiusjocas/lucene-grep. It uses Lucene, probably the best search library (the core of Elasticsearch, Solr, and most websites) and notoriously not simple code, to implement grep-like functionality. In simple cases, they demonstrate a 30ms whole-process runtime with no more than 32MB of RAM used (which looks suspiciously like a default).
The JVM is fast becoming a bit like Postgres... one of those 'second best at everything' pieces of tech.
- lucene-grep - grep-like utility based on Lucene Monitor compiled with GraalVM native-image
- Lmgrep: Lucene-based grep-like utility
Here goes: https://github.com/dainiusjocas/lucene-grep/issues/84
I realize that a relatively obscure Finnish stemmer and Lucene with GraalVM aren't exactly a common use case. I did some testing and described my use case. I certainly have a lot of English-language content to search using lucene-grep. So, thank you for making it!
- Lmgrep
ArchiveBox
- Web Archiving Projects
- I have 2000 old VHS tapes in my garage and I don't know what to do with them
- To preserve their work journalists take archiving into their own hands
- Why are so many books listed as "Borrow Unavailable" at the Internet Archive?
And one nice tool for scraping archives for yourself is https://archivebox.io/
a nice frontend by https://news.ycombinator.com/user?id=nikisweeting
- Ask HN: What do you use for reading papers?
I use https://archivebox.io/ and point it at the uri.
I tend not to be able to find a page again, even one from 15 minutes ago, when I'm on a dense information search.
- The Internet Archive is under a DDoS attack
- 38% of webpages that existed in 2013 are no longer accessible a decade later
There's also https://archivebox.io, which can take your bookmarks and archive them in many ways. Unfortunately, when I last tried it, it was a bit buggy. I wish there was a better solution for building a nice archive of the sites I visit most often, just in case.
- Ask HN: What Underrated Open Source Project Deserves More Recognition?
Two projects I greatly appreciate, allowing me to easily archive my bandcamp and GOG purchases (after the initial setup anyways):
https://github.com/easlice/bandcamp-downloader
https://github.com/Kalanyr/gogrepoc
And I recently learned about archivebox, which I think is going to be a fast favorite and finally let me clear out my mess of tabs/bookmarks: https://github.com/ArchiveBox/ArchiveBox
- YaCy, a distributed Web Search Engine, based on a peer-to-peer network
- Vice website is shutting down
If you really want to save the content for yourself, use something like https://archivebox.io/
I've been running a local instance for a few years now and download/save tech articles all the time. I can search and find them as needed.
What are some alternatives?
beagle - A smart, reliable, and highly customizable debug menu library for Android apps that supports screen recording, network activity logging, and many other useful features.
Wallabag - wallabag is a self hostable application for saving web pages: Save and classify articles. Read them later. Freely.
ali-dbhub - Migrated to a new repository; this version will no longer be maintained.
paimon-moe - Your best Genshin Impact companion! Help you plan what to farm with ascension calculator and database. Also track your progress with todo and wish counter.
cs - command line codespelunker or code search
SingleFile - Web Extension for saving a faithful copy of a complete web page in a single HTML file
coyote - Coyote is a library and tool for testing concurrent C# code and deterministically reproducing bugs.
ArchivesSpace - ArchivesSpace, the archives management tool
BlockHound - Java agent to detect blocking calls from non-blocking threads.
Archivematica - Free and open-source digital preservation system designed to maintain standards-based, long-term access to collections of digital objects.
loom - https://openjdk.org/projects/loom
grab-site - The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns