-
Here goes: https://github.com/dainiusjocas/lucene-grep/issues/84
I realize a relatively obscure Finnish stemmer plus Lucene on GraalVM isn't exactly a common use case. I did some testing and described my use case in the issue. I also have plenty of English-language content to search with lucene-grep. So, thank you for making it!
-
-
Neat. This is similar to a tool I have been working on (but still need to finish off), since I ran into the same issue.
Except rather than building an index, I brute-force the search on each query. For most repositories it's fast enough, even with ranking.
For those interested: https://github.com/boyter/cs. It's still very much a WIP, with noticeable issues in TUI mode.
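The index-free approach described above can be sketched in a few lines. This is a generic illustration, not how boyter/cs is actually implemented: on every query, walk the whole tree, count matches per file, and rank by match count.

```python
import os
import re

def brute_force_search(root, term, limit=10):
    """Scan every file under root on each query (no index) and rank
    results by match count -- a toy version of index-free code search."""
    results = []
    for dirpath, _, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, errors="ignore") as f:
                    text = f.read()
            except OSError:
                continue  # unreadable file: skip it
            hits = len(re.findall(re.escape(term), text))
            if hits:
                results.append((hits, path))
    # Most matches first; a crude but serviceable ranking.
    results.sort(reverse=True)
    return results[:limit]
```

Because there is no index to build or keep fresh, results are always up to date; the trade-off is that query time grows with repository size, which is why this stays fast only for small-to-medium repositories.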
-
There is DXR from Mozilla but I'm not sure how generalised it is.
https://github.com/mozilla/dxr
There is also Sourcegraph.
-
Not OP, so I can't speak for them. There are a number of ways to do this, ranging from turnkey solutions to collections of scripts and extensions. On the turnkey side, there are programs like ArchiveBox[1], which take links and store them as WARC files; you can import your browsing history into ArchiveBox and set up a script to do so automatically. If you'd rather build something yourself, you can extract your browsing history (e.g., Firefox stores its history in a SQLite database) and manually wget those URLs. As a reference for the more "bootstrapped" approach, I'll link to Gwern's post on their archiving setup[2]. It's fairly long, so I'd advise skipping to the parts you're interested in first.
1: https://github.com/ArchiveBox/ArchiveBox
2: https://www.gwern.net/Archiving-URLs
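The do-it-yourself route above (read Firefox's history database, then fetch each URL) can be sketched roughly like this. The profile path is a placeholder you'd fill in for your machine; the `moz_places` table and `wget` flags reflect current Firefox and GNU wget behavior, but treat them as assumptions to verify:

```python
import os
import shutil
import sqlite3
import subprocess
import tempfile

# Hypothetical path: the actual profile directory name varies per machine.
PLACES_DB = os.path.expanduser("~/.mozilla/firefox/<profile>/places.sqlite")

def recent_urls(db_path, limit=100):
    """Pull the most recently visited http(s) URLs out of places.sqlite."""
    # Work on a copy: Firefox keeps the live database locked while running.
    with tempfile.NamedTemporaryFile(suffix=".sqlite") as tmp:
        shutil.copy(db_path, tmp.name)
        con = sqlite3.connect(tmp.name)
        rows = con.execute(
            "SELECT url FROM moz_places "
            "WHERE url LIKE 'http%' "          # skip place: and about: entries
            "ORDER BY last_visit_date DESC LIMIT ?",
            (limit,),
        )
        urls = [r[0] for r in rows]
        con.close()
    return urls

def archive(urls, dest="archive"):
    """Mirror each page (with images/CSS it needs) into dest/ using wget."""
    for url in urls:
        subprocess.run(
            ["wget", "--page-requisites", "--convert-links",
             "--directory-prefix", dest, url],
            check=False,  # a dead link shouldn't abort the whole run
        )
```

Running `archive(recent_urls(PLACES_DB))` from a cron job would give you a crude, continuously updated mirror of recently visited pages; ArchiveBox does essentially this with more formats (WARC, PDF, screenshots) and a UI on top.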