Lmgrep: Lucene-based grep-like utility

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • lucene-grep

    Grep-like utility based on Lucene Monitor compiled with GraalVM native-image

  • Here goes: https://github.com/dainiusjocas/lucene-grep/issues/84

    I realize that a relatively obscure Finnish stemmer and Lucene with GraalVM aren't exactly a common use case. I did some testing and described my use case. I certainly have plenty of English-language content to search using lucene-grep. So, thank you for making it!

  • cs

    command line codespelunker or code search

  • Neat. This is similar to a tool I have been working on (but need to finish off) as I saw the same issue.

    Except that, rather than building an index, I brute-force the search each time. For most repositories it’s fast enough, even with ranking.

    https://github.com/boyter/cs For those interested it’s still very WIP with noticeable issues in TUI mode.
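The brute-force approach described in this comment can be illustrated with a minimal sketch. This is not how cs itself works; the function name `brute_force_search`, the AND semantics across terms, and the ranking by raw term count are all assumptions chosen for illustration. The point is only that scanning every file on each query needs no index at all.

```python
from pathlib import Path


def brute_force_search(root, query, limit=10):
    """Scan every file under root on each call and rank by term frequency.

    Hypothetical helper: scoring is the total number of occurrences of the
    query terms, which is an assumed ranking, not the one used by any tool.
    """
    terms = [t.lower() for t in query.split()]
    results = []
    if not terms:
        return results
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore").lower()
        except OSError:
            continue  # unreadable file: skip it
        counts = [text.count(t) for t in terms]
        if all(counts):  # assumed AND semantics: every term must appear
            results.append((sum(counts), str(path)))
    # Highest score first; ties are broken by path for a stable order.
    results.sort(key=lambda r: (-r[0], r[1]))
    return results[:limit]
```

Calling `brute_force_search(".", "lucene grep")` would return up to ten `(score, path)` pairs for the current directory. Every query rereads every file, so the cost grows with repository size, which matches the comment's remark that this is fast enough for most repositories rather than all of them.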

  • dxr

    Discontinued. DEPRECATED - Powerful search for large codebases

  • There is DXR from Mozilla but I'm not sure how generalised it is.

    https://github.com/mozilla/dxr

    There is also Sourcegraph.

  • ArchiveBox

    🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

  • Not OP, so I can't speak for them. There are a bunch of ways to do this, ranging from turnkey solutions to collections of scripts and extensions you can use. On the turnkey side, there are programs like ArchiveBox [1], which take links and store them as WARC files. You can import your browsing history into ArchiveBox and set up a script to do it automatically. If you'd like to set something up yourself, you can extract your browsing history (e.g., Firefox stores its history in an SQLite database) and manually wget those URLs. For a reference to the more "bootstrapped" version, I'll link to Gwern's post on their archiving setup [2]. It's fairly long, so I advise skipping to the parts you're interested in first.

    1: https://github.com/ArchiveBox/ArchiveBox

    2: https://www.gwern.net/Archiving-URLs
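The do-it-yourself route mentioned in this comment (pull URLs out of the Firefox history database, then fetch them with wget) might look roughly like the sketch below. It assumes the history lives in a `moz_places` table with a `url` column, as in Firefox's `places.sqlite`; the helper names `history_urls` and `write_url_list` and the file names are hypothetical. Firefox may hold a lock on the live database, so working on a copy of the file is the safer assumption.

```python
import sqlite3


def history_urls(db_path):
    """Return distinct http(s) URLs from a Firefox-style places database.

    Assumes a `moz_places` table with a `url` column; run this against a
    copy of places.sqlite rather than the file Firefox has open.
    """
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT DISTINCT url FROM moz_places "
            "WHERE url LIKE 'http://%' OR url LIKE 'https://%' "
            "ORDER BY url"
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]


def write_url_list(urls, out_path):
    """Write one URL per line, the format wget reads with --input-file."""
    with open(out_path, "w", encoding="utf-8") as f:
        for url in urls:
            f.write(url + "\n")
```

With a list written by `write_url_list(history_urls("places-copy.sqlite"), "urls.txt")`, running `wget --input-file=urls.txt` would fetch each page. This saves plain downloads only; producing WARC files as ArchiveBox does is a separate step not covered here.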

