Code Search Is Hard

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • val-town-mirror

    Search all public vals (updated hourly)

  • Hey! I'm a val.town fanboy and I immediately thought about a workaround while reading the blog post:

    What if I dumped every public val into GitHub, so that I could use its (awesome) search?

    So here is my own "Val Town Search": https://val-town-search.pomdtr.me

    And here is the repo containing all vals, updated hourly thanks to a GitHub Action: https://github.com/pomdtr/val-town-mirror

  • gitlab

  • livegrep

    Interactively grep source code. Source for http://livegrep.com/

  • If you ever leave you can use Livegrep [0], which was based on code-search work done at Google. I personally don't use it right now but it's great and will probably meet all your needs.

    [0] https://github.com/livegrep/livegrep

  • dcs

    Debian Code Search (codesearch.debian.net) is a search engine that searches through all the 130 GB of open source software that is included in Debian. Supports regular expressions!

  • ElasticsearchCodeSearch

    Index and Search GitHub Repositories using Elasticsearch

  • rum

    RUM access method - inverted index with additional information in posting lists (by postgrespro)

  • The RUM index has worked well for us on roughly 1 TB of PDFs. It was written by postgrespro, the same folks who wrote core text search and JSON indexing. Not sure why RUM is not in core. We have had no problems.

       https://github.com/postgrespro/rum

  • ripgrep

    ripgrep recursively searches directories for a regex pattern while respecting your gitignore

  • Basic code searching seems like something new developers are never explicitly taught, but it is an absolutely crucial skill to build early on.

    I guess the knowledge progression I would recommend would look something like this:

    - Learning about Ctrl+F, which works basically everywhere.

    - Transitioning to ripgrep https://github.com/BurntSushi/ripgrep - I wouldn't even call this optional, it's truly an incredible and very discoverable tool. Requires keeping a terminal open, but that's a good thing for a newbie!

    - Optional, but highly recommended: Learning one of the powerhouse command line editors. Teenage me recommended Emacs; current me recommends vanilla vim, purely because some flavor of it is installed almost everywhere. This is so that you can grep around and edit in the same window.

    - In the same vein, moving back from ripgrep and learning about good old-fashioned grep and a few of its flags (rg recurses by default; grep has to be told): `grep -r` for recursive search, `grep -ri` for case insensitive recursive search, and `grep -ril` for case insensitive recursive "just show me which files this string is found in" search. Some others too, season to taste.

    - Finally hitting the wall with what ripgrep can do for you and switching to an actual indexed, dedicated code search tool.
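
The last step in that progression, moving from scanning files to an indexed search, can be illustrated with a toy trigram index. This is only a minimal sketch of one common indexing technique, not the implementation of any tool listed on this page; the class name `TrigramIndex` and its methods are hypothetical names chosen for the example.

```python
from collections import defaultdict


def trigrams(text):
    """Return the set of all 3-character substrings of text, lowercased."""
    lowered = text.lower()
    return {lowered[i:i + 3] for i in range(len(lowered) - 2)}


class TrigramIndex:
    """Toy inverted index (hypothetical): trigram -> set of document names."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.documents = {}

    def add(self, name, text):
        """Store the document and record every trigram it contains."""
        self.documents[name] = text
        for gram in trigrams(text):
            self.postings[gram].add(name)

    def search(self, query):
        """Case-insensitive substring search, roughly what `grep -ril` answers."""
        grams = trigrams(query)
        if not grams:
            # Queries shorter than 3 characters cannot use the index: scan everything.
            candidates = set(self.documents)
        else:
            # A document can only match if it contains every trigram of the query.
            candidates = set.intersection(
                *(self.postings.get(gram, set()) for gram in grams)
            )
        # Sharing all trigrams is necessary but not sufficient, so confirm each candidate.
        needle = query.lower()
        return sorted(
            name for name in candidates if needle in self.documents[name].lower()
        )
```

The index narrows the search to a few candidate files, and only those candidates are scanned in full. That is the basic trade an indexed tool makes: extra work at indexing time in exchange for not reading every file on every query.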

  • bloop

    bloop is a fast code search engine written in Rust.

  • https://github.com/BloopAI/bloop is fully open source and has full text + regex search built on tantivy, FYI.

  • paradedb

    Postgres for Search and Analytics

  • Elasticsearch is good, and it does scale, but it is much more cumbersome and expensive to scale and operate than Postgres. If you use the managed service, you'll pay for the operational pain in the form of higher pricing.

    The Postgres movement is strong and extensions like ParadeDB https://github.com/paradedb/paradedb are designed specifically to solve this pain point (Disclaimer: I work for ParadeDB)

  • septum

    Context-based code search tool

  • https://github.com/pyjarrett/septum

    The hardest part about getting code search right imo is grabbing the right amount of surrounding context, which septum is aimed at solving on a per-file basis.

    Another one I'm surprised hasn't been mentioned is stack-graphs (https://github.com/github/stack-graphs), which tries to incrementally resolve symbolic relationships across the whole codebase. It powers GitHub's cross-file precise indexing and conceptually makes a lot of sense, though I've struggled to get the open source version to work.
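
The point above about surrounding context can be made concrete with a fixed-window version of the idea, similar in spirit to `grep -C`. This is a rough sketch only and makes no claim about how septum itself chooses context; the function name `search_with_context` and its parameters are hypothetical.

```python
def search_with_context(text, needle, before=2, after=2):
    """Return (first_line, last_line, lines) blocks around each matching line.

    Line numbers are 1-based and inclusive. Windows that overlap or touch
    are merged so that no line is reported twice.
    """
    lines = text.splitlines()
    windows = []  # list of (start, end) pairs, 0-based, end exclusive
    for i, line in enumerate(lines):
        if needle not in line:
            continue
        start = max(0, i - before)
        end = min(len(lines), i + after + 1)
        if windows and start <= windows[-1][1]:
            # This window overlaps or touches the previous one: extend it.
            windows[-1] = (windows[-1][0], max(windows[-1][1], end))
        else:
            windows.append((start, end))
    return [(start + 1, end, lines[start:end]) for start, end in windows]
```

A fixed window is the crude baseline: too small and the match loses its meaning, too large and the results drown in noise. That tension is the problem the comment describes.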

  • stack-graphs

    Rust implementation of stack graphs
