Google Is 2B Lines of Code–and It's All in One Place

This page summarizes the projects mentioned and recommended in the original post on

  • Bazel

    a fast, scalable, multi-language and extensible build system

    (Opinions are my own)

    > Do they do an immense amount of code generation?

    Blaze (aka Bazel [0]) has provisions that make it easy to generate code, but the generation happens as a compile step; the generated code is not something that is checked into a git repo.

    [0] -

  • githut

    Github Language Statistics

    Reading the sibling comments, I realized that GitHub and SG have built tooling to index text, while Google have built tooling to index C++.

    cs.chromium et al. experience an instantaneous dropoff from "wow this is kind of amazing" the moment you start wading through JavaScript or Mojo glue code.

    A quick Google for "most popular languages on github" just found githut, which (if it's correct) reveals that GitHub's most popular language (by commit activity) is Python (17%), very closely followed by JavaScript (14%). Then you have Java (12%), TypeScript (8%), Go (8%), C++ (6%), Ruby (6%) and PHP (5%). To reiterate, C++ represents 6% of GitHub PR activity, while the top two languages (Python/JS) that represent 31% of PR activity are not only interpreted but also dynamically typed.

    Which tells a very interesting story about the benefits of being able to generate an AST with type information: you can scale insight that much further. :(

  • lsif-clang

    Language Server Indexing Format (LSIF) generator for C, C++ and Objective C

    - Go:

    Why are not all repos covered?

    Because different languages have different build systems, so inferring the right build commands, dependencies, etc. is not so straightforward; these are necessary prerequisites for compiler-accurate cross references. We're working on fixing this with auto-indexing:

    For C and C++ specifically, auto-indexing is challenging because of the large variety in build systems, informal specification of dependencies (such as in a README instead of a machine-readable format), and platform-specific code.

    Outside of auto-indexing, we do have an indexer for C and C++ right now, which can be run in CI; that way one can generate an index and upload it to Sourcegraph on a regular basis. It is 'Partially available' right now. We're keenly aware of the interest in C++, and are working our way through different languages based on usage.
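
The language shares quoted in the githut comment above can be tallied with a few lines of Python. This is only a sketch: the percentages are copied as quoted in that comment and have not been re-checked against GitHub, and the variable names are made up for illustration.

```python
# Shares (in percent) exactly as quoted in the comment; not re-checked.
shares = {
    "Python": 17,
    "JavaScript": 14,
    "Java": 12,
    "TypeScript": 8,
    "Go": 8,
    "C++": 6,
    "Ruby": 6,
    "PHP": 5,
}

# Rank languages from largest to smallest quoted share.
ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)

# The comment's claim concerns the top two languages combined.
top_two = ranked[:2]
top_two_total = sum(pct for _, pct in top_two)

# Total covered by the eight languages the comment lists.
listed_total = sum(shares.values())

print([name for name, _ in top_two], top_two_total)  # ['Python', 'JavaScript'] 31
print(shares["C++"], listed_total)                   # 6 76
```

The 31% figure for Python plus JavaScript matches the comment, and the eight listed languages together account for 76% of the quoted activity, against 6% for C++.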
