Facebook open sources Glean: a scalable code search and query engine

Glean

Mentions: 19 | Stars: 897 | Activity: 9.8 | Language: Hack

System for collecting, deriving and working with facts about source code.

Kythe has one schema, whereas in Glean each language has its own schema, which can carry arbitrary amounts of language-specific detail. You can get a language-agnostic view by defining an abstraction layer as a schema. Our current (work-in-progress) language-agnostic layer is called "codemarkup": https://github.com/facebookincubator/Glean/blob/main/glean/s...
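To illustrate the idea of per-language schemas with a language-agnostic layer on top, here is a minimal Python sketch. It is not Glean's schema language, and every name in it (`FlowDeclaration`, `CxxDeclaration`, `Entity`, `to_entity`) is hypothetical; it only shows how language-specific facts might be projected onto a shared view that drops the language-specific detail.

```python
from dataclasses import dataclass

# Hypothetical language-specific facts. The field names are illustrative
# and are not taken from Glean's real schemas.
@dataclass(frozen=True)
class FlowDeclaration:
    name: str
    module: str
    line: int

@dataclass(frozen=True)
class CxxDeclaration:
    qualified_name: str
    file: str
    line: int
    is_definition: bool  # detail that only makes sense for C++

# Hypothetical language-agnostic view, in the spirit of an abstraction layer.
@dataclass(frozen=True)
class Entity:
    language: str
    name: str
    file: str
    line: int

def to_entity(fact):
    """Project a language-specific fact onto the common view."""
    if isinstance(fact, FlowDeclaration):
        return Entity("flow", fact.name, fact.module, fact.line)
    if isinstance(fact, CxxDeclaration):
        return Entity("cxx", fact.qualified_name, fact.file, fact.line)
    raise TypeError(f"no mapping for {type(fact).__name__}")

facts = [
    FlowDeclaration("render", "ui/app.js", 12),
    CxxDeclaration("ns::Widget::draw", "widget.cpp", 40, True),
]
entities = [to_entity(f) for f in facts]
```

A tool that only needs names and locations could query `entities`, while a C++-specific tool could still use the richer `CxxDeclaration` facts directly.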
There are various ways to wire up an indexer, and the choice depends very much on the language and the build system. For Flow, for example, Glean output is built into the typechecker: you run it with some flags and it emits the Glean data. For C++, you need to get the compiler flags from the build system and pass them to the Clang frontend. For Java the indexer is a compiler plugin; for Python it's built on libCST. Some indexers send their data directly to a Glean server; others generate JSON files that are sent using a separate command-line tool.
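As a rough sketch of the "generate JSON files" route, an indexer could collect facts per predicate and write them out as a batch. The JSON layout used here (a list of objects with `predicate` and `facts` fields), the predicate name `example.Name.1`, and the helper `write_fact_batch` are all assumptions made for illustration; the Glean documentation is the authority on the actual file format and on the tool that uploads it.

```python
import json
import tempfile
from pathlib import Path

def write_fact_batch(path, predicate, keys):
    """Write one batch of facts for a single predicate to a JSON file.

    The layout is an assumed, simplified shape, not a confirmed Glean format.
    """
    batch = [{"predicate": predicate, "facts": [{"key": k} for k in keys]}]
    Path(path).write_text(json.dumps(batch, indent=2))
    return batch

# Write a small batch to a temporary directory and read it back.
out = Path(tempfile.mkdtemp()) / "facts.json"
batch = write_fact_batch(out, "example.Name.1", ["main", "helper"])
loaded = json.loads(out.read_text())
```

Writing files rather than talking to the server directly keeps the indexer free of any network dependency; a separate step can then send the files.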
References are represented differently depending on the language. For Flow, for example, there is a fact for an import that matches up with a fact for the export in the other file. For C++, there are facts that connect declarations with definitions, and references with declarations.
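The import/export matching described for Flow can be sketched as a simple join on module and name. This is an illustration only: the `Import` and `Export` record shapes and the `resolve_imports` helper are hypothetical and do not reflect Glean's real Flow schema or query language.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Export:
    module: str  # file that exports the name
    name: str
    line: int

@dataclass(frozen=True)
class Import:
    importing_file: str
    source_module: str  # module the name is imported from
    name: str

def resolve_imports(imports, exports):
    """Map each import fact to the matching export fact, or None if absent."""
    index = {(e.module, e.name): e for e in exports}
    return {imp: index.get((imp.source_module, imp.name)) for imp in imports}

exports = [Export("lib/math.js", "add", 3)]
imports = [
    Import("app.js", "lib/math.js", "add"),
    Import("app.js", "lib/math.js", "sub"),  # no matching export fact
]
resolved = resolve_imports(imports, exports)
```

An unresolved import maps to `None`, which a tool could surface as a dangling reference.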

linguist

Mentions: 40 | Stars: 11,804 | Activity: 8.7 | Language: Ruby

Language Savant. If your repository's language is being reported incorrectly, send us a pull request!

GitHub's linguist library can be used to identify the programming language of a single file: https://github.com/github/linguist#single-file

glean

Mentions: 4 | Stars: 354 | Activity: 9.4 | Language: Rust

Modern cross-platform telemetry (by mozilla)

livegrep

Mentions: 10 | Stars: 1,892 | Activity: 5.4 | Language: C++

Interactively grep source code. Source for http://livegrep.com/

If you've not had to deal with a codebase that takes VSCode longer than a few minutes to index, then you're probably outside their initial target market. If you've not had to set up a hosted code search tool (e.g. livegrep https://github.com/livegrep/livegrep ) because there's just too much code,

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
