lsif-clang
imdb-rename
| | lsif-clang | imdb-rename |
|---|---|---|
| Mentions | 4 | 6 |
| Stars | 33 | 221 |
| Growth | - | - |
| Activity | 0.0 | 6.2 |
| Last commit | about 1 year ago | 2 months ago |
| Language | C++ | Rust |
| License | - | The Unlicense |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
lsif-clang
-
The technology behind GitHub’s new code search
In the top right corner of the tooltip it will say either "Search-based" or "Precise" - in this case, you're right, we don't have the abseil-cpp repo indexed so it falls back to search-based as you describe.
We do have a C++ code indexer in beta, https://github.com/sourcegraph/lsif-clang - it is based on clang but C++ indexing is notably harder to do automatically/without-setup due to the varying build systems that need to be understood in order to invoke the compiler.
-
GitHub Code Search (Preview)
Interesting because on https://lsif.dev/ I see that LSIF support for C++, which basically is just a wrapper around clangd AFAIU, is deprecated. Is there something else that replaced it?
-
SCIP - a better code indexing format than LSIF
We already have an LSIF indexer for C++ (lsif-clang); however, it is not as feature complete as the other indexers. Moreover, the codebase is forked off of Clang 10, so upgrading to newer Clang versions (and building a SCIP indexer on top of that) will be a challenge.
-
Google Is 2B Lines of Code–and It's All in One Place
- Go:
Why are not all repos covered?
Because different languages have different build systems, so inferring the right build commands, dependencies, etc. is not so straightforward; these are necessary prerequisites for compiler-accurate cross references. We're working on fixing this with auto-indexing: https://docs.sourcegraph.com/code_intelligence/explanations/...
For C and C++ specifically, auto-indexing is challenging because of the large variety in build systems, informal specification of dependencies (such as in a README instead of a machine-readable format), and platform-specific code.
Outside of auto-indexing, we do have an indexer for C and C++ right now (https://github.com/sourcegraph/lsif-clang) which can be run in CI; that way one can generate an index and upload it to Sourcegraph on a regular basis. It is 'Partially available' (https://docs.sourcegraph.com/code_intelligence/references/in...) right now. We're keenly aware of the interest in C++, and are working our way through different languages based on usage.
imdb-rename
- IMDB-rename: A command line tool to rename media files based on titles from IMDB
-
my rarbg magnet backup (268k)
I wrote a tool that did something related a while back using IMDb data: https://github.com/BurntSushi/imdb-rename
-
Projects in rust
This might be of interest: https://github.com/BurntSushi/imdb-rename
-
The technology behind GitHub’s new code search
What a shit take. The article itself is perhaps a nice light overview of 101-ish level concepts, although knowing how and when to apply them in a real engineering context is not something I would consider 101 level. And certainly, building something that is actually at the scale of GitHub Search is nowhere near 101 level.
This is what a 101-level inverted index implementation looks like: https://github.com/BurntSushi/imdb-rename
In other words, absolutely nothing like what GitHub built. Nowhere close.
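To make "101-level inverted index" concrete, here is a minimal sketch in Python. It is not how imdb-rename itself works (that tool is written in Rust and, per the comments on this page, memory maps FSTs on disk). The function names `tokenize`, `build_index` and `search` and the sample titles are all hypothetical, chosen only for illustration. The idea is simply to map each word to the set of documents that contain it, then intersect those sets at query time.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def build_index(titles):
    """Map each token to the set of title ids that contain it."""
    index = defaultdict(set)
    for title_id, title in titles.items():
        for token in tokenize(title):
            index[token].add(title_id)
    return index


def search(index, query):
    """Return the ids of titles containing every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    # Look up the posting set for each token, then keep only the ids
    # that appear in all of them.
    postings = [index.get(token, set()) for token in tokens]
    return set.intersection(*postings)


# Hypothetical sample data, not taken from IMDb.
titles = {
    1: "The Matrix",
    2: "The Matrix Reloaded",
    3: "Toy Story",
}
index = build_index(titles)
```

With this data, `search(index, "matrix")` matches titles 1 and 2, while `search(index, "the matrix reloaded")` narrows to title 2. A sketch like this leaves out ranking, fuzzy matching, on-disk storage and sharding, which is where most of the engineering in a production search system goes.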
-
How to use mmap safely in Rust?
imdb-rename is an example of a tool that memory maps FSTs on disk in order to execute fulltext searches very quickly on the command line.
-
But How, Do Databases Use Mmap?
> How else would you lazy-load a database of (say) 32GB into memory, almost instantly?
That's what the fst crate[1] does. It's likely working at a lower level of abstraction than you intend. But the point is that it works, is portable and doesn't require any cooperation from the OS other than the ability to memory map files. My imdb-rename tool[2] uses this technique to build an on-disk database for instantaneous searching. And then there is the regex-automata crate[3] that permits deserializing a regex instantaneously from any kind of slice of bytes.[4]
I think you should maybe provide some examples of what you're suggesting to make it more concrete.
[1] - https://crates.io/crates/fst
[2] - https://github.com/BurntSushi/imdb-rename
[3] - https://crates.io/crates/regex-automata
[4] - https://docs.rs/regex-automata/0.1.9/regex_automata/#example...
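The general technique described in that comment, searching an on-disk structure through a memory map instead of loading it first, can be sketched with Python's standard `mmap` module. This is only an illustration under simplifying assumptions: the file is a hypothetical table of sorted fixed-width (key, value) records rather than an FST, and the names `write_table` and `lookup` are made up for this example.

```python
import mmap
import os
import struct
import tempfile

# Assumed record layout: (key, value), both unsigned 64-bit little-endian.
RECORD = struct.Struct("<QQ")


def write_table(path, pairs):
    """Write (key, value) pairs, sorted by key, as fixed-width records."""
    with open(path, "wb") as f:
        for key, value in sorted(pairs):
            f.write(RECORD.pack(key, value))


def lookup(mm, key):
    """Binary search the mapped records for key; return its value or None."""
    lo, hi = 0, len(mm) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        k, v = RECORD.unpack_from(mm, mid * RECORD.size)
        if k == key:
            return v
        if k < key:
            lo = mid + 1
        else:
            hi = mid
    return None


# Build a small table in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "table.bin")
write_table(path, [(n, n * n) for n in range(1000)])

# Mapping the file does not read it up front; the OS pages data in
# as the binary search touches it.
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    found = lookup(mm, 37)
    missing = lookup(mm, 5000)
    mm.close()
```

Because the search reads records directly out of the mapping, startup cost does not depend on the file size in the way that parsing the whole file into memory would. The trade-off is that the on-disk format has to be directly searchable, which is the property the comment attributes to the fst crate.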
What are some alternatives?
cppinsights - C++ Insights - See your source code with the eyes of a compiler
httpdirfs - A filesystem which allows you to mount HTTP directory listings or a single file, with a permanent cache. Now with Airsonic / Subsonic support!
codechecker - CodeChecker is an analyzer tooling, defect database and viewer extension for the Clang Static Analyzer and Clang Tidy
direct-io - Direct IO helpers for block devices and regular files on FreeBSD, Linux, macOS and Windows.
scip - SCIP Code Intelligence Protocol
wg-allocators - Home of the Allocators working group: Paving a path for a standard set of allocator traits to be used in collections!
color_coded - A vim plugin for libclang-based highlighting of C, C++, ObjC
stack-graphs - Rust implementation of stack graphs
LLVM-Guide - LLVM (Low Level Virtual Machine) Guide. Learn all about the compiler infrastructure, which is designed for compile-time, link-time, run-time, and "idle-time" optimization of programs. Originally implemented for C/C++, though it has a variety of front-ends, including Java, Python, etc.
textscanner
advanced
Bazel - a fast, scalable, multi-language and extensible build system