Our great sponsors
-
sccache
Sccache is a ccache-like tool. It is used as a compiler wrapper and avoids compilation when possible. Sccache has the capability to utilize caching in remote storage environments, including various cloud storage options, or alternatively, in local storage.
-
WorkOS
The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.
Worth noting that the first commit in sccache git repository was in 2014 (https://github.com/mozilla/sccache/commit/115016e0a83b290dc2...). So I suppose that what "happened" happened waay back.
In the case of the Remote Execution/Cache API used by Bazel among others[1] at least, it's a bit more detailed. There's an "ActionCache" and an actual content-addressed cache that just stores blobs ("ContentAddressableStorage"). When you run a `gcc -O2 foo.c -o foo.o` command (locally or remotely; doesn't matter), you upload an "Action" into the action cache, which basically said "This command was run. As a result it had this stderr, stdout, error code, and these input files read and output files written." The input and output files are then referenced by the hash of their contents, in this case.
Most importantly you can look up an action in the ActionCache without actually running it. So now when another person comes by and runs the same build command, they say "Has this Action, with these inputs, been run before?" and the server can say "Yes, and the output is a file identified by hash XYZ" where XYZ is the hash of foo.o
So realistically you always some mix of "input content hashing" and "output content hashing" (the second being the definition of 'content addressable'.)
[1] https://github.com/bazelbuild/remote-apis/blob/main/build/ba...
sccache should not be allowed in the sandbox; realistically such cases are probably better handled in the long run by Recursive Nix, so that you can build new derivations inside of existing ones, and the results are cached in the same (outer) store. This means there won't be duplicate caches; /nix/store and where-ever sccache puts results.
Cargo is a good example. For a number of (practical) reasons, cargo -> nix translators are a lot of effort and often have bugs, so for "simplicity" all upstream crates just compile every crate dependency every time. That means if two Nix expressions use the same depedency, it gets compiled twice. It's important to understand this is no worse than the way Cargo works already for most people. Cargo does not have a content address storage model, unfortunately. But it's pretty annoying for Nix users and costs a lot.
In theory, we could wrap rustc with a recursive-nix enabled wrapper so that rlib's etc each get a granular derivation and then get put into the host store. So assuming a crate gets built with the same flags, between two "outer" expressions, they'll get to share the work and it will go into the store. A working example of this for C++ code is nix-ccache, but a fully robust implementation is a bit of work. [1]
Recursive Nix is still experimental but there is some use of it (privately and publicly) that I know of for these purposes.
[1] https://github.com/edolstra/nix-ccache