Improving large monorepo performance on GitHub

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • watchman

    Watches files and records, or triggers actions, when they change.

  • My understanding is that neither Git nor Mercurial can do this well out of the box, and FB and Google both have their own extensions to Mercurial to make this possible (because even though Mercurial is often slower than Git, it’s extensible)

    e.g. https://facebook.github.io/watchman/ - used as part of Facebook’s Mercurial solution, I think.

  • scalar

    Scalar: A set of tools and extensions for Git to allow very large monorepos to run on Git without a virtualization layer (by microsoft)

  • You might be interested in scalar [1] developed by Microsoft for handling large repos.

    [1]: https://github.com/microsoft/scalar

  • VFSForGit

    Virtual File System for Git: Enable Git at Enterprise Scale

  • When is GitHub going to finally add support for Microsoft’s VFSforGit?

    https://github.com/microsoft/VFSForGit

    https://vfsforgit.org/

  • go-git

    A highly extensible Git implementation in pure Go. (by go-git)

  • Have you ever tried it? It's not remotely performant and wouldn't make sense since GH is read heavy. Plus I'm sure they spend a lot of time thinking about this stuff, no?

    If you want to get your feet wet, check out go-git[1]. They have a storage layer that you can quickly create alternative drivers for.

    [1] https://github.com/go-git/go-git/tree/master/storage

  • git

    GitGitGadget's Git fork. Open Pull Requests here to submit them to the Git mailing list (by gitgitgadget)

  • Git also has a file system monitor interface which can use Watchman. We (GitHub) are working on a native file system monitor implementation in addition - https://github.com/gitgitgadget/git/pull/900.

  • EdenSCM

    Discontinued. A Scalable, User-Friendly Source Control System. [Moved to: https://github.com/facebook/sapling]

  • And then from mercurial extensions to our own server, mononoke, which apparently has been moved under the Eden umbrella: https://github.com/facebookexperimental/eden

  • Graal

    GraalVM compiles Java applications into native executables that start instantly, scale fast, and use fewer compute resources 🚀

  • If people want a concrete example, here's a 6 GB repo that's 90% Java, 5% C, then some other languages.

    https://github.com/oracle/graal

    It's not even a mono-repo - this is just part of the project.

    Maybe someone's got some tools that let them dig around in the history and find large things or explain why it's so large? I don't think they've been checking ISOs in.
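One way to dig around in the history for large things is to combine two stock Git commands: `git rev-list --objects --all` lists every object reachable from any ref, and `git cat-file --batch-check` reports each object's type and size. The sketch below is a rough illustration, not a tool anyone in the thread mentioned; the function names `parse_batch_check` and `largest_blobs` are made up for this example, and it assumes `git` is on the PATH and that paths contain no newlines.

```python
import subprocess
from typing import Iterable, List, Tuple


def parse_batch_check(lines: Iterable[str]) -> List[Tuple[int, str, str]]:
    """Parse '<type> <sha> <size> <path>' lines, keeping blobs only."""
    blobs = []
    for line in lines:
        # Split into at most 4 fields so paths containing spaces stay intact.
        parts = line.rstrip("\n").split(" ", 3)
        if len(parts) < 3 or parts[0] != "blob":
            continue
        path = parts[3] if len(parts) == 4 else ""
        blobs.append((int(parts[2]), parts[1], path))
    return blobs


def largest_blobs(repo: str, top: int = 10) -> List[Tuple[int, str, str]]:
    """Return the `top` largest blobs in the history as (size, sha, path)."""
    # List every object reachable from any ref, with its path where known.
    rev_list = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        check=True, capture_output=True, encoding="utf-8", errors="replace",
    )
    # Ask git for the type and size of each object in a single batch.
    # %(objectsize) is the uncompressed size; %(objectsize:disk) would give
    # the on-disk (packed) size instead.
    cat_file = subprocess.run(
        ["git", "-C", repo, "cat-file",
         "--batch-check=%(objecttype) %(objectname) %(objectsize) %(rest)"],
        input=rev_list.stdout,
        check=True, capture_output=True, encoding="utf-8", errors="replace",
    )
    blobs = parse_batch_check(cat_file.stdout.splitlines())
    return sorted(blobs, reverse=True)[:top]
```

Calling `largest_blobs(".")` inside a full (non-shallow) clone returns the biggest file versions ever committed, which is usually enough to see whether a few large binaries or a long tail of ordinary source files account for the repository's size.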

  • juicefs

    JuiceFS is a distributed POSIX file system built on top of Redis and S3.
