Advantages of Monorepos

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • josh

    Just One Single History

  • I've not used it extensively, but Josh can probably help here.

    https://github.com/josh-project/josh

    It's designed for making multiple Git repos from a monorepo, but I think you should be able to make a skeleton repo that represents your desired final monorepo layout and push your individual repos to the Josh subviews of that repo to combine them all.

  • rules_python

    Bazel Python Rules

  • I have personally converted build systems to Bazel, and use it for personal projects as well.

    Bazel 1.0 was released in October 2019. If you were using it "a few years ago", I'm guessing you were using a pre-1.0 version. There's not some cutoff where Bazel magically got easy to use, and I still wouldn't describe it as "easy", but the problem it solves is hard to solve well, and the community support for Bazel has gotten a lot better over the past years.

    https://github.com/bazelbuild/rules_python

    The difficulty and complexity of using Bazel is highly variable. I've seen some projects where using Bazel is just super simple and easy, and some projects where using Bazel required a massive effort (custom toolchains and the like).

  • TypeScript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • If you are using a dependency manager, the repo abstraction does start to line up, across multiple repos, as a useful node in the dependency graph. For instance, if you are working in JS/TS, every repo has a top-level package.json file that tooling can easily consume to discover dependencies. GitHub has a dependency graph that's pretty comprehensive for public packages as depended on by public repos. For instance, the repos that depend on TypeScript: https://github.com/microsoft/TypeScript/network/dependents?p...

    There's often less tooling available for private repository hosts and private package feeds, but per-repo dependency management is, if not a solved problem in practice, an easily solvable one. (GitHub has some tools for private repos if you pay for them. Other systems can borrow from the same playbook.)

    (Other languages have similar dependency manifest files, most of which are similarly slurpable by easily automated tooling given the need/chance. Dependency discovery doesn't have to be a problem in multi-repository environments.)
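As a rough illustration of how little tooling this takes, here is a minimal sketch in Python that reads the top-level package.json of several checked-out repos and works out which first-party packages depend on which. The function names and the directory layout are assumptions made for the example, not part of any existing tool, and real manifests may need extra handling (workspaces, optional dependencies, and so on).

```python
import json
from pathlib import Path


def read_manifest(repo_dir):
    """Return (package name, set of dependency names) from repo_dir/package.json."""
    manifest_path = Path(repo_dir) / "package.json"
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    deps = set()
    # Collect names from the common dependency sections; versions are ignored here.
    for field in ("dependencies", "devDependencies", "peerDependencies"):
        deps.update(manifest.get(field, {}))
    return manifest["name"], deps


def build_dependents_index(repo_dirs):
    """Map each first-party package to the sorted first-party packages that depend on it."""
    manifests = dict(read_manifest(d) for d in repo_dirs)
    dependents = {name: [] for name in manifests}
    for name, deps in manifests.items():
        for dep in deps:
            # Third-party dependencies are skipped: only known repos are graph nodes.
            if dep in dependents:
                dependents[dep].append(name)
    return {name: sorted(users) for name, users in dependents.items()}
```

Calling build_dependents_index with a list of local checkout paths gives a "who is downstream of this package" lookup, which is the piece of information a multi-repo setup needs when deciding which projects to test against an upstream change.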

    > test how a given upstream change affects every downstream package, or coordinate half a dozen PRs for every change

    Some of this is a question of push versus pull. One developer needing to push a lot of changes is a lot of work for that one developer. Downstream "owners" pulling changes at a regular frequency can mean much smaller slices of work spread out over a larger team of people, many of whom are closer to the downstream projects and better placed to fix or troubleshoot secondary impacts.

    Monorepos make push easier, definitely. Sometimes pull is a better workflow. (Especially if you are using the same dependency tooling for third-party components. These days, given CVEs and such, you want a regular update cadence on third-party components anyway; using the same tools for first-party updates gives you more reason to keep that cadence regular. Lots of small changes over time rather than big upgrade processes all at once.)

  • VFSForGit

    Virtual File System for Git: Enable Git at Enterprise Scale

  • At least in the case of git, a surprising amount of the monorepo tooling is making it upstream into git itself. I'm aware of engineering efforts from both Microsoft and Twitter that are in today's git (a lot of the work on things like the git commit-graph and git sparse checkouts in particular is designed for monorepo tooling, though in some cases it benefits smaller repos too).

    Microsoft's monorepo tooling has been especially interesting to watch from an engineering standpoint, as seemingly almost all of it has been in the public eye, open source, and in most cases upstreamed. VFS for Git [1] was one of their first approaches (simply virtualizing the git filesystem and proxying it through servers as necessary). Portions of it will never be upstreamed (in particular because it needs OS drivers), but it's all open source, a lot of concepts from it were upstreamed into git itself, and VFS for Git is now mostly considered legacy/deprecated. Microsoft's more recent follow-up tool was Scalar [2], which started as a fork of most of the remaining relevant bits of VFS for Git plus a repo config tool that helped set up sparse clones while the git CLI ("porcelain") for sparse cloning took a bit to catch up with what the "plumbing" could do. Most of that got directly upstreamed into the git "porcelain", and since then so much of Scalar has been upstreamed into git that the remaining Scalar tools are now version-controlled directly in Microsoft's git fork rather than in their own repo.

    In terms of raw engineering capability it seems we are in something of a golden age of open source monorepo tools for those trying to use git for monorepos. Admittedly, the tools being available now doesn't make monorepos any easier to work with than in the era when the tools were simply unavailable, because there's often a lot of engineering work still to be done to keep the tools humming along (in bandwidth and hosting alone).

    It's just interesting to see more of the tools available transparently, sometimes because they still have benefits for even smaller-scale repos. (While VFS for Git is unlikely to be necessary for small/medium repos, there are times when sparse clones can be handy at even medium sizes. A lot of the engineering work upstreamed to make sparse clones performant and capable indirectly benefits repositories of any scale by reducing filesystem reads overall and adding support for storing better computed caches on disk, such as commit-graphs and reachability bitmaps, rather than repetitively rebuilding them in memory.)

    [1] https://github.com/microsoft/vfsforgit

    [2] https://github.com/microsoft/scalar
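For readers who have not tried the upstreamed features mentioned above, the following is a small sketch of how a partial, sparse clone with an on-disk commit-graph might be scripted using stock git. The helper names, the example URL, and the directory names are hypothetical; the git subcommands and flags (clone --filter=blob:none --sparse, sparse-checkout set, commit-graph write --reachable) exist in reasonably recent git releases, but this is only an illustration, not how Scalar or VFS for Git do it.

```python
import subprocess


def sparse_clone_commands(repo_url, target_dir, sparse_paths):
    """Build the git invocations for a partial, sparse clone plus a commit-graph file."""
    # Partial clone: fetch commits and trees now, download file contents on demand.
    clone = ["git", "clone", "--filter=blob:none", "--sparse", repo_url, target_dir]
    # Limit the working tree to the listed directories.
    checkout = ["git", "-C", target_dir, "sparse-checkout", "set", *sparse_paths]
    # Store the commit-graph on disk so history walks need not rebuild it in memory.
    graph = ["git", "-C", target_dir, "commit-graph", "write", "--reachable"]
    return [clone, checkout, graph]


def run_commands(commands):
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

For example, run_commands(sparse_clone_commands("https://example.com/big/monorepo.git", "monorepo", ["services/api", "libs/common"])) would check out only those two directories, assuming the hypothetical repository exists and the installed git is new enough.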

  • qt5

    Qt5 super module

  • Have you ever tried contributing to Qt? I rather liked their use of submodules. https://github.com/qt/qt5
