From a Single Repo, to Multi-Repos, to Monorepo, to Multi-Monorepo

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • Composer

    Dependency Manager for PHP

  • Could you elaborate more? The OP had to split the monorepo due to https://getcomposer.org/ limitations (each package needs its own repo, it seems). What other issues have you found when working with a monorepo? Thanks!

  • Jenkins

    Jenkins automation server

  • > Does anyone have any other practices they can recommend for managing these type of projects?

    Honestly, the only way around these sorts of issues is to utilize automation in some form.

    I've found that setting up repositories (like devpi[0], Artifactory[1], or Docker Registry[2]) on a shared network location and using CI/CD tools (like Jenkins[3]) are the key. The goal is that you end up working on one portion of the code base at a time, and those changes go through the standard validation processes so that you can pull in the updated package version when you work on something downstream. Making sure that the CI/CD environment _doesn't_ have access to other packages' non-versioned code is key to making sure things actually work as expected.
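
    As a rough sketch of what that looks like with a shared package index (the index URL and package name here are hypothetical, not from the original setup):

    ```sh
    # CI/CD can only resolve versions that have actually been published to the
    # shared index, e.g. a devpi instance on the internal network.
    pip install --index-url https://devpi.internal.example/root/prod/+simple/ foolib==1.2.3
    ```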

    For example, suppose you have FooLib and you need an update in it for BarApp. Even if you branch FooLib 1.2.3 to 1.2.3-1-gabc1234d (the `git describe` of the commit) on `feat/new-thingy`, and BarApp v2.3.4-1-gaf901234 depends on that new branch, it shouldn't in any way be able to reference that branch in the build process. How do you get around this? Good development -- finish the FooLib branch, get it working, merge it in with the updated version, and push the package (with the new version) to the CI/CD-accessible repository. At that point, when you push your BarApp change, it can actually build and not die. But until FooLib has a versioned update, BarApp's branch _shouldn't_ be able to build.
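
    To make that gate concrete (reusing the hypothetical names and index from above):

    ```sh
    # On FooLib's feature branch, the working version is an unreleased, describe-style one.
    git describe                    # -> 1.2.3-1-gabc1234d
    # In BarApp's CI, pinning the not-yet-released version fails until FooLib has been
    # merged, versioned, and published -- which is exactly the gate described above.
    pip install --index-url https://devpi.internal.example/root/prod/+simple/ foolib==1.2.4
    ```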

    The statement "But I want to work on the changes locally, in parallel" is valid. That's what local development is for -- giving you space to work on related things that don't impact the upstream codebase. You should have the option to use FooLib's branch code in your BarApp code locally, and you can often do that via things like `pip install` or `maven install` or whatever the relevant local install command is. At that point, the package probably still has the same version number, so the local build doesn't trigger issues. You can work on the two and tweak and twist as you want, but refrain from actually pushing BarApp referencing FooLib's branch until it's actually in the repo.
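
    A minimal sketch of that local workflow, assuming a Python setup with FooLib checked out next to BarApp (paths and branch names are hypothetical):

    ```sh
    # Local only: use FooLib's in-progress branch without publishing anything.
    git -C ../foolib checkout feat/new-thingy
    pip install -e ../foolib        # editable install into BarApp's local environment
    # Iterate on BarApp against it, but don't push BarApp pinned to this branch.
    ```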

    This all takes a great deal of restraint and patience. The goal here is to make it just a tad harder to introduce problems somewhere, since you can't depend on something that hasn't been given the go-ahead. While there might be a lot of "Updated FooLib requirement to v1.2.4" commits throughout your codebase, why are you doing that just off-hand? If you are doing it because of a security issue or bug, let that be known in the commit message. If you are doing it because you can use a new feature/whatever, your commit message won't be just "Updated FooLib"; it will likely be something like "Added Feature X2Y, updated FooLib to 1.2.4".

    I try not to touch PHP much, simply because I've always had bad experiences with it. I know for a fact that there are decent ways to do this with build tools like Maven[4], setuptools[5], and Docker[6]. Hell, I have used Docker as a way to introduce versioned dependency packaging, needing only a Docker Registry (each dependent project does a multi-stage build, pulling in the dependencies via the versioned package images).
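
    A hedged sketch of that Docker approach (registry host, image names, and paths are made up for illustration): each library publishes a versioned image, and downstream projects copy its artifacts in a multi-stage build.

    ```sh
    # FooLib's pipeline publishes a versioned package image to the shared registry:
    docker build -t registry.internal.example/foolib:1.2.4 ./foolib
    docker push registry.internal.example/foolib:1.2.4

    # BarApp's Dockerfile then pulls it in by version, roughly:
    #   FROM registry.internal.example/foolib:1.2.4 AS foolib
    #   FROM python:3.11-slim
    #   COPY --from=foolib /opt/foolib /opt/foolib
    docker build -t registry.internal.example/barapp:2.3.4 ./barapp
    ```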

    ---

    [0]: https://devpi.net/docs/devpi/devpi/latest/%2Bd/index.html

    [1]: https://jfrog.com/artifactory/

    [2]: https://docs.docker.com/registry/

    [3]: https://www.jenkins.io/

    [4]: https://maven.apache.org/

    [5]: https://setuptools.readthedocs.io/en/latest/

    [6]: https://www.docker.com/

  • Mithril.js

    A JavaScript Framework for Building Brilliant Applications

  • Nice write-up. I have explored different repo strategies quite a bit myself in the course of a few efforts that I've been involved with. On one, we originally had a monolithic framework, and everything the article said about cons is pretty spot on. However, I'll qualify that by saying that I think the problems come less from the nature of monoliths in general and more from lack of experience with modular design.

    We then wrote a new framework using a monorepo approach, with separate packages using Lerna. The problem here was tooling. Dependent builds were not supported, and I've had to delete node_modules more times than I care to count. The article talks about some GitHub-specific problems (namely, the issues list being a hodge-podge of every disparate package). We tried ZenHub; it works OK, but it's a hack and it kind of shows. I've seen other projects organize things via tags. Ultimately it comes down to what the team is willing to put up with.

    We eventually broke the monorepo out into multi-repos, and while that solved the problem of managing issues, now the problem was that publishing packages + cross-package dependencies meant that development was slower (especially with code reviews, blocking CI tests, etc).

    Back to a monorepo using Rush.js (and later Bazel). Rush had limitations similar to Lerna's (in particular, no support for dependent tests) and we ditched it soon afterwards. Bazel has a lot of features, but it takes some investment to get the most out of it. I wrote a tool to wrap over it[0] and set things up to meet our requirements.

    We tried the "multi-monorepo" approach at one point (really, this is just git submodules), and didn't get very good results. The commands that you need to run are draconian and having to remember to sync things manually all the time is prone to errors. What's worse is that since you're dealing with physically separate repos, you're back to not having good ways to do atomic integration tests across package boundaries. To be fair, I've seen projects use the submodules approach[1] and it could work depending on how stable your APIs are, but for corporate requirements, where things are always in flux, it didn't work out well.
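
    For reference, the kind of ceremony this involves (a generic submodule workflow, not this particular project's exact setup):

    ```sh
    # Bring submodules to the commits pinned by the parent repo.
    git submodule update --init --recursive
    # Move each submodule to the latest commit of its tracked branch...
    git submodule update --remote
    # ...and then remember to commit the new pins in the parent repo, every time.
    git add -A && git commit -m "Bump submodule pins"
    ```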

    Which brings me to another effort I was involved with more recently: moving all our multi-repo services into a monorepo. The main rationale here is somewhat related to another reason submodules don't really fly: there are a ton of packages, a lot of stakeholders with varying degrees of commit frequency, and reconciling security updates with version drift is a b*tch.

    For this effort we also invested into using Bazel. One of the strengths of this tool is how you can specify dependent tasks, for example "if I touch X file, only run the tests that are relevant". This is a big deal, because at 600+ packages, a full CI run consumes dozens of hours worth of compute time. The problem with monorepos comes largely from the sheer scale: bumping something to the next major version requires codemods, and there's always someone doing some crazy thing you never anticipated.
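
    A rough sketch of how that plays out in practice (the package label below is hypothetical): query for the test targets that transitively depend on what you touched, and run only those.

    ```sh
    # Find every test target that (transitively) depends on the changed package...
    bazel query 'kind(".*_test", rdeps(//..., //libs/foo:foo))' \
      | xargs bazel test    # ...and run only those, instead of the whole tree.
    ```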

    With that said, monorepos are not a panacea. A project from a sibling team is a components library and it uses a single repo approach. This means a single version to manage for the entire set of components. You may object that things are getting bumped even when they don't need to, but it turns out this is actually very well received by consumers, because it's far easier to upgrade than having to figure out the changelog of dozens of separate packages.

    I used a single-repo monolith-but-actually-modular setup for my OSS project[2] and that has worked well for me, for similar reasons: people appreciate curation, and since we want to avoid willy-nilly breaking changes, a single all-encompassing version scheme encourages development to work towards stability rather than features for features' sake.

    My takeaway is that multi-repos cause a lot of headaches both for framework authorship and for service development, that single repos can be a great poor man's choice for framework authors, and that monorepos - with the appropriate amount of investment in tooling - have good multiplicative potential for complex project clusters. YMMV.

    [0] https://github.com/uber-web/jazelle

    [1] https://github.com/sebbekarlsson/fjb/tree/master/external

    [2] https://mithril.js.org/

  • jazelle

    Incremental, cacheable builds for large Javascript monorepos using Bazel

  • fjb

    Fast JavaScript bundler

  • Symfony

    The Symfony PHP framework

  • While Composer does have this limitation, in that packages are published by making new tags within the repo, frameworks like Symfony and CakePHP have workarounds: they have one monorepo where all packages are worked on, and then automation to push changes to read-only repos for each component. So there's https://github.com/symfony/symfony pushing to https://github.com/symfony/event-dispatcher, which gets published to Packagist.
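
    The underlying mechanism is roughly a subtree split per component; a hedged sketch (Symfony's actual automation uses purpose-built split tooling, and the path, branch, and remote below are illustrative):

    ```sh
    # Extract the component directory's history into its own branch...
    git subtree split --prefix=src/Symfony/Component/EventDispatcher -b split/event-dispatcher
    # ...and push that branch to the read-only component repository that Packagist tracks.
    git push git@github.com:symfony/event-dispatcher.git split/event-dispatcher:main
    ```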

  • event-dispatcher

    Provides tools that allow your application components to communicate with each other by dispatching events and listening to them (by symfony)
