Let's make faster GitLab CI/CD pipelines – From 14 to 3 mins

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • Thank you for the great thoughts :)

    > And maybe only cache the downloads on the main branch.

    $CI_COMMIT_REF_SLUG resolves to the branch name when the pipeline runs. Using it as the value for the cache key, each Git branch (and its related MRs) gets its own cache. That is one way to avoid collisions, but it requires more storage because multiple caches are kept. https://docs.gitlab.com/ee/ci/variables/predefined_variables...
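
    As a minimal sketch of what that can look like in .gitlab-ci.yml (the job name, cached path and npm commands are illustrative assumptions, not from the post):

        # one cache per branch (and its MRs) via the predefined variable
        build:
          cache:
            key: $CI_COMMIT_REF_SLUG
            paths:
              - .npm/                               # assumed download directory to cache
            policy: pull-push                       # jobs both read and update the cache
          script:
            - npm ci --cache .npm --prefer-offline
        # to only refresh the cache from the main branch, keep a fixed key and use
        # policy: pull on all other branches so they read but never write the cache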

    In general, I agree: the more caches and parallel execution you add, the more complex and error-prone it can get. Simulating a pipeline with runtime requirements like network & caches needs its own "staging" environment for developing pipelines. That's a scenario not many teams have, or are willing to assign resources to. Static simulation, where you predict the building blocks from the YAML config, is something GitLab's pipeline authoring team is working on in https://gitlab.com/groups/gitlab-org/-/epics/6498

    And it is also a matter of insights and observability - when the critical path in the pipeline has a long maximum duration, where do you start analysing, and how do you prevent this scenario from happening again? Monitoring with the GitLab CI Pipelines Exporter for Prometheus is great; another way of looking into CI/CD pipelines is tracing.
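
    For reference, a minimal configuration sketch for the gitlab-ci-pipelines-exporter; the field names are written from memory of the project's README and the values are placeholders, so verify against the upstream documentation:

        # gitlab-ci-pipelines-exporter config (sketch, not authoritative)
        gitlab:
          url: https://gitlab.com
          token: <read_api personal access token>   # placeholder
        projects:
          - name: my-group/my-project               # hypothetical project to monitor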

    CI/CD tracing with OpenTelemetry is discussed in https://gitlab.com/gitlab-org/gitlab/-/issues/338943 to learn about user experiences and define the next steps. Imho a very hot topic, with growing awareness of metrics and traces from everyone. For example, seeing the full trace of a pipeline from start to end with different spans inside, and learning that the container image pull takes a long time - that can be the entry point into deeper analysis.

    Another idea is to make app instrumentation easier for developers, providing tips for e.g. adding /metrics as an HTTP endpoint using the Prometheus and OpenTelemetry client libraries. That way you not only see the CI/CD infrastructure & pipelines, but also user-side application performance monitoring and beyond in distributed environments. I'm collecting ideas for blog posts in https://gitlab.com/gitlab-com/marketing/corporate_marketing/...

    For someone starting with pipeline efficiency tasks, I'd recommend setting a goal - like in the blog post, X minutes down to Y - and then starting with analysis to get an idea of the blocking parts. Evaluate and test solutions for each part; e.g. a terraform apply might depend on AWS APIs, whereas a Docker pull could be switched to the Dependency Proxy in GitLab for caching.

    Each environment has different requirements - collect helpful resources from howtos, blog posts, docs, HN threads, etc., and also ask the community about their experience. https://forum.gitlab.com/ is a good spot too. I recommend creating an example project highlighting the pipeline and allowing everyone to fork it, analyse it, and add suggestions.

  • Great post, thanks for sharing. We should link that in the Pipeline Efficiency docs: https://docs.gitlab.com/ee/ci/pipelines/pipeline_efficiency....

    I've given a talk about similar ideas for efficient pipelines at Continuous Lifecycle; the slides have many URLs inside for learning async: https://docs.google.com/presentation/d/1nq7Q4WMv6rQc6WFJCRqj...

    And if you want to dive deeper, there is a free full-day workshop with exercises to practice config, resources, caches, container images and more. I created it for the Open Source Automation Days in early October.

    Slides with exercises: https://docs.google.com/presentation/d/12ifd_w7G492FHRaS9CXA...

    Exercises+solutions: https://gitlab.com/gitlab-de/workshops/ci-cd-pipeline-effici...

    I have not had time yet to write a blog post sharing more insights on the exercises, but they should be self-explanatory from the slides, with solutions in the repository. Let me know how it goes, feel free to repurpose them for your own blog posts, and please send documentation updates :)

  • Harbor

    An open source trusted cloud native registry project that stores, signs, and scans content.

  • One point worth noting when it comes to Docker caching, more specifically pulling images, is the rate limiting on Docker Hub.

    While hosted GitLab might make use of a transparent pull-through cache (as I've gathered from glancing at relevant parts of the docs), you can benefit a lot by using one with your own local GitLab instance (assuming it does not already provide one via the container registry).

    We ended up switching to Harbor[1] from the vanilla registry and almost by chance stumbled on the fact that it supported a pull-through cache from various other sources (including Docker Hub).

    This was especially useful after we hit the rate limit when one of our pipelines got out of hand and decided to rebuild every locally hosted Docker image (both internal and external).
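
    As an illustration, pulling job images through such a proxy-cache project mostly means prefixing the image reference; the hostname and project name below are hypothetical:

        # .gitlab-ci.yml - pull the job image through a Harbor proxy-cache project
        build:
          image: harbor.example.com/dockerhub-proxy/library/alpine:3.19
          script:
            - echo "pulled via the pull-through cache instead of Docker Hub directly"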

    [1]: https://goharbor.io/

  • Great article, wish I had something like that 3 years ago.

    Adding my personal tips:

    - Do not use GitLab-specific caching features unless you love vendor lock-in. Instead, use multi-stage Docker builds. This way you can also run your pipeline locally, and all your GitLab jobs consist of little more than "docker build ..." (see the sketch after this list).

    - Upvote https://gitlab.com/gitlab-org/gitlab-runner/-/issues/2797. Testing GitLab pipelines should not be such a pain.
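
    A rough sketch of the multi-stage approach mentioned above, assuming a Dockerfile with a test stage and a runner that supports Docker-in-Docker (image tags and stage names are illustrative):

        # .gitlab-ci.yml - every job is essentially "docker build ...", runnable locally too
        build:
          image: docker:24
          services:
            - docker:24-dind
          variables:
            DOCKER_TLS_CERTDIR: "/certs"
          script:
            # run the tests baked into a multi-stage Dockerfile (stage name is an assumption)
            - docker build --target test .
            # build the final image
            - docker build -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA" .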

  • > We should link that in the Pipeline Efficiency docs

    I've shared the resources and this HN topic with GitLab's technical writing team to make tutorials more visible on docs.gitlab.com

    https://gitlab.com/gitlab-org/technical-writing/-/issues/511...

  • The dependency proxy in GitLab can help with caching Docker images.

    https://docs.gitlab.com/ee/user/packages/dependency_proxy/
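
    For example, pulling a job's image through the group dependency proxy is mostly a matter of prefixing the image name with the predefined variable (alpine is just an example image):

        # .gitlab-ci.yml - pull alpine through the group's dependency proxy cache
        test:
          image: ${CI_DEPENDENCY_PROXY_GROUP_IMAGE_PREFIX}/alpine:3.19
          script:
            - echo "image served from the dependency proxy"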

    When Docker Hub introduced rate limits last year, one way to mitigate their impact was to make the dependency proxy available to everyone.

    https://about.gitlab.com/blog/2020/10/30/mitigating-the-impa...

    https://about.gitlab.com/blog/2020/10/30/minor-breaking-chan...

    Another way is to maintain your own group that holds all base images for your environment and stores them in the GitLab container registry.

    Using your own images can help enforce security policies and avoid containers introducing vulnerabilities (e.g. when always pulling the :latest tag, or sticking to old tags). This reminds me of Infrastructure as Code security scanning, which can help detect issues like this. I played with it when it was released in GitLab 14.5; examples in https://gitlab.com/gitlab-de/playground/infrastructure-as-co... + https://gitlab.com/gitlab-de/playground/infrastructure-as-co...
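
    Enabling it is mostly a one-line include; the template name below is from memory and may differ between GitLab versions, so verify it against the IaC scanning docs:

        # .gitlab-ci.yml - enable Infrastructure as Code scanning (template name assumed)
        include:
          - template: Security/SAST-IaC.latest.gitlab-ci.yml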

    Depending on the languages and environments involved, builder images might be needed to reduce the complexity of installing build tools. For example, C++ with gcc, controlling the exact version being used (major versions may introduce ABI-breaking changes; I've seen it with armhf and gcc6 a while ago). Builder images also reduce the possibility for users to make mistakes in CI/CD before_script/script sections. An optimized pipeline may just include a remote CI/CD template, with the magic happening in centrally maintained projects that use the builder images.
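
    A sketch of that pattern; the central project, template path and builder image below are hypothetical names:

        # .gitlab-ci.yml in an application repository - the magic lives in a central project
        include:
          - project: my-group/ci-templates          # hypothetical central template project
            file: /templates/cpp-build.yml
        # inside the included template, jobs pin an exact builder image, e.g.:
        # build:
        #   image: registry.example.com/my-group/builders/gcc:10.3   # pinned compiler version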

    Another thought on builder images: multi-arch images with buildx. I read https://medium.com/@tomwillfixit/migrating-a-dockerized-gitl... yesterday; I need to learn more :)
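
    A possible starting point for such a job, assuming a Docker-in-Docker runner; QEMU/binfmt setup for foreign architectures and the platform list are assumptions:

        # .gitlab-ci.yml - build and push a multi-arch image with buildx (sketch)
        build-multiarch:
          image: docker:24
          services:
            - docker:24-dind
          variables:
            DOCKER_TLS_CERTDIR: "/certs"
          script:
            - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
            - docker buildx create --use
            # foreign architectures may additionally need QEMU binfmt handlers on the runner
            - docker buildx build --platform linux/amd64,linux/arm64 -t "$CI_REGISTRY_IMAGE:latest" --push .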
