Service Breakdown in my Kubernetes Cluster: Steps, Solution, Learning

This page summarizes the projects mentioned and recommended in the original post on dev.to

  • upptime

    📈 Uptime monitor and status page for Sebastian, powered by @upptime (by admantium-sg)

  • An additional feature of Upptime that I became aware of recently is its deep integration with GitHub. On the day my services broke down, I not only received a notification email; a GitHub issue was also created automatically. These issues let you communicate publicly about an outage and give users of your service a single point of information.
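
    Because Upptime records outages as ordinary GitHub issues, they can also be read programmatically. The following is a minimal sketch using the public GitHub REST API and only the Python standard library; the function names are hypothetical, and the repository you query is an assumption you need to fill in yourself.

```python
import json
import urllib.request


def open_issues_url(owner: str, repo: str) -> str:
    """Build the GitHub REST API URL that lists a repository's open issues."""
    return f"https://api.github.com/repos/{owner}/{repo}/issues?state=open"


def only_issues(items: list[dict]) -> list[dict]:
    """Drop pull requests: the issues endpoint returns them too,
    marked with a "pull_request" key."""
    return [item for item in items if "pull_request" not in item]


def fetch_open_issues(owner: str, repo: str) -> list[dict]:
    """Fetch open issues. Needs network access; unauthenticated calls are rate limited."""
    request = urllib.request.Request(
        open_issues_url(owner, repo),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request) as response:
        return only_issues(json.load(response))
```

    Calling `fetch_open_issues(owner, repo)` with the owner and name of your status repository returns one dictionary per open issue, whose `title` and `html_url` fields can be shown to users.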

  • upptime

    ā¬†ļø GitHub Actions uptime monitor & status page by @AnandChowdhary

  • In this article, I discussed the recent Kubernetes service outage that took down my blog and my apps Lighthouse and ApiBlaze, right in the middle of my holiday! The article reflects on how I approached the problem, what steps I took, and how I finally restored all services. The lessons were plentiful and surprising. First of all, Upptime is an excellent, free-to-use monitoring tool with GitHub integration. When Upptime noticed that my services were down, it automatically created GitHub issues. I like the idea that my GitHub repo is a public dashboard, and the issues communicated the downtime. Second, the investigation into the Kubernetes outage uncovered several problems that I solved one after the other: failed deployments to master, a disk that ran out of space, and the certificate error. After all of this, I'm impressed again by the robustness of Kubernetes. I also made, and learned from, the crucial mistake of solving the tactical problems before the root problem: always stabilize your servers first, then the services. After some cleanup, and the time-consuming application builds and Docker registry uploads, all services were working again.
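
    The "stabilize your servers first" lesson can be turned into a small pre-check. The sketch below only covers the disk-space problem mentioned above and uses Python's standard `shutil.disk_usage`; the function names and the 85 percent threshold are assumptions for illustration, not values from the original post.

```python
import shutil


def used_percent(total: int, used: int) -> float:
    """Return the share of a filesystem that is in use, in percent."""
    return 100.0 * used / total


def disk_is_nearly_full(path: str = "/", threshold: float = 85.0) -> bool:
    """Check whether the filesystem holding `path` has reached the threshold.

    The 85 percent default is an arbitrary illustration value.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return used_percent(usage.total, usage.used) >= threshold
```

    Running `disk_is_nearly_full("/")` on each node before redeploying anything would flag a full disk early, so that the server gets fixed before the services on it.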

  • lens

    Lens - The way the world runs Kubernetes

  • The next step is mundane but necessary: cleaning up all erroneous pods. At work, we use the excellent Lens UI app to manage Kubernetes. With this tool, I got a much better overview and cleaned up all broken pods.
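
    The same cleanup can be scripted instead of clicked through in Lens. The following is a minimal sketch, not the method used in the original post: it takes the parsed output of `kubectl get pods --all-namespaces -o json` and picks out pods that look broken. The function names and the set of "broken" waiting reasons are assumptions chosen for illustration.

```python
# Container waiting reasons treated as "broken" here (an illustrative choice).
BROKEN_WAITING_REASONS = {"CrashLoopBackOff", "ImagePullBackOff", "ErrImagePull"}


def broken_pods(pod_list: dict) -> list[tuple[str, str]]:
    """Return (namespace, name) for pods in phase Failed or stuck waiting."""
    broken = []
    for pod in pod_list.get("items", []):
        meta = pod["metadata"]
        status = pod.get("status", {})
        # Collect the waiting reason of every container, if there is one.
        waiting_reasons = {
            cs.get("state", {}).get("waiting", {}).get("reason")
            for cs in status.get("containerStatuses", [])
        }
        if status.get("phase") == "Failed" or waiting_reasons & BROKEN_WAITING_REASONS:
            broken.append((meta["namespace"], meta["name"]))
    return broken


def delete_commands(pods: list[tuple[str, str]]) -> list[list[str]]:
    """Build one `kubectl delete pod` argument list per broken pod."""
    return [["kubectl", "delete", "pod", name, "--namespace", ns] for ns, name in pods]
```

    The functions only build the commands and do not run them, so the list can be reviewed before anything is deleted, for example by printing each entry and then passing it to `subprocess.run`.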

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • Show HN: Built a self hosted clean status page and batteries

    4 projects | news.ycombinator.com | 22 Jan 2024
  • Github hosted statuspage! statsig-io/statuspage: A simple, zero-dependency, pure js/html status page based on GitHub Pages and Actions.

    3 projects | /r/selfhosted | 6 Dec 2021
  • Mirantis K8s Lens closed its source

    4 projects | news.ycombinator.com | 24 Mar 2024
  • The Inner Workings of Kubernetes Management Frontends — A Software Engineer’s Perspective

    4 projects | dev.to | 14 Feb 2024
  • Introduction to Helm: Comparison to its less-scary cousin APT

    2 projects | dev.to | 9 Feb 2024