An additional feature of Upptime that I became aware of only recently is its deep integration with GitHub. On the day my services broke down, I not only received a notification email; a GitHub issue was also created automatically. These issues let you communicate publicly about an outage and give users of your service a single point of information.
In this article, I discussed the recent Kubernetes service outage of my blog and my apps Lighthouse and ApiBlaze, right in the middle of my holiday! This article reflects how I approached the problem, what steps I took, and how I finally restored all services. The lessons were plentiful and surprising. First of all, Upptime is an excellent, free-to-use monitoring tool with GitHub integration. When Upptime noticed that my services were down, it automatically created GitHub issues. I like the idea that my GitHub repo is a public dashboard, and the issues communicated the downtime. Second, the investigation into the Kubernetes outage uncovered several problems that I solved one after the other: failed deployments to master, a disk that ran out of space, and the certificate error. After all of this, I'm impressed again by the robustness of Kubernetes. I also made, and learned from, the crucial mistake of solving the tactical problems before the root problem: always stabilize your servers first, then the services. After some cleanup, and the time-consuming application builds and Docker registry uploads, all services were working again.
The next step is mundane but necessary: cleaning up all erroneous pods. At work, we use the excellent Lens UI app to manage Kubernetes. With this tool, I got a much better overview and cleaned up all broken pods.
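For readers without Lens, the same cleanup can be approximated from the command line. The following is only a minimal sketch, not the procedure used in this article: it assumes `kubectl` is installed and configured, and the set of phases and waiting reasons treated as "broken" is my own illustrative choice. The helper names (`is_broken`, `broken_pods`, `delete_commands`, `fetch_pod_list`) are hypothetical.

```python
import json
import subprocess

# Assumption: which pod phases and container waiting reasons count as "broken".
BROKEN_PHASES = {"Failed", "Unknown"}
BROKEN_WAITING_REASONS = {"CrashLoopBackOff", "ImagePullBackOff", "ErrImagePull"}


def is_broken(pod):
    """Return True if the pod's phase or any container state marks it as broken."""
    status = pod.get("status", {})
    if status.get("phase") in BROKEN_PHASES:
        return True
    for container in status.get("containerStatuses") or []:
        waiting = container.get("state", {}).get("waiting") or {}
        if waiting.get("reason") in BROKEN_WAITING_REASONS:
            return True
    return False


def broken_pods(pod_list):
    """Extract (namespace, name) pairs of broken pods from a pod list document."""
    return [
        (pod["metadata"]["namespace"], pod["metadata"]["name"])
        for pod in pod_list.get("items", [])
        if is_broken(pod)
    ]


def delete_commands(pods):
    """Build one `kubectl delete pod` command per (namespace, name) pair."""
    return [
        ["kubectl", "delete", "pod", name, "--namespace", namespace]
        for namespace, name in pods
    ]


def fetch_pod_list():
    """Read all pods in all namespaces as JSON (requires a configured kubectl)."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "--all-namespaces", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

A cautious way to use this is as a dry run: call `delete_commands(broken_pods(fetch_pod_list()))`, print the resulting commands, and review them before running any of them. Pods managed by a Deployment are recreated after deletion, so this only removes the debris and does not fix the underlying cause.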
Related posts
-
Show HN: Built a self hosted clean status page and batteries
-
Github hosted statuspage! statsig-io/statuspage: A simple, zero-dependency, pure js/html status page based on GitHub Pages and Actions.
-
Mirantis K8s Lens closed its source
-
The Inner Workings of Kubernetes Management Frontends - A Software Engineer's Perspective
-
Introduction to Helm: Comparison to its less-scary cousin APT