Ask HN: How do you monitor your systemd services?

Healthchecks

208 7,291 9.7 Python

Open-source cron job and background task monitoring service, written in Python & Django

If you are OK with a SaaS and you are only monitoring scheduled jobs, there are a number of monitoring tools where the job sends an HTTP request when it completes, and a missing ping (after a grace period) means that it failed.
I think https://deadmanssnitch.com/ may have been the original service for this.
https://healthchecks.io/ has a fairly generous free tier that I use now.
Others do the same thing: Sentry, Uptime Robot, ...
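
A minimal sketch of the "ping on completion" pattern described above, using only the Python standard library. The function names are hypothetical, and the `/fail` suffix follows the Healthchecks ping URL convention; other services may signal failure differently, so treat that part as an assumption.

```python
import urllib.request


def ping_url(base_url: str, succeeded: bool) -> str:
    """Return the URL to request: the check URL on success, check URL + '/fail' otherwise."""
    base = base_url.rstrip("/")
    return base if succeeded else base + "/fail"


def run_and_ping(job, base_url: str, opener=urllib.request.urlopen) -> bool:
    """Run job(), then report the outcome to the monitoring service with one HTTP request."""
    try:
        job()
        succeeded = True
    except Exception:
        succeeded = False
    try:
        opener(ping_url(base_url, succeeded), timeout=10)
    except OSError:
        # A lost ping is not fatal: the service flags the job as late
        # once the grace period expires.
        pass
    return succeeded
```

In a real job you would call something like `run_and_ping(do_backup, "https://hc-ping.com/<your-check-uuid>")`, where `do_backup` and the UUID are placeholders. The `opener` parameter exists only so the request can be replaced in tests.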

ntfy

288 16,590 9.6 Go

Send push notifications to your phone or desktop using PUT/POST

Uptime-Kuma [1] with ntfy [2]. Most of my services expose HTTP, so I just have Uptime-Kuma monitor that. But if you have something that is not exposed to the public, you can still use a "push" type monitor and, in a cron job on your server(s), send a heartbeat to it when everything is working.
[1] https://github.com/louislam/uptime-kuma
[2] https://ntfy.sh/
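
For the ntfy half of this setup, publishing a notification is a plain HTTP POST to a topic URL with the message as the request body. The sketch below only covers that part; the helper names and the topic are made up for illustration, and it does not show Uptime-Kuma's push endpoint.

```python
import urllib.request


def build_ntfy_request(topic: str, message: str, title: str = "",
                       server: str = "https://ntfy.sh") -> urllib.request.Request:
    """Build a POST request that publishes `message` to an ntfy topic."""
    headers = {"Title": title} if title else {}
    return urllib.request.Request(
        url=f"{server.rstrip('/')}/{topic}",
        data=message.encode("utf-8"),
        headers=headers,
        method="POST",
    )


def notify(topic: str, message: str, title: str = "") -> int:
    """Send the notification and return the HTTP status code."""
    request = build_ntfy_request(topic, message, title)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

A cron job could then call `notify("my-server-alerts", "nightly backup finished")`, with the topic name being whatever you subscribed to on your phone or desktop.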

rkvdns_examples

2 0 7.7 Python

Examples for RKVDNS under a more permissive license.

In general this evolves to a SIEM-like solution in IT or gets added to the tag menagerie in OT.
If you're focused on "notifications are bad", note that notifications are push, and pull solutions are possible. Tail logs (or journalctl) and post significant events to Redis (https://github.com/m3047/rkvdns_examples/tree/main/totalizer...), for example.
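
A rough sketch of the "tail the journal and keep totals" idea, not the linked totalizer's actual implementation. It assumes input in the format produced by `journalctl -o json` (one JSON object per line, with `PRIORITY` and `_SYSTEMD_UNIT` fields) and uses an in-memory `Counter` as a stand-in for Redis; the function names and the "priority 3 or more severe" threshold are assumptions.

```python
import json
from collections import Counter


def significant(entry: dict, max_priority: int = 3) -> bool:
    """True for entries at syslog priority `err` (3) or more severe (lower number)."""
    try:
        return int(entry.get("PRIORITY", 7)) <= max_priority
    except (TypeError, ValueError):
        return False


def totalize(lines, max_priority: int = 3) -> Counter:
    """Count significant journal entries per systemd unit."""
    totals = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON record
        if isinstance(entry, dict) and significant(entry, max_priority):
            totals[entry.get("_SYSTEMD_UNIT", "unknown")] += 1
    return totals
```

The `lines` argument can be any iterable of strings, for example the standard output of a `journalctl -f -o json` subprocess. A dashboard or script can then pull the totals whenever it wants instead of being pushed a notification per event.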

collectd-systemd

1 9 10.0 Python

collectd plugin to monitor systemd services

This combo does the job for me: grafana + riemann + influxdb and collectd as the main agent. collectd bundles many plugins so you can watch logs, monitor running processes or have something custom [1]. This setup is very light to start with and can scale well (up until you hit influxdb limits :D).
[1] https://github.com/mbachry/collectd-systemd

systemd-utils

2 85 3.2 Python

Random systemd utilities

I use the `OnFailure` property to trigger a service that emails me when a service fails, for example backups, which run as systemd timer + service units.
I also use `failure-monitor`, which is a Python service that monitors `journald`.
Files on GitHub for those interested:
https://github.com/kylemanna/systemd-utils
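
As an illustration of the `OnFailure` approach (not the code in the repository above), here is a sketch of a script that a failure-handler unit could run. It assumes the failing unit names a handler in its `OnFailure=` setting, that the handler passes the failed unit's name as the first argument, and that a local SMTP server accepts mail; the addresses and function names are placeholders.

```python
import smtplib
import subprocess
from email.message import EmailMessage


def failure_email(unit: str, status_text: str, sender: str, recipient: str) -> EmailMessage:
    """Build the notification email for a failed unit."""
    msg = EmailMessage()
    msg["Subject"] = f"[systemd] {unit} failed"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(status_text or "(no status output captured)")
    return msg


def main(argv) -> None:
    """Entry point for the handler: argv[1] is the name of the failed unit."""
    unit = argv[1]
    # Capture the unit's current status, including recent log lines, for the email body.
    status = subprocess.run(
        ["systemctl", "status", "--no-pager", unit],
        capture_output=True, text=True,
    ).stdout
    with smtplib.SMTP("localhost") as smtp:  # assumes a local mail relay
        smtp.send_message(failure_email(unit, status, "root@localhost", "admin@example.com"))
```

Call `main(sys.argv)` from the script's entry point. Keeping the message construction separate from the sending makes it easy to swap email for another channel later.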

uptime-kuma

351 49,253 9.8 JavaScript

A fancy self-hosted monitoring tool

Netdata

118 68,153 10.0 C

The open-source observability platform everyone needs

> So I turned to Netdata. A one liner on each server and we had super sexy and fast dashboard for each server. No birds eye view, but fine. I then spent maybe 3-4 days trying to figure out how to get alerting to work (just email, but fine) and get temperature readings (or something like that).
I work at Netdata. Just wanted to mention that, as of the last release, a parent node shows all of its children in the agent dashboard, so if you were doing this again today, a parent Netdata might have given you the bird's-eye view as a starting point: https://github.com/netdata/netdata/releases/tag/v1.41.0

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.