Monitoring Raspberry Pi Devices Using Telegraf, InfluxDB and Grafana

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • jrock.us

    jrock.us production infrastructure and personal website

  • I used InfluxDB 1.x for my push-monitoring stack at my last job. (For cases where "pull" is practical, I would always use Prometheus. Prometheus also has "push" now, by the way.) They went into 2.0 mode and kind of neglected 1.x, and I kind of forgot about it. At the time, I was most familiar with an internal monitoring system at Google, and I found I couldn't run queries that I expected to be able to run. I even mentioned it on HN, and some Influx folks told me that what I wanted to do was too weird to support. (It's not. I was collecting byte counters from fiber CPEs and wanted bandwidth charts based on topology tags I stored with the data. Imagine a SQL table like (serial_number text not null, time timestamp not null, locality text not null, bytes_sent int64 not null, bytes_received int64 not null). The problem was that timestamps were not aligned between records in the same locality group: I sampled these occasionally throughout the day, not all at the same instant. And they were counters, not deltas, so the query had to compute the delta for each serial number and then aggregate across all devices in a locality. Very possible to do; I literally had that chart in the other monitoring system. But it was not possible with InfluxDB v1 querying, as far as I could tell.)
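
    The delta-then-aggregate query described above can be sketched in Python. This is a minimal illustration, not the query either monitoring system actually runs: the sample tuples mirror the hypothetical table above (only bytes_sent is used), `locality_bandwidth` is a hypothetical helper, and each interval between two readings is charged entirely to the time bucket containing its later reading, which is a simplification.

```python
from collections import defaultdict

def locality_bandwidth(samples, bucket_seconds=3600):
    """Sum per-device byte rates by (locality, time bucket).

    samples: iterable of (serial_number, unix_seconds, locality, bytes_sent),
    where bytes_sent is a cumulative counter sampled at irregular times.
    Returns {(locality, bucket_start): bytes_per_second}.
    """
    by_device = defaultdict(list)
    for serial, t, locality, counter in samples:
        by_device[serial].append((t, locality, counter))

    totals = defaultdict(float)
    for points in by_device.values():
        points.sort()  # order each device's readings by time
        bytes_in = defaultdict(float)  # bucket -> bytes this device transferred
        secs_in = defaultdict(float)   # bucket -> seconds those bytes span
        locality_of = {}
        for (t0, _, c0), (t1, locality, c1) in zip(points, points[1:]):
            if t1 <= t0 or c1 < c0:
                continue  # skip duplicate timestamps and counter resets
            # Delta: bytes between consecutive readings of the same counter.
            # Align: charge the interval to the bucket holding its later reading.
            bucket = t1 - t1 % bucket_seconds
            bytes_in[bucket] += c1 - c0
            secs_in[bucket] += t1 - t0
            locality_of[bucket] = locality
        # Aggregate: this device's average rate per bucket, summed per locality.
        for bucket, sent in bytes_in.items():
            totals[(locality_of[bucket], bucket)] += sent / secs_in[bucket]
    return dict(totals)
```

    Because each device gets its own delta before anything is summed, readings taken at different instants never need to line up; only the bucket boundaries do.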

    I set up 2.x for myself recently, and they have really done a lot of work. The OSS offering has most of the features that Cloud/Enterprise has. It was easy to set up. They don't have any instructions for installing it in Kubernetes and haven't updated their Helm charts for 2.x, but it took about 3 minutes to write a manifest myself (https://github.com/jrockway/jrock.us/tree/master/production/...), which I prefer 99.9% of the time anyway. The new query language (Flux) is incredibly verbose, but I can see the steps I remember from Google's internal system (align, delta, aggregate), and they are all possible. (I had to scratch my head a lot to make it work, though. And I really can't reason about what operations it's doing, what is or isn't indexed, why I ingest my data as rows but process it as columns, etc.) The performance is good, and it worked well for my use case of pushing data from my Intranet of Stuff. Generally I like it, and I don't think they are being shady in any way. I'm definitely considering using it at work for collecting timing information from regular performance tests and CI steps, so my coworkers can make performance improvements and see "according to this dashboard, I made this release 10% faster!"

    I picked InfluxDB over TimescaleDB for my personal stuff because InfluxDB has an HTTP API with built-in authentication. I can create an API token for each of my devices in its web interface, and then I make an HTTP request to write data. Very simple. (They have a client library, but honestly my main target is a BeagleBone, and it doesn't have enough memory to compile their client library. I've never seen "go build" run out of memory before, but their client makes that happen. I shouldn't develop on my IoT device, of course, but it's just easier because it has Emacs, gopls, and all the sensors connected to the right bus. It was easier to make the API calls manually than to cross-compile on my workstation and push the release build to the device.) TimescaleDB doesn't have that, because it's just Postgres. I'd basically have to expose port 5432 to the world, create a Postgres user for every device, generate a password, store it somewhere, etc. Then, to ingest data, I'd connect to the database, tune my connection pool, retry failed requests manually, etc. Using HTTP gets me all of that for free; I can just configure retries in Envoy.
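
    A minimal sketch of that kind of token-authenticated write, using only the Python standard library. It assumes the InfluxDB 2.x `/api/v2/write` endpoint with `org`, `bucket`, and `precision` query parameters and an `Authorization: Token ...` header; the URL, org, bucket, token, and helper names are hypothetical placeholders, and the line-protocol escaping covers only the common cases (commas, spaces, and equals signs in names; quotes and backslashes in string fields).

```python
import urllib.parse
import urllib.request

def _escape(name):
    # Tag keys, tag values, and field keys escape commas, equals signs, and spaces.
    return name.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")

def _field_value(value):
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integer fields carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def to_line(measurement, tags, fields, timestamp_s=None):
    """Format one point in InfluxDB line protocol (second precision)."""
    head = measurement.replace(",", r"\,").replace(" ", r"\ ")
    head += "".join(f",{_escape(k)}={_escape(v)}" for k, v in sorted(tags.items()))
    body = ",".join(f"{_escape(k)}={_field_value(v)}" for k, v in fields.items())
    line = f"{head} {body}"
    return line if timestamp_s is None else f"{line} {int(timestamp_s)}"

def build_write_request(base_url, org, bucket, token, lines):
    # Hypothetical helper: one POST carrying newline-separated points.
    query = urllib.parse.urlencode({"org": org, "bucket": bucket, "precision": "s"})
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v2/write?{query}",
        data="\n".join(lines).encode("utf-8"),
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "text/plain; charset=utf-8",
        },
        method="POST",
    )

def send(request, timeout=10):
    # A successful write returns HTTP 204 No Content.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

    On a device, `send(build_write_request("http://localhost:8086", "home", "sensors", token, [to_line("climate", {"room": "office"}, {"celsius": 21.5})]))` is the entire write path; retries can be left to a proxy in front of the database.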

    But... SQL queries are a lot easier to figure out than Flux queries, and I already have good tools for manipulating raw data in Postgres (DataGrip is my preferred one), so I will likely revisit TimescaleDB. Honestly, I'd pay for a managed offering right now if Google Cloud Console had a button that said "Create Instance, and by the way this just gets added to your GCP bill at 10% more than a normal Cloud SQL instance".

  • Telegraf

    The plugin-driven server agent for collecting & reporting metrics.
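
    As an illustration of the plugin model on a Raspberry Pi, a small script can feed Telegraf through its `inputs.exec` plugin, which runs a command and parses its standard output (assumed here to be configured with `data_format = "influx"`). The sysfs path is where Raspberry Pi OS typically exposes the SoC temperature in millidegrees Celsius; the measurement and field names are hypothetical. Telegraf normally adds a `host` tag on its own, so the script leaves it out.

```python
from pathlib import Path

# Typical location of the SoC temperature on Raspberry Pi OS (millidegrees Celsius).
THERMAL_ZONE = Path("/sys/class/thermal/thermal_zone0/temp")

def temperature_line(raw):
    """Convert a raw sysfs reading such as "48312" into one line-protocol point."""
    celsius = int(raw.strip()) / 1000
    return f"cpu_temperature celsius={celsius}"

# Print nothing on machines without this sensor.
if __name__ == "__main__" and THERMAL_ZONE.exists():
    # Telegraf's exec input reads this line from standard output.
    print(temperature_line(THERMAL_ZONE.read_text()))
```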

  • Note that you can also gather and display logs with this same stack. Telegraf includes a plugin that consumes syslog output (https://github.com/influxdata/telegraf/blob/release-1.14/plu...), and then you can build something like this in Grafana: https://grafana.com/api/dashboards/12433/images/9004/image
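
    As a sketch of the sending side, assume the syslog input is configured to listen for RFC 5424 messages over UDP (for example `server = "udp://:6514"` in an `[[inputs.syslog]]` section; the port, host name, and app name are hypothetical). UDP keeps the sketch simple: over TCP the plugin expects each message to be framed, which is not implemented here.

```python
import socket
from datetime import datetime, timezone

def rfc5424(message, app_name="sensor-daemon", hostname="pi",
            facility=1, severity=6, now=None):
    """Format a minimal RFC 5424 syslog message.

    facility=1 is "user-level" and severity=6 is "informational".
    PROCID, MSGID, and STRUCTURED-DATA are left as the nil value "-".
    """
    pri = facility * 8 + severity
    timestamp = (now or datetime.now(timezone.utc)).isoformat(timespec="milliseconds")
    return f"<{pri}>1 {timestamp} {hostname} {app_name} - - - {message}"

def send_udp(message, host="127.0.0.1", port=6514):
    # One datagram per message; UDP needs no extra framing.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("utf-8"), (host, port))
```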

  • VictoriaMetrics

    VictoriaMetrics: fast, cost-effective monitoring solution and time series database

  • If you like the TICK stack but aren't impressed with what the open-source version of InfluxDB offers, check out VictoriaMetrics:

    https://victoriametrics.com/

    It supports the InfluxDB line protocol and accepts data from Telegraf directly:

    https://docs.victoriametrics.com/#how-to-send-data-from-infl...
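
    On the VictoriaMetrics side, its documentation describes storing each InfluxDB field as a separate series named `{measurement}_{field}`, with tags becoming labels, so `foo,tag1=value1 field1=12,field2=40` becomes `foo_field1{tag1="value1"}` and `foo_field2{tag1="value1"}`. The hypothetical helper below mimics that mapping for simple lines only (no escaped characters, numeric fields only, timestamp ignored); it is an illustration, not VictoriaMetrics' own parser.

```python
def influx_to_series(line, separator="_"):
    """Map one simple line-protocol line to (metric_name, labels, value) tuples."""
    parts = line.split(" ")
    measurement, *tags = parts[0].split(",")
    labels = dict(tag.split("=", 1) for tag in tags)  # tags become labels as-is
    series = []
    for field in parts[1].split(","):
        name, raw = field.split("=", 1)
        # Integer fields carry an "i" suffix in line protocol; drop it before parsing.
        value = float(raw[:-1] if raw.endswith("i") else raw)
        series.append((f"{measurement}{separator}{name}", labels, value))
    return series
```

    This is why a single Telegraf measurement with several fields shows up in VictoriaMetrics as several metric names.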
