ClickHouse for usage metering with Kafka Connect

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • openmeter

    Cloud Metering for AI, Billing and FinOps. Collect and aggregate millions of usage events in real-time.

  • This plugin guarantees exactly-once delivery between Kafka topics and ClickHouse tables. That guarantee is critical because Kafka Connect tasks are only aware of the latest topic offset acknowledged by the consumer, and a consumer can fail to acknowledge a processed offset due to a network error or an exception. Exactly-once inserts prevent usage from being dropped or inserted twice, either of which would lead to incorrect billing.

    In OpenMeter, we pre-aggregate usage events into one-minute tumbling windows to reduce the number of rows we need to scan at query time. To do this in ClickHouse, we use the AggregatingMergeTree table engine, which enables incremental data aggregation when combined with a materialized view. ClickHouse materialized views are trigger-based: they update when new records are inserted into the source table. Consequently, the corresponding materialized views are updated whenever Kafka Connect transfers a batch of events to ClickHouse. This also means an insert can fail when the view cannot process a record at trigger time. We send failed events to a Dead Letter Queue topic for later processing.

    To help ClickHouse with hot topics, we will consider adding an extra streaming aggregation step for high-volume producers, this time with a more horizontally scalable stream processor such as Arroyo. This would reduce ClickHouse insert batch sizes. Based on our tests, ClickHouse works best when batch sizes are 50-100k and inserts happen less often than once per second.

    To see it in action, check out our open-source repo: https://github.com/openmeterio/openmeter
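
As a rough illustration of the one-minute tumbling windows and the effect of exactly-once inserts described above, here is a minimal Python sketch. It is not how OpenMeter, the Kafka Connect plugin, or ClickHouse implement this; the event fields (`partition`, `offset`, `subject`, `time`, `value`) and the function names are assumptions made up for the example.

```python
from collections import defaultdict
from datetime import datetime, timezone

WINDOW_SECONDS = 60  # one-minute tumbling windows


def window_start(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its one-minute window (UTC)."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % WINDOW_SECONDS, tz=timezone.utc)


def aggregate(events):
    """Sum usage per (subject, window), skipping redelivered records.

    Deduplicating on (partition, offset) is a stand-in for the
    exactly-once guarantee; the field names are hypothetical.
    """
    seen = set()                  # (partition, offset) pairs already applied
    totals = defaultdict(float)
    for event in events:
        key = (event["partition"], event["offset"])
        if key in seen:           # duplicate delivery: do not count it twice
            continue
        seen.add(key)
        totals[(event["subject"], window_start(event["time"]))] += event["value"]
    return dict(totals)


def at(hour, minute, second):
    """Helper for building example timestamps on an arbitrary day."""
    return datetime(2024, 2, 9, hour, minute, second, tzinfo=timezone.utc)


events = [
    {"partition": 0, "offset": 1, "subject": "customer-1", "time": at(12, 0, 5), "value": 3},
    {"partition": 0, "offset": 2, "subject": "customer-1", "time": at(12, 0, 50), "value": 4},
    # The same offset delivered again, e.g. after a failed acknowledgement.
    {"partition": 0, "offset": 2, "subject": "customer-1", "time": at(12, 0, 50), "value": 4},
    {"partition": 0, "offset": 3, "subject": "customer-1", "time": at(12, 1, 10), "value": 5},
]

totals = aggregate(events)
for (subject, start), value in sorted(totals.items()):
    print(subject, start.isoformat(), value)
```

The four input records collapse into two rows, one per window, and the redelivered record is counted only once. That is the property that keeps billing totals correct when a batch is retried.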


Related posts

  • OpenMeter – open-source Realtime Metering

    1 project | news.ycombinator.com | 9 Feb 2024
  • Looking for feedback on our website for an Open Source project

    1 project | /r/websitefeedback | 7 Oct 2023
  • Real-Time and Scalable Usage Metering

    1 project | /r/opensource | 11 Sep 2023
  • GitHub - openmeterio/openmeter: Accurate and real-time usage metering for AI, DevOps, billing and analytics.

    1 project | /r/foss | 24 Aug 2023
  • How to meter POD execution duration for billing?

    1 project | /r/kubernetes | 12 Jul 2023