TL;DR: Consuming Kafka events is particularly challenging when processing each event takes a long and variable amount of time. This post discusses that challenge and possible solutions for it. It also describes a simple implementation of a chosen solution, which is accessible via this link.
Let's say you have a service that receives customers' requests, distributes them as events among several workers, waits for the workers to run a long-running process for each request, and gathers the results to send back to the customers. If you use Apache Kafka as the medium to distribute these events and you are not careful, you are likely to run into a particular predicament: your requests will be processed more than once by different workers!
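Why does this happen? A Kafka consumer must call `poll()` again within `max.poll.interval.ms` (300,000 ms, or five minutes, by default). If handling a single event takes longer than that, the consumer is treated as failed and the group rebalances. Because the event's offset was never committed, whichever consumer picks up the partition next reads the same event again. The toy simulation below models one partition under that rule. It is not a Kafka client. `simulate`, its parameters, and the rotation of ownership to the next worker are hypothetical simplifications; a real rebalance may assign the partition to any live member of the group.

```python
from __future__ import annotations

# Kafka's default max.poll.interval.ms (five minutes).
MAX_POLL_INTERVAL_MS = 300_000


def simulate(
    events: list[str],
    duration_ms: dict[str, int],
    workers: list[str],
    max_poll_interval_ms: int = MAX_POLL_INTERVAL_MS,
    max_rebalances: int = 3,
) -> list[tuple[str, str]]:
    """Toy model (hypothetical) of one partition consumed by a group of workers.

    Returns (worker, event) pairs in the order processing started,
    so an event handled twice shows up twice.
    """
    log: list[tuple[str, str]] = []
    committed = 0   # committed offset: index of the next event to read
    owner = 0       # index of the worker that currently owns the partition
    rebalances = 0
    while committed < len(events) and rebalances <= max_rebalances:
        event = events[committed]
        log.append((workers[owner], event))
        if duration_ms[event] > max_poll_interval_ms:
            # The worker is still busy when its poll deadline passes, so it
            # is dropped from the group and its later commit fails. The
            # committed offset stays put, and the new owner re-reads the event.
            # Simplification: ownership simply rotates to the next worker.
            owner = (owner + 1) % len(workers)
            rebalances += 1
        else:
            # Finished in time: the commit succeeds and we move on.
            committed += 1
    return log
```

For example, `simulate(["a", "b"], {"a": 1_000, "b": 400_000}, ["w1", "w2"], max_rebalances=1)` returns `[("w1", "a"), ("w1", "b"), ("w2", "b")]`: event `b` is started by both workers. Without the `max_rebalances` cap, the model would keep bouncing `b` between them indefinitely.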
The implementation described in this post is not meant to be bug-free or production-ready; it is just a glimpse of what the proposed solution might look like. Additionally, we assume the producer application is already in place and produces events. The consumer application receives events as they are produced, processes them, commits their offsets to Kafka, and sends a new event containing the results to a separate topic to inform the producer of the outcome. We also use Protocol Buffers to serialize the messages exchanged between the two applications.
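The consumer's per-event flow can be sketched as follows. The sketch assumes the Kafka client and the Protocol Buffers encoding are hidden behind plain callables. `Event`, `run_consumer`, the `"results"` topic name, and the `process`/`commit`/`publish` hooks are hypothetical names for illustration, not part of any Kafka library. In the real application, `process` would deserialize the Protocol Buffers payload and serialize the result.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Event:
    """A consumed record (hypothetical stand-in for a Kafka message)."""
    topic: str
    partition: int
    offset: int
    value: bytes  # Protocol Buffers-encoded payload in the real application


def run_consumer(
    events: Iterable[Event],
    process: Callable[[bytes], bytes],
    commit: Callable[[str, int, int], None],
    publish: Callable[[str, bytes], None],
    result_topic: str = "results",  # hypothetical name for the result topic
) -> None:
    """Process each event, commit its offset, then publish the result."""
    for event in events:
        # Long-running, variable-length work happens here.
        result = process(event.value)
        # Kafka stores the offset of the *next* record to read.
        commit(event.topic, event.partition, event.offset + 1)
        # Send the outcome to a separate topic for the producer.
        publish(result_topic, result)
```

The order mirrors the description above: commit, then publish. If the consumer crashed between the two calls, the result for that event would be lost, whereas publishing first would instead risk sending a duplicate result. The sketch simply follows the text.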