Getting started with Apache Beam for distributed data processing

This page summarizes the projects mentioned and recommended in the original post on dev.to.

  • beam

    Apache Beam is a unified programming model for Batch and Streaming data processing.

  • The right-shift operator >> is overloaded by defining __rrshift__ on PTransform, which lets you name each step with "Name" >> transform. In the "Split" transform, each line is split into words, and beam.FlatMap flattens the resulting collection of collections into a single collection of words. The "PairWithOne" transform maps every word x to a tuple (x, 1), where the first item is the key and the second is the value. The key-value pairs are then fed to the "GroupAndSum" transform, which sums the values for each key. This is parallelized word count!

  • learn-apache-beam

  • The code for the example can be found in the wordcount/ folder of this repository. To get started, move to the folder and install the requirements with

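The word count pipeline described in the first excerpt can be approximated by the self-contained toy below. It mimics how Beam's Python SDK overloads >> (through __rrshift__, used for naming) and | (through __ror__, used for applying a transform). The classes PTransform, FlatMap, Map and CombinePerKey here are hypothetical stand-ins that evaluate eagerly on plain Python lists; they are not Beam's implementation. Treating "GroupAndSum" as a per-key sum is also an assumption about the original code.

```python
from collections import defaultdict
from typing import Any, Callable, Iterable, List, Optional


class PTransform:
    """Hypothetical toy stand-in for a Beam PTransform (not Beam's class)."""

    def __init__(self, label: Optional[str] = None) -> None:
        self.label = label

    def expand(self, pcoll: List[Any]) -> List[Any]:
        raise NotImplementedError

    def __rrshift__(self, label: str) -> "PTransform":
        # "Label" >> transform: str has no __rshift__, so Python falls back
        # to the right operand's __rrshift__, which attaches the name.
        self.label = label
        return self

    def __ror__(self, pcoll: List[Any]) -> List[Any]:
        # pcoll | transform: list has no __or__, so Python calls __ror__.
        return self.expand(pcoll)


class FlatMap(PTransform):
    def __init__(self, fn: Callable[[Any], Iterable[Any]]) -> None:
        super().__init__()
        self.fn = fn

    def expand(self, pcoll: List[Any]) -> List[Any]:
        # Each element yields zero or more outputs; flatten them into one list.
        return [out for elem in pcoll for out in self.fn(elem)]


class Map(PTransform):
    def __init__(self, fn: Callable[[Any], Any]) -> None:
        super().__init__()
        self.fn = fn

    def expand(self, pcoll: List[Any]) -> List[Any]:
        # Exactly one output per input element.
        return [self.fn(elem) for elem in pcoll]


class CombinePerKey(PTransform):
    def __init__(self, combine_fn: Callable[[Iterable[Any]], Any]) -> None:
        super().__init__()
        self.combine_fn = combine_fn

    def expand(self, pcoll: List[Any]) -> List[Any]:
        # Group values by key, then reduce each group with combine_fn.
        groups = defaultdict(list)
        for key, value in pcoll:
            groups[key].append(value)
        return [(key, self.combine_fn(values)) for key, values in groups.items()]


lines = ["the cat sat", "the dog sat"]

# >> binds tighter than |, so each "Name" >> transform is built first,
# then applied left to right with |.
counts = (
    lines
    | "Split" >> FlatMap(str.split)
    | "PairWithOne" >> Map(lambda x: (x, 1))
    | "GroupAndSum" >> CombinePerKey(sum)
)

print(sorted(counts))  # [('cat', 1), ('dog', 1), ('sat', 2), ('the', 2)]
```

In a real Beam pipeline the same chain would be written with beam.FlatMap, beam.Map and beam.CombinePerKey(sum) applied inside a beam.Pipeline. There the collections are PCollections evaluated by a runner, which decides how the work is distributed, rather than lists processed in a single Python process.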
