Build a data ingestion pipeline using Kafka, Flink, and CrateDB

This page summarizes the projects mentioned and recommended in the original post on dev.to.

    # Acquire Flink job
    VERSION=0.2
    JARFILE="cratedb-flink-jobs-${VERSION}.jar"
    wget https://github.com/crate/cratedb-flink-jobs/releases/download/${VERSION}/${JARFILE}

    # Invoke Flink job
    docker run -it --network=scada-demo --volume=$(pwd)/${JARFILE}:/${JARFILE} flink:1.12 \
        flink run --jobmanager=flink-jobmanager:8081 /${JARFILE} \
            --kafka.servers kafka-broker:9092 \
            --kafka.topic rides \
            --crate.hosts cratedb:5432 \
            --crate.table taxi_rides

  • crate-jdbc

    A JDBC driver for CrateDB.

  • This guide references the example job published at https://github.com/crate/cratedb-flink-jobs. This example job brings together three software components: the Kafka connector for Flink, the JDBC connector for Flink, and the CrateDB JDBC driver. It uses a sample dataset containing a subset of NYC taxi trip records from 2017. Explore the repository for more details.

  • kafkacat

    Discontinued. Generic command-line, non-JVM Apache Kafka producer and consumer. [Moved to: https://github.com/edenhill/kcat]

  • To communicate with Kafka, you can use Kafkacat, a command-line tool that lets you produce and consume Kafka messages using a very simple syntax. It also lets you view topic metadata.
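
    The three Kafkacat modes mentioned above (metadata, produce, consume) can be sketched as small shell helpers. This is a minimal sketch under assumptions, not the original post's commands: the broker address is borrowed from the Flink job invocation on this page, the topic name `scratch-test` is hypothetical, and the helper function names are illustrative only. The helpers are defined but not called, so nothing contacts a broker until you invoke them.

```shell
# Assumed broker address (taken from the Flink job invocation on this page).
BROKER="${BROKER:-kafka-broker:9092}"
# Hypothetical scratch topic, chosen to avoid writing test data into "rides".
TOPIC="${TOPIC:-scratch-test}"

# Prefer kcat (the renamed project) and fall back to the older kafkacat binary.
KCAT="$(command -v kcat || command -v kafkacat || true)"

# -L: list brokers, topics, and partitions known to the cluster.
kafka_metadata() {
  "$KCAT" -b "$BROKER" -L
}

# -P: produce; each line read from standard input becomes one message.
kafka_produce() {
  "$KCAT" -b "$BROKER" -t "$TOPIC" -P
}

# -C: consume; start at the earliest offset and exit at the end of the topic.
kafka_consume() {
  "$KCAT" -b "$BROKER" -t "$TOPIC" -C -o beginning -e
}

# Example usage, once a broker is reachable:
#   kafka_metadata
#   echo 'hello' | kafka_produce
#   kafka_consume
```

    When the broker runs inside a Docker network, its hostname may only resolve from containers attached to that network, so the helpers may need to be run from such a container.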

  • git

    A fork of Git containing Windows-specific patches. (by git-for-windows)

  • This guide assumes you have Docker, Git, Homebrew, and Wget installed. If you don't have them, or don't want to install them on your machine, you can use alternatives, but the steps in this guide will go more smoothly with them installed.
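
    A quick way to confirm these prerequisites is to look each tool up on the `PATH`. The following is a minimal sketch; it assumes the tools are installed under their usual command names (`docker`, `git`, `brew`, `wget`), which may differ on your system.

```shell
# Check that each tool the guide assumes is available on PATH.
missing=""
checked=0
for tool in docker git brew wget; do
  checked=$((checked + 1))
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found:   $tool"
  else
    echo "missing: $tool"
    missing="$missing $tool"
  fi
done

# Summarize the result without aborting, since alternatives may be in use.
if [ -n "$missing" ]; then
  echo "Not found on PATH:$missing"
fi
```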

  • Docker Compose

    Define and run multi-container applications with Docker

  • The simplest way to set up and start all software components at once is to use Docker with Docker Compose. To do so, first set up a sandbox directory and navigate to it in your terminal.
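
    Creating the sandbox directory might look like the sketch below. The directory name `sandbox` is an assumption, since the excerpt does not name it; set `SANDBOX_DIR` to use a different location.

```shell
# Hypothetical directory name; override by exporting SANDBOX_DIR beforehand.
SANDBOX_DIR="${SANDBOX_DIR:-sandbox}"

# Create the directory if it does not exist yet, then switch into it.
mkdir -p "$SANDBOX_DIR"
cd "$SANDBOX_DIR"

# Show where subsequent files (such as the Docker Compose file) will live.
pwd
```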

  • HomeBrew

    🍺 The missing package manager for macOS (or Linux)

