Analysing Github Stars - Extracting and analyzing data from Github using Apache NiFi®, Apache Kafka® and Apache Druid®

This page summarizes the projects mentioned and recommended in the original post on dev.to

Our great sponsors
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • WorkOS - The modern identity platform for B2B SaaS
  • SaaSHub - Software Alternatives and Reviews
  • Druid

    Apache Druid: a high performance real-time analytics database.

    Spencer Kimball (now CEO at CockroachDB) wrote an interesting article on this topic in 2021 where they created spencerkimball/stargazers based on a Python script. So I started thinking: could I create a data pipeline using Nifi and Kafka (two OSS tools often used with Druid) to get the API data into Druid - and then use SQL to do the analytics? The answer was yes! And I have documented the outcome below. Here’s my analytical pipeline for Github stars data using Nifi, Kafka and Druid.

  • Apache ZooKeeper

    Apache ZooKeeper

    You can install Kafka from https://kafka.apache.org/quickstart. Because Druid and Kafka both use Apache Zookeeper, I opted to use the Zookeeper deployment that comes with Druid, so didn’t start it with Kafka. Once running, I created two topics for me to post the data into, and for Druid to ingest from:

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

  • stargazers

    Analyze GitHub stars

    Spencer Kimball (now CEO at CockroachDB) wrote an interesting article on this topic in 2021 where they created spencerkimball/stargazers based on a Python script. So I started thinking: could I create a data pipeline using Nifi and Kafka (two OSS tools often used with Druid) to get the API data into Druid - and then use SQL to do the analytics? The answer was yes! And I have documented the outcome below. Here’s my analytical pipeline for Github stars data using Nifi, Kafka and Druid.

  • druid-datasets

    Sample data and scripts that can be used to play with Apache Druid

    Apache NiFi supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic. Nifi is very useful when data needs to be loaded from different sources. In this case, I will nifi to access the Github API as it is very easy to make repeated calls to a Http endpoint and get data from multiple pages. You can see what I did by downloading NiFi yourself and then adding my template from the Druid Datasets repo: https://github.com/implydata/druid-datasets/blob/main/githubstars/github_stars.xml

  • cockroach

    CockroachDB - the open source, cloud-native distributed SQL database.

    Spencer Kimball (now CEO at CockroachDB) wrote an interesting article on this topic in 2021 where they created spencerkimball/stargazers based on a Python script. So I started thinking: could I create a data pipeline using Nifi and Kafka (two OSS tools often used with Druid) to get the API data into Druid - and then use SQL to do the analytics? The answer was yes! And I have documented the outcome below. Here’s my analytical pipeline for Github stars data using Nifi, Kafka and Druid.

  • nifi

    Apache NiFi

    Spencer Kimball (now CEO at CockroachDB) wrote an interesting article on this topic in 2021 where they created spencerkimball/stargazers based on a Python script. So I started thinking: could I create a data pipeline using Nifi and Kafka (two OSS tools often used with Druid) to get the API data into Druid - and then use SQL to do the analytics? The answer was yes! And I have documented the outcome below. Here’s my analytical pipeline for Github stars data using Nifi, Kafka and Druid.

  • ApacheKafka

    A curated re-sources list for awesome Apache Kafka

    Spencer Kimball (now CEO at CockroachDB) wrote an interesting article on this topic in 2021 where they created spencerkimball/stargazers based on a Python script. So I started thinking: could I create a data pipeline using Nifi and Kafka (two OSS tools often used with Druid) to get the API data into Druid - and then use SQL to do the analytics? The answer was yes! And I have documented the outcome below. Here’s my analytical pipeline for Github stars data using Nifi, Kafka and Druid.

  • WorkOS

    The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts