Analysing Github Stars - Extracting and analyzing data from Github using Apache NiFi®, Apache Kafka® and Apache Druid®

This page summarizes the projects mentioned and recommended in the original post on dev.to

InfluxDB - Power Real-Time Data Analytics at Scale
Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
www.influxdata.com
featured
SaaSHub - Software Alternatives and Reviews
SaaSHub helps you find the best software and product alternatives
www.saashub.com
featured
  • Druid

    Apache Druid: a high performance real-time analytics database.

    Spencer Kimball (now CEO at CockroachDB) wrote an interesting article on this topic in 2021 where they created spencerkimball/stargazers based on a Python script. So I started thinking: could I create a data pipeline using Nifi and Kafka (two OSS tools often used with Druid) to get the API data into Druid - and then use SQL to do the analytics? The answer was yes! And I have documented the outcome below. Here’s my analytical pipeline for Github stars data using Nifi, Kafka and Druid.

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

    InfluxDB logo
  • Apache ZooKeeper

    Apache ZooKeeper

    You can install Kafka from https://kafka.apache.org/quickstart. Because Druid and Kafka both use Apache Zookeeper, I opted to use the Zookeeper deployment that comes with Druid, so didn’t start it with Kafka. Once running, I created two topics for me to post the data into, and for Druid to ingest from:

  • stargazers

    Analyze GitHub stars

    Spencer Kimball (now CEO at CockroachDB) wrote an interesting article on this topic in 2021 where they created spencerkimball/stargazers based on a Python script. So I started thinking: could I create a data pipeline using Nifi and Kafka (two OSS tools often used with Druid) to get the API data into Druid - and then use SQL to do the analytics? The answer was yes! And I have documented the outcome below. Here’s my analytical pipeline for Github stars data using Nifi, Kafka and Druid.

  • druid-datasets

    Sample data and scripts that can be used to play with Apache Druid

    Apache NiFi supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic. Nifi is very useful when data needs to be loaded from different sources. In this case, I will nifi to access the Github API as it is very easy to make repeated calls to a Http endpoint and get data from multiple pages. You can see what I did by downloading NiFi yourself and then adding my template from the Druid Datasets repo: https://github.com/implydata/druid-datasets/blob/main/githubstars/github_stars.xml

  • cockroach

    CockroachDB - the open source, cloud-native distributed SQL database.

    Spencer Kimball (now CEO at CockroachDB) wrote an interesting article on this topic in 2021 where they created spencerkimball/stargazers based on a Python script. So I started thinking: could I create a data pipeline using Nifi and Kafka (two OSS tools often used with Druid) to get the API data into Druid - and then use SQL to do the analytics? The answer was yes! And I have documented the outcome below. Here’s my analytical pipeline for Github stars data using Nifi, Kafka and Druid.

  • nifi

    Apache NiFi

    Spencer Kimball (now CEO at CockroachDB) wrote an interesting article on this topic in 2021 where they created spencerkimball/stargazers based on a Python script. So I started thinking: could I create a data pipeline using Nifi and Kafka (two OSS tools often used with Druid) to get the API data into Druid - and then use SQL to do the analytics? The answer was yes! And I have documented the outcome below. Here’s my analytical pipeline for Github stars data using Nifi, Kafka and Druid.

  • ApacheKafka

    A curated re-sources list for awesome Apache Kafka

    Spencer Kimball (now CEO at CockroachDB) wrote an interesting article on this topic in 2021 where they created spencerkimball/stargazers based on a Python script. So I started thinking: could I create a data pipeline using Nifi and Kafka (two OSS tools often used with Druid) to get the API data into Druid - and then use SQL to do the analytics? The answer was yes! And I have documented the outcome below. Here’s my analytical pipeline for Github stars data using Nifi, Kafka and Druid.

  • SaaSHub

    SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives

    SaaSHub logo
NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts

  • To study Apache Kafka Architecture in details, and how to install, deploy configure Apache kafka.

    1 project | dev.to | 17 Nov 2022
  • System Design: Service Discovery

    1 project | dev.to | 17 Sep 2022
  • Setting up a local Apache Kafka instance for testing

    2 projects | dev.to | 31 Aug 2022
  • Setting up SolrCloud for Production

    1 project | dev.to | 28 Jul 2022
  • Ask HN: Peer-to-peer consensus protocols / algorithms

    1 project | news.ycombinator.com | 7 Jul 2022

Did you konow that Java is
the 8th most popular programming language
based on number of metions?