druid-datasets vs nifi
| | druid-datasets | nifi |
|---|---|---|
| Mentions | 1 | 32 |
| Stars | 0 | 4,013 |
| Growth | - | 1.4% |
| Activity | 10.0 | 0.0 |
| Latest commit | 7 months ago | 2 days ago |
| Language | Java | Java |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
druid-datasets
- Analysing Github Stars - Extracting and analyzing data from Github using Apache NiFi®, Apache Kafka® and Apache Druid®
Apache NiFi supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic. NiFi is very useful when data needs to be loaded from different sources. In this case, I will use NiFi to access the GitHub API, as it makes it easy to call an HTTP endpoint repeatedly and collect data from multiple pages. You can see what I did by downloading NiFi yourself and then adding my template from the Druid Datasets repo: https://github.com/implydata/druid-datasets/blob/main/githubstars/github_stars.xml
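In NiFi the repeated HTTP calls are configured inside the flow, and the template linked above holds the author's actual setup. To make the paging logic concrete outside NiFi, here is a minimal plain-Java sketch. It assumes the GitHub REST stargazers endpoint (`/repos/{owner}/{repo}/stargazers`) and GitHub's standard `Link` response header, where the URL tagged `rel="next"` points at the following page. The class and method names (`GitHubPagination`, `nextPageUrl`, `fetchAllPages`) are hypothetical, and the optional token parameter is an assumption: unauthenticated requests also work but hit a lower rate limit.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of NiFi or the druid-datasets template.
public class GitHubPagination {
    // Matches one entry of a Link header, e.g. <https://...&page=2>; rel="next"
    private static final Pattern LINK_ENTRY =
            Pattern.compile("<([^>]+)>\\s*;\\s*rel=\"([^\"]+)\"");

    /** First page of a repository's stargazers listing (100 is the max page size). */
    public static String firstPageUrl(String owner, String repo) {
        return "https://api.github.com/repos/" + owner + "/" + repo
                + "/stargazers?per_page=100";
    }

    /** Returns the URL tagged rel="next" in a Link header, if there is one. */
    public static Optional<String> nextPageUrl(String linkHeader) {
        if (linkHeader == null || linkHeader.isBlank()) {
            return Optional.empty(); // single-page result: no Link header at all
        }
        Matcher m = LINK_ENTRY.matcher(linkHeader);
        while (m.find()) {
            if (m.group(2).equals("next")) {
                return Optional.of(m.group(1));
            }
        }
        return Optional.empty(); // last page: no rel="next" entry
    }

    /** Follows rel="next" links until the last page; returns each page's JSON body. */
    public static List<String> fetchAllPages(HttpClient client, String url, String token)
            throws IOException, InterruptedException {
        List<String> pages = new ArrayList<>();
        String current = url;
        while (current != null) {
            HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(current))
                    // star+json adds a starred_at timestamp to each stargazer entry
                    .header("Accept", "application/vnd.github.star+json");
            if (token != null) {
                builder.header("Authorization", "Bearer " + token);
            }
            HttpResponse<String> response =
                    client.send(builder.build(), HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() != 200) {
                throw new IOException("GitHub returned HTTP " + response.statusCode());
            }
            pages.add(response.body());
            current = nextPageUrl(response.headers().firstValue("Link").orElse(null))
                    .orElse(null);
        }
        return pages;
    }

    public static void main(String[] args) {
        // Offline demo of the paging logic; no network call is made here.
        String header = "<https://api.github.com/repositories/123/stargazers?per_page=100&page=2>; rel=\"next\", "
                + "<https://api.github.com/repositories/123/stargazers?per_page=100&page=40>; rel=\"last\"";
        System.out.println(firstPageUrl("apache", "nifi"));
        System.out.println(nextPageUrl(header).orElse("(no next page)"));
    }
}
```

Following the server-provided `next` link, rather than incrementing a `page` counter, means the loop stops on its own at the last page and does not depend on how GitHub builds its page URLs.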
nifi
- Tool decision - What architecture would you choose and why?
- Is there something like airflow but written in Scala/Java?
Apache Camel, Apache NiFi, Spring Cloud
- Your opinion on Kong
This suggestion isn't a standard one, but when a coworker and I were looking for API gateways with a very specific feature set, we couldn't find a single one that did what we needed. We did, however, come across Apache NiFi. It's a flow-based programming tool that allowed us to translate an HTTP-based request into streaming text sent via netcat.
- S3 to S3 transform
For a simple sequential pipeline, my go-to would be Apache Camel. As soon as you want complexity, it's either Apache NiFi or a microservice architecture.
- Analysing Github Stars - Extracting and analyzing data from Github using Apache NiFi®, Apache Kafka® and Apache Druid®
Spencer Kimball (now CEO at CockroachDB) wrote an interesting article on this topic in 2021, in which he created spencerkimball/stargazers based on a Python script. So I started thinking: could I create a data pipeline using NiFi and Kafka (two OSS tools often used with Druid) to get the API data into Druid, and then use SQL to do the analytics? The answer was yes! I have documented the outcome below. Here's my analytical pipeline for GitHub stars data using NiFi, Kafka and Druid.
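The article runs its analytics as SQL in Druid, and those queries are not reproduced here. As a rough illustration of the kind of result such a query produces, for example one grouping rows on `TIME_FLOOR(__time, 'P1M')`, here is a minimal plain-Java sketch that turns `starred_at` timestamps into stars per month and a running total (the usual star-history curve). It assumes the timestamps are ISO-8601 UTC strings as returned by the GitHub API; the class name `StarHistory` and the sample timestamps are made up for illustration.

```java
import java.time.Instant;
import java.time.YearMonth;
import java.time.ZoneOffset;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the Druid SQL aggregation, not the article's code.
public class StarHistory {
    /** Counts stars per calendar month (UTC) from ISO-8601 starred_at values. */
    public static Map<YearMonth, Long> starsPerMonth(List<String> starredAt) {
        Map<YearMonth, Long> perMonth = new TreeMap<>(); // keys kept in month order
        for (String ts : starredAt) {
            YearMonth month = YearMonth.from(Instant.parse(ts).atZone(ZoneOffset.UTC));
            perMonth.merge(month, 1L, Long::sum);
        }
        return perMonth;
    }

    /** Turns per-month counts into a running total of stars. */
    public static Map<YearMonth, Long> cumulative(Map<YearMonth, Long> perMonth) {
        Map<YearMonth, Long> running = new TreeMap<>();
        long total = 0;
        for (Map.Entry<YearMonth, Long> e : perMonth.entrySet()) {
            total += e.getValue();
            running.put(e.getKey(), total);
        }
        return running;
    }

    public static void main(String[] args) {
        // Made-up sample timestamps, for illustration only.
        List<String> sample = List.of(
                "2021-01-05T10:00:00Z", "2021-01-20T08:30:00Z", "2021-03-02T12:00:00Z");
        Map<YearMonth, Long> perMonth = starsPerMonth(sample);
        System.out.println(perMonth);             // {2021-01=2, 2021-03=1}
        System.out.println(cumulative(perMonth)); // {2021-01=2, 2021-03=3}
    }
}
```

Months with no new stars (February in the sample) simply do not appear; a charting step would carry the previous running total forward. Doing this in Druid instead keeps the aggregation next to the ingested data, which is the point of the NiFi → Kafka → Druid pipeline.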
- Is there any automation solution that isn't "only" CI/CD except Jenkins?
For dataflow pipelines, I'm really a fan of Apache NiFi: https://nifi.apache.org/
- Read database each 5 sec and dispatch event
Pretty sure you can do this with a NiFi connector. https://nifi.apache.org
- How-to-Guide: Contributing to Open Source
Apache NiFi
- Windmill.dev
- What is your favourite task queuing framework?
Apache NiFi -> More for data analysis/transformation
What are some alternatives?
Logstash - Logstash - transport and process your logs, events, or other data
superset - Apache Superset is a Data Visualization and Data Exploration Platform
meltano - Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
Pandas - Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
Metabase - The simplest, fastest way to get business intelligence and analytics to everyone in your company 😋
Apache Cassandra - Mirror of Apache Cassandra
nifi-extracttext-processor - Apache NiFi Custom Processor Extracting Text From Files with Apache Tika
django-project-template - The Django project template I use, for installation with django-admin.
grouparoo - 🦘 The Grouparoo Monorepo - open source customer data sync framework
Strapi - 🚀 Strapi is the leading open-source headless CMS. It’s 100% JavaScript/TypeScript, fully customizable and developer-first.
react-admin - A frontend Framework for building B2B applications running in the browser on top of REST/GraphQL APIs, using ES6, React and Material Design