druid-datasets VS ApacheKafka

Compare druid-datasets vs ApacheKafka and see how they differ.


Sample data and scripts that can be used to play with Apache Druid (by implydata)


A curated resources list for awesome Apache Kafka (by jitendra3109)
               druid-datasets       ApacheKafka
Mentions       1                    92
Stars          0                    24
Growth         -                    -
Activity       10.0                 10.0
Last commit    7 months ago         almost 6 years ago
License        Apache License 2.0   -
The number of mentions indicates the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.


Posts with mentions or reviews of druid-datasets. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-01-11.
  • Analysing Github Stars - Extracting and analyzing data from Github using Apache NiFi®, Apache Kafka® and Apache Druid®
    8 projects | dev.to | 11 Jan 2023
    Apache NiFi supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic. NiFi is very useful when data needs to be loaded from different sources. In this case, I will use NiFi to access the GitHub API, as it is very easy to make repeated calls to an HTTP endpoint and get data from multiple pages. You can see what I did by downloading NiFi yourself and then adding my template from the Druid Datasets repo: https://github.com/implydata/druid-datasets/blob/main/githubstars/github_stars.xml


Posts with mentions or reviews of ApacheKafka. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-09-21.
  • The Complete Microservices Guide
    17 projects | dev.to | 21 Sep 2023
    Inter-Service Communication: Middleware provides communication channels and protocols that enable microservices to communicate with each other. This can include message brokers like RabbitMQ, Apache Kafka, RPC frameworks like gRPC, or RESTful APIs.
  • Database Review: Top Five Missing Features from Database APIs
    19 projects | dev.to | 14 Sep 2023
    Kafka (most complex)
  • Handling computer vision events in real-time with Python, Kafka and Pipeless
    3 projects | dev.to | 6 Sep 2023
    This article explains how you can generate and process computer vision events in real-time using Pipeless and Kafka. Pipeless is an open-source computer vision framework to build and deploy apps in minutes. Kafka is a popular OSS distributed event streaming platform.
  • Track every PostgreSQL data change using Debezium
    3 projects | dev.to | 27 Aug 2023
    Whenever a new row is added, a row is updated, or a row is deleted, Debezium notices it immediately. It then packages up these changes and sends them as a continuous stream of events by leveraging the power of Apache Kafka.
  • Integrate Pyspark Structured Streaming with confluent-kafka
    2 projects | dev.to | 12 Aug 2023
    Apache Kafka - https://kafka.apache.org/
  • The Role of Queues in Building Efficient Distributed Applications
    4 projects | dev.to | 10 Aug 2023
    As shown above, we are using Apache Kafka as the messaging queue. Each URL will be pushed into the queue as a single event from the URLInputLambda. Next, the ScraperLambda gets each event from the queue to be processed.
  • Ingesting Data into OpenSearch using Apache Kafka and Go
    6 projects | dev.to | 13 Jul 2023
    Scalable data ingestion is a key aspect for a large-scale distributed search and analytics engine like OpenSearch. One of the ways to build a real-time data ingestion pipeline is to use Apache Kafka. It's an open-source event streaming platform used to handle high data volume (and velocity) and integrates with a variety of sources including relational and NoSQL databases. For example, one of the canonical use cases is real-time synchronization of data between heterogeneous systems (source components) to ensure that OpenSearch indexes are fresh and can be used for analytics or consumed by downstream applications via dashboards and visualizations.
  • Analyzing Real-Time Movie Reviews With Redpanda and Memgraph
    2 projects | dev.to | 6 Jul 2023
    In recent years, it has become apparent that almost no production system is complete without real-time data. This can also be observed through the rise of streaming platforms such as Apache Kafka, Apache Pulsar, Redpanda, and RabbitMQ.
  • Testing Kafka in Spring Boot with Testcontainers
    2 projects | dev.to | 26 Jun 2023
  • Visualize Real-Time Data With Python, Dash, and RisingWave
    5 projects | dev.to | 16 Jun 2023
    We know that real-time data is data that is generated and processed immediately, as it is collected from different data sources. Sources can be typical databases such as Postgres or MySQL, and message brokers like Kafka. A real-time data visualization consists of a few steps: first we ingest, then process, and finally show this data in a dashboard.

What are some alternatives?

When comparing druid-datasets and ApacheKafka you can also consider the following projects:

dramatiq - A fast and reliable background task processing library for Python 3.

outbox-inbox-patterns - Repository to support the article "Building a Knowledge Base Service With Neo4j, Kafka, and the Outbox Pattern"

RabbitMQ - Open source RabbitMQ: core server and tier 1 (built-in) plugins

Jenkins - Jenkins automation server

debezium - Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ.

istio - Connect, secure, control, and observe services.

Grafana - The open and composable observability and data visualization platform. Visualize metrics, logs, and traces from multiple sources like Prometheus, Loki, Elasticsearch, InfluxDB, Postgres and many more.

Apache Spark - A unified analytics engine for large-scale data processing

demo-scene - 👾Scripts and samples to support Confluent Demos and Talks. ⚠️Might be rough around the edges ;-) 👉For automated tutorials and QA'd code, see https://github.com/confluentinc/examples/

kubernetes - Production-Grade Container Scheduling and Management

CouchDB - Seamless multi-master syncing database with an intuitive HTTP/JSON API, designed for reliability

prometheus - The Prometheus monitoring system and time series database.