Apache Kafka
Apache Spark
| | Apache Kafka | Apache Spark |
| --- | --- | --- |
| Mentions | 25 | 101 |
| Stars | 27,066 | 38,104 |
| Growth | 1.4% | 1.1% |
| Activity | 9.9 | 10.0 |
| Latest commit | 7 days ago | 5 days ago |
| Language | Java | Scala |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Apache Kafka
- Scala DevInTraining looking to contribute to projects
- What is Kafka?
Source and documentation on GitHub
- Can someone please eli5 how the hierarchical timing wheel algorithm works?
I briefly described the algorithm in this article, and there is a wonderful article from Kafka that goes into more depth on their general-purpose implementation. My implementation is specialized and over-optimized in comparison, e.g. by using bit manipulation to avoid more expensive division/modulus instructions. Tokio rewrote their timer wheel after I showed them mine, borrowing some ideas but staying more general. Hope that helps!
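To illustrate the bit-manipulation trick mentioned above, here is a toy single-level timing wheel in Python (a hypothetical sketch, not Kafka's or the commenter's implementation): with a power-of-two slot count, the slot index is computed with a bitwise AND against a mask instead of a modulus.

```python
class TimingWheel:
    """Toy single-level timing wheel (illustrative sketch only)."""

    def __init__(self, tick_ms=1, wheel_bits=8):
        self.tick_ms = tick_ms
        self.size = 1 << wheel_bits        # slot count is a power of two...
        self.mask = self.size - 1          # ...so index = deadline & mask, no modulus
        self.slots = [[] for _ in range(self.size)]
        self.current_tick = 0

    def add(self, delay_ms, task):
        """Schedule a task to fire after delay_ms (rounded to whole ticks)."""
        ticks = max(1, delay_ms // self.tick_ms)
        deadline = self.current_tick + ticks
        self.slots[deadline & self.mask].append((deadline, task))

    def advance(self):
        """Advance the wheel one tick and return the tasks that expired."""
        self.current_tick += 1
        slot = self.slots[self.current_tick & self.mask]
        due = [task for (d, task) in slot if d <= self.current_tick]
        slot[:] = [(d, task) for (d, task) in slot if d > self.current_tick]
        return due
```

Kafka's hierarchical variant stacks coarser overflow wheels on top of this, so a task far in the future cascades down to finer wheels as its deadline approaches rather than occupying a slot for many revolutions.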
- How-to-Guide: Contributing to Open Source
Apache Kafka
- I am proud to announce a new sorting algorithm!
AFAIK, the Linux kernel actually uses a LinkedList for this (Ref: workqueue.c, types.h) and message queues use Timing Wheel (Ref: Kafka's TimingWheel)
- Project Ideas Thread
- Which diagram tool is Kafka using in its documentation?
Looks like the author of that image is Guozhang Wang, who is still active in the Kafka repo.
- How to get `byte[]` as `byte[]` in a Kafka Record (in an SMT)
Perhaps you are looking for org.apache.kafka.connect.converters.ByteArrayConverter?
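For reference, a minimal sketch of the relevant Kafka Connect configuration (the `key.converter`/`value.converter` property names and the converter class are real; which connector they are applied to is up to you):

```properties
# Pass keys and values through as raw bytes, so an SMT sees byte[]
# rather than a deserialized object.
key.converter=org.apache.kafka.connect.converters.ByteArrayConverter
value.converter=org.apache.kafka.connect.converters.ByteArrayConverter
```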
- Open Source Analytics Stack: Bringing Control, Flexibility, and Data-Privacy to Your Analytics
With the increase in real-time data streams and event streams, use cases have emerged that require access to real-time data, such as financial-services risk reporting or credit card fraud detection. Real-time streams can be obtained using an event-streaming platform like Apache Kafka (website, GitHub). The focus is to direct the stream of data from various sources into reliable queues, where the data can be automatically transformed, stored, analyzed, and reported on concurrently.
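The pattern described, decoupling event producers from downstream transformation and storage through a reliable queue, can be sketched in miniature with Python's standard library (a toy stand-in for Kafka, not Kafka itself):

```python
import queue
import threading

# A queue decouples producers of real-time events from a consumer
# that transforms and stores them, mirroring the pattern above.
events = queue.Queue()
stored = []

def consumer():
    while True:
        item = events.get()
        if item is None:                  # sentinel: shut down
            break
        stored.append(item.upper())       # "transform" step
        events.task_done()                # "store" happened above

worker = threading.Thread(target=consumer)
worker.start()

for event in ["payment", "login", "fraud-alert"]:
    events.put(event)                     # producers enqueue events
events.put(None)
worker.join()

print(stored)  # ['PAYMENT', 'LOGIN', 'FRAUD-ALERT']
```

In a real deployment Kafka replaces the in-process queue with durable, replicated, partitioned topics, so producers and consumers can run on different machines and at different speeds.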
Apache Spark
- "xAI will open source Grok"
- Groovy 🎷 Cheat Sheet - 01 Say "Hello" from Groovy
Recently I had to revisit the "JVM languages universe" again. Yes, languages, plural! Java isn't the only language that runs on the JVM. I previously used Scala, a JVM language, with Apache Spark for data-engineering workloads, but that's for another post 😉.
- 🦿🛴 Smart-city garbage reporting automation w/ ollama
Consume the data into third-party software (such as OpenSearch, Apache Spark, or Apache Pinot) for analysis/data science, into GIS systems (so you can put reports on a map), or into any ticket-management system.
- Go concurrency simplified. Part 4: Post office as a data pipeline
Also, this knowledge applies to learning more about data engineering, as this field of software engineering relies heavily on the event-driven approach via tools like Spark, Flink, Kafka, etc.
- Five Apache projects you probably didn't know about
Apache SeaTunnel is a data integration platform built on the three pillars of data pipelines: sources, transforms, and sinks. It offers an abstract API over three possible engines: SeaTunnel's own Zeta engine, or wrappers around Apache Spark or Apache Flink. Be careful, as each engine comes with its own set of features.
- Apache Spark VS quix-streams - a user suggested alternative
2 projects | 7 Dec 2023
- Integrate PySpark Structured Streaming with confluent-kafka
Apache Spark - https://spark.apache.org/
- Rest in Peas: The Unrecognized Death of Speech Recognition (2010)
- Gotta write this on my resume
So, for example, contributing to, say, Spark may be better for experience (and your resume) than Twitter's the-algorithm.
- Query Real Time Data in Kafka Using SQL
Additionally, one of the challenges of working with Kafka is how to efficiently analyze and extract insights from the large volumes of data stored in Kafka topics. Traditional batch processing approaches, such as Hadoop MapReduce or Apache Spark, can be slow and expensive, and may not be suitable for real-time analytics. To address this challenge, you can use SQL queries with Kafka to analyze and extract insights from the data in real time.
What are some alternatives?
Trino - Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
celery - Distributed Task Queue (development branch)
Apache ActiveMQ Artemis - Mirror of Apache ActiveMQ Artemis
redpanda - Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!
jetstream - JetStream Utilities
Aeron - Efficient reliable UDP unicast, UDP multicast, and IPC message transport
NATS - High-Performance server for NATS.io, the cloud and edge native messaging system.
Apache Qpid - Mirror of Apache Qpid
Pytorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration
Hermes - Fast and reliable message broker built on top of Kafka.
JBoss HornetQ - HornetQ is an open source project to build a multi-protocol, embeddable, very high performance, clustered, asynchronous messaging system.
Chronicle Queue - Micro second messaging that stores everything to disk