Rust Kafka

Open-source Rust projects categorized as Kafka

Top 11 Rust Kafka Projects

  • materialize

    Materialize is a fast, distributed SQL database built on streaming internals. (by MaterializeInc)

    Project mention: What makes a time series oriented database (ex: QuestDB) more efficient for OLAP on time series than an OLAP "only" oriented database (ex: DuckDB) technically? | reddit.com/r/dataengineering | 2023-01-23

    AFAIK there is a lot of overlap between OLAP databases and time series databases. [Timescale](https://legacy-docs.timescale.com/v1.7/introduction/architecture) gains a lot of its performance via the "Hypertable" abstraction, which is fairly similar to something like Parquet partitioning/bucketing. In terms of performance, I don't know if there is a huge gap either for non-optimized use cases. The ClickHouse team, for example, feels confident that ClickHouse can be used as a time series database. There are also [independent benchmarks showing the performance is comparable](https://pradeepchhetri.xyz/clickhousevstimescaledb/). I think where time-series-specific databases excel is in their tooling for time-series-specific queries, things like continuous aggregates or efficient gap filling. But non-time-series databases are catching up on that front: ClickHouse has live views, and Materialize is also playing in that space.

  • Ockam

    Orchestrate end-to-end encryption, mutual authentication, key management, credential management & authorization policy enforcement — at scale.

    Project mention: Hiring - Ockam (Series A SaaS) | reddit.com/r/devopsjobs | 2023-01-12

  • rust-rdkafka

    A fully asynchronous, futures-based Kafka client library for Rust based on librdkafka

    Project mention: A Rust client library for interacting with Microsoft Airsim https://github.com/Sollimann/airsim-client | reddit.com/r/robotics | 2023-01-22

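    As a sketch of rust-rdkafka's futures-based API: assuming a broker reachable at localhost:9092 and an existing topic named events (both placeholders, not anything from this list), a minimal async producer might look like the following. This is an illustrative sketch, not the library's canonical example.

    ```rust
    // Minimal rust-rdkafka producer sketch. Assumes a local broker at
    // localhost:9092 and a topic "events"; both are placeholders.
    use std::time::Duration;

    use rdkafka::config::ClientConfig;
    use rdkafka::producer::{FutureProducer, FutureRecord};

    #[tokio::main]
    async fn main() {
        let producer: FutureProducer = ClientConfig::new()
            .set("bootstrap.servers", "localhost:9092")
            .create()
            .expect("producer creation failed");

        // send() returns a future that resolves once the broker acknowledges
        // (or rejects) delivery of the record.
        match producer
            .send(
                FutureRecord::to("events").key("key-1").payload("hello from Rust"),
                Duration::from_secs(5),
            )
            .await
        {
            Ok((partition, offset)) => {
                println!("delivered to partition {partition} at offset {offset}")
            }
            Err((err, _unsent_message)) => eprintln!("delivery failed: {err}"),
        }
    }
    ```

    Because delivery is just a future, many records can be sent concurrently and awaited together, which is where the async design pays off compared to a blocking client.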

  • kafka-rust

    Rust client for Apache Kafka

    Project mention: Version 0.9.0 of the u/rustlang Kafka client library has been released | reddit.com/r/rust | 2022-05-02

    Project: https://github.com/kafka-rust/kafka-rust
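    kafka-rust, by contrast, exposes a synchronous API. A minimal consumer loop, assuming a broker at localhost:9092, a topic events, and a consumer group example-group (all placeholder names), might look like this sketch:

    ```rust
    // Minimal kafka-rust consumer sketch. Broker address, topic, and group
    // are placeholders; requires a running Kafka broker to do anything.
    use kafka::consumer::{Consumer, FetchOffset};

    fn main() {
        let mut consumer = Consumer::from_hosts(vec!["localhost:9092".to_owned()])
            .with_topic("events".to_owned())
            .with_group("example-group".to_owned())
            // Start from the earliest offset when the group has none committed.
            .with_fallback_offset(FetchOffset::Earliest)
            .create()
            .expect("consumer creation failed");

        loop {
            // poll() blocks briefly and returns zero or more message sets.
            for ms in consumer.poll().expect("poll failed").iter() {
                for m in ms.messages() {
                    println!("{}", String::from_utf8_lossy(m.value));
                }
                // Mark the message set as consumed locally...
                consumer.consume_messageset(ms).expect("consume failed");
            }
            // ...and commit the consumed offsets back to Kafka.
            consumer.commit_consumed().expect("commit failed");
        }
    }
    ```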

  • matano

    Open source cloud-native security lake platform (SIEM alternative) for threat hunting, detection & response, and cybersecurity analytics at petabyte scale on AWS 🦀

    Project mention: Launch HN: Matano (YC W23) – Open-Source Security Lake Platform (SIEM) for AWS | news.ycombinator.com | 2023-01-24

    Hi HN! We’re Shaeq and Samrose, co-founders of Matano (https://matano.dev). Matano is a high-scale, low-cost alternative to traditional SIEM (e.g. Splunk, Elastic) built around a vendor-agnostic security data lake that deploys to your AWS account.

    Don’t worry — we’ll explain all this jargon in a second.

    SIEM stands for “Security Information and Event Management” and refers to log management tools used by security teams to detect threats from an organization's security logs (network, host, cloud, SaaS audit logs, etc.) and send alerts about them. Security engineers write detection rules inside the SIEM as queries to detect suspicious activity and create alerts. For example, a security engineer could write a detection rule that checks the fields in each CloudTrail log and creates an alert whenever an S3 bucket is modified with public access, to prevent data exfiltration.
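    The S3 rule described above boils down to a small predicate over parsed log fields. Matano itself expresses detections in Python; purely for illustration in this Rust-focused list, here is the same logic sketched in Rust with serde_json. The event names and field paths are illustrative, not a guaranteed CloudTrail schema.

    ```rust
    // Sketch of the detection logic described above: flag CloudTrail events
    // that grant public access to an S3 bucket. Field names are illustrative;
    // Matano expresses real detections in Python. Requires the serde_json crate.
    use serde_json::Value;

    fn is_public_bucket_change(event: &Value) -> bool {
        let name = event["eventName"].as_str().unwrap_or("");
        // Only ACL/policy-changing CloudTrail events are worth inspecting.
        let acl_event = matches!(name, "PutBucketAcl" | "PutBucketPolicy");
        // Very rough public-access check on the raw request parameters.
        let params = event["requestParameters"].to_string();
        acl_event && params.contains("AllUsers")
    }

    fn main() {
        let log: Value = serde_json::from_str(
            r#"{"eventName": "PutBucketAcl",
                "requestParameters": {"AccessControlPolicy":
                    {"Grantee": "http://acs.amazonaws.com/groups/global/AllUsers"}}}"#,
        )
        .unwrap();
        println!("alert: {}", is_public_bucket_change(&log)); // prints "alert: true"
    }
    ```

    In a SIEM, a predicate like this runs against every incoming log record, and a true result raises an alert for the security team.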

    Traditional SIEM tools (e.g. Splunk, Elastic) used to analyze security data are difficult to manage for security teams on the cloud. Most don’t scale because they are built on top of a NoSQL database or search engine like Elasticsearch. And they are expensive — the enterprise SIEM vendors have costly ingest-based licenses. Since security data from SaaS and cloud environments can exceed hundreds of terabytes, teams are left with unsatisfactory options: either not collect some data, leave some data unprocessed, pay exorbitant fees to an enterprise vendor, or build their own large-scale solution for data storage (aka “data lake”).

    Companies like Apple, HSBC, and Brex take the latter approach: they build their own security data lakes to analyze their security data without breaking the bank. “Data lake” is jargon for heterogeneous data that is too large to be kept in a standard database and is analyzed directly from object storage like S3. A “security data lake” is a repository of security logs parsed and normalized into a common structure and stored in object storage for cost-effective analysis. Building your own data lake is a fine option if you’re big enough to justify the cost — but most companies can’t afford it.

    Then there’s the vendor lock-in issue. SIEM vendors store data in proprietary formats that make it difficult to use outside of their ecosystem. Even with "next-gen" products that leverage data lake technology, it's nearly impossible to swap out your data analytics stack or migrate your security data to another tool because of a tight coupling of systems designed to keep you locked in.

    Security programs also suffer because of poor data quality. Most SIEMs today are built as search engines or databases that query unstructured/semi-structured logs. This requires you to heavily index data upfront which is inefficient, expensive and makes it hard to analyze months of data. Writing detection rules requires analysts to use vendor-specific DSLs that lack the flexibility to model complex attacker behaviors. Without structured and normalized data, it is difficult to correlate across data sources and build effective rules that don’t create many false positive alerts.

    While the cybersecurity industry has been stuck dealing with these legacy architectures, the data analytics industry has seen a ton of innovation through open-source initiatives such as Apache Iceberg, Parquet, and Arrow, delivering massive cost savings and performance breakthroughs.

    We encountered this problem when building out petabyte-scale data platforms at Amazon and Duo Security. We realized that most security teams don't have the resources to build a security data lake in-house or take advantage of modern analytics tools, so they’re stuck with legacy SIEM tools that predate the cloud.

    We quit our jobs at AWS and started Matano to close the gap between these two worlds by building an OSS platform that helps security teams leverage the modern data stack (e.g. Spark, Athena, Snowflake) and efficiently analyze security data from all the disparate sources across an organization.

    Matano lets you ingest petabytes of security and log data from various sources, store and query them in an open data lake, and create Python detections as code for realtime alerting.

    Matano works by normalizing unstructured security logs into a structured realtime data lake in your AWS account. All data is stored in optimized Parquet files in S3 object storage for cost-effective retention and analysis at petabyte scale. To prevent vendor lock-in, Matano uses Apache Iceberg, a new open table format that lets you bring your own analytics stack (Athena, Snowflake, Spark, etc.) and query your data from different tools without having to copy any data. By normalizing fields according to the Elastic Common Schema (ECS), we help you easily search for indicators across your data lake, pivot on common fields, and write detection rules that are agnostic to vendor formats.

    We support native integrations to pull security logs from popular SaaS, Cloud, Host, and Network sources and custom JSON/CSV/Text log sources. Matano includes a built-in log transformation pipeline that lets you easily parse and transform logs at ingest time using Vector Remap Language (VRL) without needing additional tools (e.g. Logstash, Cribl).

    Matano uses a detection-as-code approach which lets you use Python to implement realtime alerting on your log data, and lets you use standard dev practices by managing rules in Git (test, code review, audit). Advanced detections that correlate across events and alerts can be written using SQL and executed on a scheduled basis.

    We built Matano to be fully serverless using technologies like Lambda, S3, and SQS for elastic horizontal scaling. We use Rust and Apache Arrow for high performance. Matano works well with your existing data stack, allowing you to plug in tools like Tableau, Grafana, Metabase, or Quicksight for visualization and use query engines like Snowflake, Athena, or Trino for analysis.

    Matano is free and open source software licensed under the Apache-2.0 license. Our use of open table and common schema standards gives you full ownership of your security data in a vendor neutral format. We plan on monetizing by offering a cloud product that includes enterprise and collaborative features to be able to use Matano as a complete replacement to SIEM.

    If you're interested to learn more, check out our docs (https://matano.dev/docs), GitHub repository (https://github.com/matanolabs/matano), or visit our website (https://matano.dev).

    We’d love to hear about your experiences with SIEM, security data tooling, and anything you’d like to share!

  • flowgger

    A fast data collector in Rust

  • kafka-delta-ingest

    A highly efficient daemon for streaming data from Kafka into Delta Lake

    Project mention: Which lakehouse table format do you expect your organization will be using by the end of 2023? | reddit.com/r/dataengineering | 2022-12-25

    This independence from a catalog allows for path based reads and writes. This is handy when writing from Kafka directly to Delta Lake for the first layer of ingestion. You don’t need a catalog (or even Spark). https://github.com/delta-io/kafka-delta-ingest/tree/main/src


  • pq

    a command-line Protobuf parser with Kafka support and JSON output (by sevagh)

    Project mention: Aleka: a schema agnostic protobuf decoder | reddit.com/r/rust | 2022-06-04

    Reminds me of a tool an ex-coworker of mine wrote about 5 years ago. Check it out for inspiration maybe: https://github.com/sevagh/pq

  • graphql-rust-demo

    GraphQL Rust Demo

    Project mention: How to pass header (i.e. authentication) information to Juniper GraphQL Query or Mutation | reddit.com/r/rust | 2022-07-12

    I found an example with async-graphql but I can't find an equivalent example for Juniper. I am using actix_web, but an example with any server would probably work just fine.

  • shotover-proxy

    L7 data-layer proxy

    Project mention: Shotover Proxy for Cassandra and Redis, Written in Rust | news.ycombinator.com | 2022-08-09

  • rust-kafka-101

    Getting started with Rust and Kafka

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2023-01-24.

Index

What are some of the best open-source Kafka projects in Rust? This list will help you:

 #  Project              Stars
 1  materialize          4,780
 2  Ockam                2,845
 3  rust-rdkafka         1,117
 4  kafka-rust             915
 5  matano                 788
 6  flowgger               741
 7  kafka-delta-ingest     175
 8  pq                     150
 9  graphql-rust-demo      149
10  shotover-proxy          64
11  rust-kafka-101           2