|about 1 year ago||1 day ago|
|Apache License 2.0||Apache License 2.0|
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
We haven't tracked posts mentioning tape yet.
Tracking mentions began in Dec 2020.
Apache Hudi - The Streaming Data Lake Platform
8 projects | dev.to | 27 Jul 2021
Hudi is designed around the notion of a base file and delta log files that store updates/deltas to a given base file (together called a file slice). Their formats are pluggable, with Parquet (columnar access) and HFile (indexed access) being the supported base file formats today. The delta logs encode data in Avro (row-oriented) format for speedier logging (just like Kafka topics, for example). Going forward, we plan to inline any base file format into log blocks in the coming releases, providing columnar access to delta logs depending on block sizes. Future plans also include ORC base/log file formats, unstructured data formats (free-form JSON, images), and even tiered storage layers in event-streaming systems, OLAP engines, and warehouses, working with their native file formats.
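The base file plus delta log layout described in this excerpt can be pictured with a toy model. The sketch below is only an illustration in plain Python: `FileSlice`, `append_log_block`, and `merged_view` are hypothetical names, not Hudi APIs, and in-memory dictionaries stand in for the Parquet/HFile base file and the Avro log blocks.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Row = Dict[str, int]


@dataclass
class FileSlice:
    """Toy stand-in for a file slice: one base file plus ordered delta logs."""

    # Stand-in for a columnar/indexed base file: record key -> row.
    base_file: Dict[str, Row]
    # Stand-in for row-oriented delta logs: each block is a list of (key, row) updates.
    delta_logs: List[List[Tuple[str, Row]]] = field(default_factory=list)

    def append_log_block(self, updates: List[Tuple[str, Row]]) -> None:
        """Record updates by appending a log block; the base file is not rewritten."""
        self.delta_logs.append(list(updates))

    def merged_view(self) -> Dict[str, Row]:
        """Apply log blocks in order on top of a copy of the base file."""
        merged = dict(self.base_file)
        for block in self.delta_logs:
            for key, row in block:
                merged[key] = row  # later updates win
        return merged


file_slice = FileSlice(base_file={"k1": {"v": 1}, "k2": {"v": 2}})
file_slice.append_log_block([("k1", {"v": 10})])
file_slice.append_log_block([("k3", {"v": 3}), ("k1", {"v": 11})])
print(file_slice.merged_view())
```

Writes in this model only ever append to the logs, which is why a row-oriented log format suits them; readers pay the cost of merging when they ask for the latest view.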
Getting started with Kafka Connector for Azure Cosmos DB using Docker
6 projects | dev.to | 6 Jul 2021
So far we have dealt with JSON, a commonly used data format. However, Avro is heavily used in production because its compact format leads to better performance and cost savings. To make it easier to deal with Avro data schemas, there is Confluent Schema Registry, which provides a serving layer for your metadata along with a RESTful interface for storing and retrieving your Avro schemas (as well as JSON and Protobuf schemas). We will use the Docker version for the purposes of this blog post.
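As a rough sketch of what storing and retrieving a schema over that RESTful interface can look like, the snippet below builds (but does not send) the two HTTP requests using only the Python standard library. The registry address `http://localhost:8081`, the subject name `orders-value`, and the `Order` schema are assumptions made for illustration; they do not come from the post.

```python
import json
import urllib.request

REGISTRY_URL = "http://localhost:8081"  # assumption: a locally running registry
SUBJECT = "orders-value"                # hypothetical subject name

# Hypothetical Avro schema used only for this example.
ORDER_SCHEMA = {
    "type": "record",
    "name": "Order",
    "fields": [
        {"name": "order_id", "type": "string"},
        {"name": "amount", "type": "long"},
    ],
}


def build_register_request(registry_url: str, subject: str, schema: dict) -> urllib.request.Request:
    """Build a POST that registers a schema under a subject.

    The registry expects the Avro schema as a JSON string nested in a JSON body.
    """
    body = json.dumps({"schema": json.dumps(schema)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{registry_url}/subjects/{subject}/versions",
        data=body,
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
        method="POST",
    )


def build_fetch_latest_request(registry_url: str, subject: str) -> urllib.request.Request:
    """Build a GET that retrieves the latest schema version for a subject."""
    return urllib.request.Request(
        url=f"{registry_url}/subjects/{subject}/versions/latest",
        method="GET",
    )


register_req = build_register_request(REGISTRY_URL, SUBJECT, ORDER_SCHEMA)
fetch_req = build_fetch_latest_request(REGISTRY_URL, SUBJECT)
print(register_req.get_method(), register_req.full_url)
print(fetch_req.get_method(), fetch_req.full_url)
```

Passing either request to `urllib.request.urlopen` would send it, which only works once a registry is actually reachable at the assumed address.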
Tips for Designing Apache Kafka Message Payloads
3 projects | dev.to | 29 Apr 2021
Avro: Small and schema-driven. Apache Avro is a serialisation system that keeps the data tidy and small, which is ideal for Kafka records. The data structure is described with a schema (example below) and messages can only be created if they conform with the requirements of the schema. The producer takes the data and the schema, produces a message that goes to the Kafka broker, and registers the schema with a schema registry. The consumers do the same in reverse: take the message, ask the schema registry for the schema, and assemble the full data structure. Avro has a strong respect for data types, requires that all payloads conform with the schema, and since data such as field names are encoded in the schema rather than repeated in every payload, the overall payload size is reduced.
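To make the size argument concrete, here is a minimal sketch of schema-driven encoding in plain Python. It is not the Avro library: it hand-implements only Avro's binary encoding for `string` and `long` fields (zigzag variable-length integers, length-prefixed UTF-8), and the `SensorReading` schema is a hypothetical example. A real producer would use an Avro client library and a schema registry.

```python
import json

# Hypothetical schema for illustration, written in Avro's JSON schema style.
SCHEMA = {
    "type": "record",
    "name": "SensorReading",
    "fields": [
        {"name": "sensor_id", "type": "string"},
        {"name": "reading", "type": "long"},
    ],
}


def encode_long(n: int) -> bytes:
    """Zigzag plus variable-length encoding, as Avro uses for int/long."""
    z = (n << 1) ^ (n >> 63)  # zigzag maps small negatives to small positives
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def decode_long(buf: bytes, pos: int):
    """Inverse of encode_long; returns (value, next position)."""
    shift, z = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1), pos


def encode_record(schema: dict, record: dict) -> bytes:
    """Encode field values in schema order; field names never enter the payload."""
    names = [f["name"] for f in schema["fields"]]
    if set(record) != set(names):
        raise ValueError("record fields do not match the schema")
    out = bytearray()
    for f in schema["fields"]:
        value = record[f["name"]]
        if f["type"] == "string" and isinstance(value, str):
            raw = value.encode("utf-8")
            out += encode_long(len(raw)) + raw
        elif f["type"] == "long" and isinstance(value, int) and not isinstance(value, bool):
            out += encode_long(value)
        else:
            raise ValueError(f"field {f['name']!r} does not conform to the schema")
    return bytes(out)


def decode_record(schema: dict, payload: bytes) -> dict:
    """The consumer side: the schema supplies the names and types."""
    pos, record = 0, {}
    for f in schema["fields"]:
        if f["type"] == "string":
            length, pos = decode_long(payload, pos)
            record[f["name"]] = payload[pos:pos + length].decode("utf-8")
            pos += length
        else:  # "long"
            record[f["name"]], pos = decode_long(payload, pos)
    return record


reading = {"sensor_id": "abc", "reading": 42}
payload = encode_record(SCHEMA, reading)
print(len(payload), "bytes with the schema vs", len(json.dumps(reading)), "bytes as JSON")
```

The payload holds only a length byte, the three characters of `abc`, and one byte for `42`; the field names live in the schema, which is why the consumer needs the schema to rebuild the record with `decode_record`.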
Scala 3.0 serialization
5 projects | reddit.com/r/scala | 30 Mar 2021
For binary serialization using Avro there's Vulcan which is released for 3.0.0-RC1 and will shortly be released for 3.0.0-RC2. (Disclosure: I'm a maintainer)
Looking for simple avro like serialization format
3 projects | reddit.com/r/rust | 22 Jan 2021
You can make use of the official C library with rust-bindgen and wrap what you need from there.
What are some alternatives?
Protobuf - Protocol Buffers - Google's data interchange format
SBE - Simple Binary Encoding (SBE) - High Performance Message Codec
Big Queue - A big, fast and persistent queue based on memory mapped file.
Apache Thrift - Apache Thrift
Apache Parquet - Apache Parquet
iceberg - Apache Iceberg
Apache Orc - Apache ORC - the smallest, fastest columnar storage for Hadoop workloads
Persistent Collection - A Persistent Java Collections Library
Androl4b - A Virtual Machine For Assessing Android applications, Reverse Engineering and Malware Analysis
Wire - gRPC and protocol buffers for Android, Kotlin, and Java.