|Apache Thrift||Apache Avro|
|1 day ago||1 day ago|
|Apache License 2.0||Apache License 2.0|
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Ask HN: Who Wants to Collaborate?
58 projects | news.ycombinator.com | 1 Jan 2022
Deadline Budget Propagation for Baseplate.py
3 projects | reddit.com/r/RedditEng | 27 Sep 2021
Thus, we released Baseplate.py v2.1 with deadline propagation. Each request between Baseplate services has an associated THeader, which includes relevant information for Baseplate to fulfill its functionality, such as tracing request timings. We added a “Deadline-Budget” field to this header that propagates the remaining timeout so that information is available to the following request, and this timeout continues to get updated with every new request made. With this update, we save production costs by allowing resources to work on requests awaiting a response, and gain overall improved latency.
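A minimal sketch of the deadline-budget idea described above, not Baseplate.py's actual implementation. The header name "Deadline-Budget" comes from the post; the millisecond string encoding, the function names, and the rule of capping by a local default timeout are assumptions made for illustration.

```python
DEADLINE_HEADER = "Deadline-Budget"  # name from the post; the value encoding below is assumed


def remaining_budget_ms(deadline: float, now: float) -> int:
    """Milliseconds left before the absolute deadline, floored at zero."""
    return max(0, int(round((deadline - now) * 1000)))


def outgoing_headers(deadline: float, now: float) -> dict:
    """Headers for a downstream call, carrying whatever budget is left."""
    budget = remaining_budget_ms(deadline, now)
    if budget == 0:
        # No time left: fail fast instead of sending work nobody is waiting for.
        raise TimeoutError("deadline budget exhausted; not sending request")
    return {DEADLINE_HEADER: str(budget)}


def incoming_deadline(headers: dict, now: float, default_timeout_s: float) -> float:
    """Absolute deadline for a request, based on the caller's remaining budget."""
    raw = headers.get(DEADLINE_HEADER)
    if raw is None:
        return now + default_timeout_s
    # Never wait longer than the caller will, nor longer than the local default.
    return now + min(int(raw) / 1000.0, default_timeout_s)
```

In a real service `now` would come from a monotonic clock such as `time.monotonic()`; it is passed in explicitly here so that each hop's arithmetic is easy to follow and to test.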
If someone ever asks you why you use Apollo, show them this screenshot.
1 project | reddit.com/r/apolloapp | 23 Sep 2021
Here’s an example of the Thrift changelog. Knock yourself out. Or you can get your sense of productivity by actually doing something of value.
parquet2 0.3.0, with native support to read async
3 projects | reddit.com/r/rust | 9 Aug 2021
The biggest addition is native async reading via futures::AsyncRead and futures::AsyncSeek, which required a lot of (to be merged) changes upstream (changes to thrift rust compiler and parquet-format-rs). I placed those changes on a temporary crate until things are released there.
proposal: expression to create pointer to simple types #45624
3 projects | reddit.com/r/golang | 18 Apr 2021
Can you share your experience with race conditions in production?
1 project | reddit.com/r/java | 25 Jan 2021
We were sharing instances of a Thrift TDeserializer across threads. We knew TProtocol was not thread-safe, but the TDeserializer constructor accepts a TProtocolFactory, so we naively assumed the deserialize method would use it to create a new TProtocol instance for each invocation. Unfortunately, the TDeserializer constructor immediately creates a TProtocol and stores it in a member variable, so TDeserializer is not actually thread-safe.
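One common way to avoid this kind of bug is to give each thread its own instance. The sketch below shows that pattern in Python as an analogy only: `StatefulDecoder` is a hypothetical stand-in for a deserializer that stores mutable state in a member variable, and none of these names are part of the Thrift API.

```python
import threading


class StatefulDecoder:
    """Hypothetical decoder that reuses one internal buffer across calls."""

    def __init__(self):
        # Mutable state stored on the instance: unsafe if two threads decode at once.
        self._buffer = bytearray()

    def decode(self, payload: bytes) -> str:
        self._buffer.clear()
        self._buffer.extend(payload)
        return self._buffer.decode("utf-8")


_local = threading.local()


def thread_decoder() -> StatefulDecoder:
    """Return this thread's private decoder, creating it on first use."""
    decoder = getattr(_local, "decoder", None)
    if decoder is None:
        decoder = StatefulDecoder()
        _local.decoder = decoder
    return decoder


def decode(payload: bytes) -> str:
    # Each thread only ever touches its own instance, so no locking is needed.
    return thread_decoder().decode(payload)
```

The alternative is to create a fresh instance per call, which is simpler but pays the construction cost every time.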
2 projects | dev.to | 18 Jan 2022
When serializing a value, we convert it to a sequence of bytes. This sequence is often a human-readable string (all the bytes can be read and interpreted by humans as text), but not necessarily: the serialized format can also be binary. Binary data (for example, an image) is still bytes, but it uses non-text byte values, so it looks like gibberish in a text editor. Binary formats won't make sense unless deserialized by an appropriate program. An example of a human-readable serialization format is JSON. Examples of binary formats are Apache Avro and Protobuf.
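To make the text-versus-binary distinction concrete, here is a small sketch using only the Python standard library. The record and its field names are made up for illustration, and `struct` stands in for a binary format; it is not Avro or Protobuf.

```python
import json
import struct

record = {"id": 7, "temperature": 21.5}  # hypothetical example data

# Human-readable: the bytes are UTF-8 text that can be opened in any editor.
text_bytes = json.dumps(record).encode("utf-8")

# Binary: "<id" means little-endian, a 4-byte int followed by an 8-byte double.
# Field names are not stored; the reader must already know the layout.
binary_bytes = struct.pack("<id", record["id"], record["temperature"])

# Deserializing the binary form requires the same layout description.
decoded_id, decoded_temperature = struct.unpack("<id", binary_bytes)
```

Printing `text_bytes` shows readable JSON, while `binary_bytes` is 12 bytes that only make sense to a program that knows the `"<id"` layout.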
Dreaming and Breaking Molds – Establishing Best Practices with Scott Haines
3 projects | dev.to | 8 Dec 2021
Scott: It's like a very large row of Avro data that had everything you could possibly ever need. It was like 115 columns. Most things were null, and it became every data type you'd ever want. It's like, is it mobile? Look for mobile_. It's like, this is really crappy. I didn't know about, I guess, the hardships of data engineering at that point. Because this was the first time where I was like, okay, you're on the ground basically pulling data now, and now we're going to do stuff with it. We're going to power our whole entire application with it. And I remember that just being exciting. The gears were turning. I was waking up super early. I wanted to go in to just to work on it more. It was the first thing where it's like, man, that's just like the coolest thing in the whole entire world.
Apache Hudi - The Streaming Data Lake Platform
8 projects | dev.to | 27 Jul 2021
Hudi is designed around the notion of a base file and delta log files that store updates/deltas to that base file (together called a file slice). Their formats are pluggable, with Parquet (columnar access) and HFile (indexed access) being the supported base file formats today. The delta logs encode data in Avro (row-oriented) format for speedier logging (just like Kafka topics, for example). Going forward, we plan to inline any base file format into log blocks in the coming releases, providing columnar access to delta logs depending on block sizes. Future plans also include ORC base/log file formats, unstructured data formats (free-form JSON, images), and even tiered storage layers in event-streaming systems, OLAP engines, and warehouses, working with their native file formats.
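The base-file-plus-delta-log idea can be pictured with a toy model. This is not Hudi's API or its actual merge logic: the function name, the in-memory dict standing in for a base file, and the ("upsert" / "delete") log entries are all assumptions made for illustration.

```python
def read_file_slice(base_records: dict, delta_log: list) -> dict:
    """Apply logged changes, in order, on top of a snapshot of the base file."""
    merged = dict(base_records)  # copy so the base snapshot is left untouched
    for operation, key, value in delta_log:
        if operation == "upsert":
            merged[key] = value
        elif operation == "delete":
            merged.pop(key, None)
        else:
            raise ValueError("unknown log operation: %r" % (operation,))
    return merged


# Base file: a snapshot of records keyed by record key.
base = {"k1": {"city": "Oslo"}, "k2": {"city": "Lima"}}

# Delta log: row-oriented entries appended after the base file was written.
log = [
    ("upsert", "k2", {"city": "Quito"}),
    ("upsert", "k3", {"city": "Pune"}),
    ("delete", "k1", None),
]

current_view = read_file_slice(base, log)
```

Appending to the log is cheap, while readers pay the merge cost; that trade-off is the reason a row-oriented format suits the log and a columnar or indexed format suits the base file.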
Getting started with Kafka Connector for Azure Cosmos DB using Docker
6 projects | dev.to | 6 Jul 2021
So far we have dealt with JSON, a commonly used data format. But Avro is heavily used in production because its compact format leads to better performance and cost savings. To make it easier to deal with Avro data schemas, there is Confluent Schema Registry, which provides a serving layer for your metadata along with a RESTful interface for storing and retrieving your Avro (as well as JSON and Protobuf) schemas. We will use the Docker version for the purposes of this blog post.
Tips for Designing Apache Kafka Message Payloads
3 projects | dev.to | 29 Apr 2021
Avro: Small and schema-driven. Apache Avro is a serialisation system that keeps the data tidy and small, which is ideal for Kafka records. The data structure is described with a schema (example below) and messages can only be created if they conform to the requirements of the schema. The producer takes the data and the schema, produces a message that goes to the Kafka broker, and registers the schema with a schema registry. The consumers do the same in reverse: take the message, ask the schema registry for the schema, and assemble the full data structure. Avro has a strong respect for data types and requires all payloads to conform to the schema. Since data such as field names is encoded in the schema rather than repeated in every payload, the overall payload size is reduced.
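The size saving from keeping field names in the schema can be sketched by hand. The code below is a simplified illustration, not the Avro library: it covers only two types, using zig-zag variable-length integers for "long" and a length prefix plus UTF-8 bytes for "string", with record fields written in schema order. The schema, field names, and function names are hypothetical.

```python
import json

SCHEMA = [("user_id", "long"), ("country", "string")]  # hypothetical schema


def encode_long(n: int) -> bytes:
    """Zig-zag then base-128 varint, so small magnitudes take few bytes."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def decode_long(buf: bytes, pos: int):
    shift, z = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        z |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1), pos


def encode_record(schema, record: dict) -> bytes:
    """Write values in schema order; reject records that do not match the schema."""
    if set(record) != {name for name, _ in schema}:
        raise ValueError("record fields do not match schema")
    out = bytearray()
    for name, kind in schema:
        value = record[name]
        if kind == "long" and isinstance(value, int) and not isinstance(value, bool):
            out += encode_long(value)
        elif kind == "string" and isinstance(value, str):
            data = value.encode("utf-8")
            out += encode_long(len(data)) + data
        else:
            raise TypeError("field %r is not a valid %s" % (name, kind))
    return bytes(out)


def decode_record(schema, buf: bytes) -> dict:
    """Rebuild the record; field names come from the schema, not the payload."""
    record, pos = {}, 0
    for name, kind in schema:
        if kind == "long":
            record[name], pos = decode_long(buf, pos)
        else:
            length, pos = decode_long(buf, pos)
            record[name] = buf[pos:pos + length].decode("utf-8")
            pos += length
    return record


event = {"user_id": 1234, "country": "NZ"}
payload = encode_record(SCHEMA, event)
json_payload = json.dumps(event).encode("utf-8")
```

Here `payload` holds only the values, while `json_payload` repeats both field names in every message. The consumer needs the same schema to call `decode_record`, which is the role the schema registry plays in the flow described above.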
Scala 3.0 serialization
5 projects | reddit.com/r/scala | 30 Mar 2021
For binary serialization using Avro there's Vulcan which is released for 3.0.0-RC1 and will shortly be released for 3.0.0-RC2. (Disclosure: I'm a maintainer)
Looking for simple avro like serialization format
3 projects | reddit.com/r/rust | 22 Jan 2021
You can make use of the official C library with rust-bindgen and wrap what you need from there.
What are some alternatives?
gRPC - The C based gRPC (C++, Python, Ruby, Objective-C, PHP, C#)
Protobuf - Protocol Buffers - Google's data interchange format
SBE - Simple Binary Encoding (SBE) - High Performance Message Codec
ZeroMQ - ZeroMQ core engine in C++, implements ZMTP/3.1
Cap'n Proto - Cap'n Proto serialization/RPC system - core tools and C++ library
Apache Parquet - Apache Parquet
iceberg - Apache Iceberg
nanomsg - nanomsg library
Big Queue - A big, fast and persistent queue based on memory mapped file.
Apache Orc - Apache ORC - the smallest, fastest columnar storage for Hadoop workloads