Apache Avro
Protobuf
Metric | Apache Avro | Protobuf |
---|---|---|
Mentions | 22 | 171 |
Stars | 2,736 | 63,263 |
Growth | 1.6% | 0.9% |
Activity | 9.7 | 10.0 |
Latest commit | 8 days ago | 6 days ago |
Language | Java | C++ |
License | Apache License 2.0 | BSD 3-Clause License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Apache Avro
- Generating Avro Schemas from Go types
The most common format for describing schema in this scenario is Apache Avro.
- The state of Apache Avro in Rust
- How people generate examples for multiple programming languages?
- gRPC on the client side
Other serialization alternatives have a schema validation option: e.g., Avro, Kryo and Protocol Buffers. Interestingly enough, gRPC uses Protobuf to offer RPC across distributed components:
- Understanding Azure Event Hubs Capture
Apache Avro is a data serialization system; for more information, see the Apache Avro project.
- In One Minute : Hadoop
Avro, a data serialization system based on JSON schemas.
- Protocol Buffers vs. JSON for data serialization
- Marshaling objects in modern Java
If a binary format is OK, use Protocol Buffers or Avro. Note that with binary formats you need a schema to serialize and deserialize your data. Therefore, you'd probably want a schema registry that stores all past and present schemas for later use.
- How-to-Guide: Contributing to Open Source
Apache Avro
- How should I handle storing and reading from large amounts of data in my project?
Maybe it would be simpler to serialise all the data in a more compact format such as Avro, a row-based format that seems to be able to use zstd/bzip/xz compression.
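The point made above, that a binary format cannot be read back without its schema, can be illustrated with a minimal sketch. It hand-rolls an Avro-style encoding for a hypothetical `User` record, handling only `string` and `long` fields, using the zig-zag varint and length-prefixed string rules as I understand them from the Avro specification. The schema, the helper names, and the restriction to two types are assumptions made for illustration; this is not the Avro library's API.

```python
import json

# Hypothetical schema for illustration only; Avro schemas are written in JSON.
SCHEMA = json.loads("""
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "long"}
  ]
}
""")


def encode_long(n: int) -> bytes:
    """Zig-zag encode a 64-bit signed integer, then emit it as a base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def decode_long(buf: bytes, pos: int):
    """Inverse of encode_long; returns (value, next position)."""
    shift = 0
    z = 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1), pos


def encode_record(schema: dict, record: dict) -> bytes:
    """Concatenate the field values in schema order; no names or types are written."""
    out = bytearray()
    for field in schema["fields"]:
        value = record[field["name"]]
        if field["type"] == "long":
            out += encode_long(value)
        elif field["type"] == "string":
            raw = value.encode("utf-8")
            out += encode_long(len(raw)) + raw
        else:
            raise ValueError("unsupported type in this sketch: %r" % field["type"])
    return bytes(out)


def decode_record(schema: dict, data: bytes) -> dict:
    """Walk the schema to interpret the bytes; without it the bytes are ambiguous."""
    record, pos = {}, 0
    for field in schema["fields"]:
        if field["type"] == "long":
            record[field["name"]], pos = decode_long(data, pos)
        elif field["type"] == "string":
            length, pos = decode_long(data, pos)
            record[field["name"]] = data[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError("unsupported type in this sketch: %r" % field["type"])
    return record
```

The encoded bytes carry only values, so a reader holding a different schema would misinterpret them. That is the reason a schema registry is usually kept alongside the data.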
Protobuf
- Reverse Engineering Protobuf Definitions from Compiled Binaries
For at least 4 years protobuf has had decent support for self-describing messages (very similar to avro) as well as reflection
https://github.com/protocolbuffers/protobuf/blob/main/src/go...
Xgooglers trying to make do on the cheap will just create a Union of all their messages and include the message def in a self-describing message pattern. Super-sensitive network I/O can elide the message def (empty buffer), and for any RecordIO clone, well, file compression takes care of the definition.
Definitely useful to be able to dig out old defs but protobuf maintainers have surprisingly added useful features so you don’t have to.
Bonus points tho for extracting the protobuf defs that e.g. Apple bakes into their binaries.
- Show HN: AuthWin – Authenticator App for Windows
- Create Production-Ready SDKs With gRPC Gateway
gRPC Gateway is a protoc plugin that reads gRPC service definitions and generates a reverse proxy server that translates a RESTful JSON API into gRPC.
- Create Production-Ready SDKs with Goa
To use more recent versions of protoc in future applications, you can download them from the Protobuf repository.
- Roll your own auth with Rust and Protobuf
Use the Protobuf CLI protoc and the plugin protoc-gen-tonic.
- Add extra stuff to a “standard” encoding? Sure, why not
> didn’t find any standard for separating protobuf messages
The fact that protobufs are not self-delimiting is an endless source of frustration, but I know of 2 standards:
- SerializeDelimited* is part of the protobuf library: https://github.com/protocolbuffers/protobuf/blob/main/src/go...
- Riegeli is "a file format for storing a sequence of string records, typically serialized protocol buffers. It supports dense compression, fast decoding, seeking, detection and optional skipping of data corruption, filtering of proto message fields for even faster decoding, and parallel encoding": https://github.com/google/riegeli
I actually went through all projects listed in [1] because I remember this very quirk. It turns out that there are many such libraries that have two variants of encode/decode functions, where the second variant prepends a varint length. In my brief inspection there do exist a few libraries with only the second variant (e.g. Rust quick-protobuf), which is legitimately problematic [2].
But if the project in question was indeed protobuf.js (see loeg's comments), it clearly distinguishes encode/decode vs. encodeDelimited/decodeDelimited. So I believe the project should not be blamed, and the better question would be why so many people chose to add this exact helper. Well, because Google itself also had the same helper [3]! So at this point protobuf should just standardize this simple framing format (with an explicitly different name though), instead of claiming that protobuf has no obligation to define one.
[1] https://github.com/protocolbuffers/protobuf/blob/main/docs/t...
[2] https://github.com/tafia/quick-protobuf/issues/130
[3] https://protobuf.dev/reference/java/api-docs/com/google/prot...
[4] https://github.com/protocolbuffers/protobuf/blob/main/src/go...
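The framing described in the comments above, where each message is preceded by its length written as a varint, can be sketched in a few lines. The sketch treats each payload as opaque bytes that are assumed to be already serialized, and it does not call the protobuf library. The function names `write_delimited` and `read_delimited` are hypothetical, chosen to echo the helpers mentioned in the discussion.

```python
import io
from typing import BinaryIO, Optional


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (low 7 bits first)."""
    if n < 0:
        raise ValueError("length must be non-negative")
    out = bytearray()
    while n > 0x7F:
        out.append((n & 0x7F) | 0x80)  # continuation bit set
        n >>= 7
    out.append(n)
    return bytes(out)


def write_delimited(stream: BinaryIO, payload: bytes) -> None:
    """Write one frame: varint length prefix followed by the payload bytes."""
    stream.write(encode_varint(len(payload)))
    stream.write(payload)


def read_delimited(stream: BinaryIO) -> Optional[bytes]:
    """Read one frame; return None on a clean end of stream."""
    length = 0
    shift = 0
    first = True
    while True:
        b = stream.read(1)
        if not b:
            if first:
                return None  # no more frames
            raise EOFError("stream ended inside a length prefix")
        first = False
        length |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            break
        shift += 7
    payload = stream.read(length)
    if len(payload) != length:
        raise EOFError("stream ended inside a message")
    return payload
```

A reader loops on `read_delimited` until it returns `None`. An empty message is still a valid frame, since it is written as a single zero byte.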
- Block YouTube Ads on AppleTV by Decrypting and Stripping Ads from Protobuf
It looks like it is in fact universal. Just glancing at the code here, it looks like the tool searches any arbitrary file for bytes that look like encoded protobuf descriptors, specifically looking for bytes that are plausibly the beginning of a FileDescriptorProto message defined here:
https://github.com/protocolbuffers/protobuf/blob/main/src/go...
This takes advantage of the fact that such descriptors are commonly compiled into programs that use protobuf. The descriptors are usually embedded as constant byte arrays. That said, not all protobuf implementations embed the descriptors and those that do often have an option to inhibit such embedding (at the expense of losing some dynamic introspection features).
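As a rough sketch of the search described above, the snippet below scans a byte string for places that could plausibly be the start of a serialized FileDescriptorProto. It relies on two assumptions: that `name` is field 1 of that message with a length-delimited wire type, so the message typically begins with the tag byte `0x0A`, and that the name is a printable path ending in `.proto`. The function names and the 512-byte name limit are made up for illustration. A real tool would go on to parse and validate the whole message rather than trust this first-field heuristic.

```python
from typing import List, Optional, Tuple


def _read_varint(data: bytes, pos: int) -> Tuple[Optional[int], int]:
    """Read a varint of at most 5 bytes; return (None, pos) if it is malformed."""
    value = 0
    for i in range(5):
        if pos + i >= len(data):
            return None, pos
        b = data[pos + i]
        value |= (b & 0x7F) << (7 * i)
        if not b & 0x80:
            return value, pos + i + 1
    return None, pos


def find_descriptor_candidates(data: bytes) -> List[Tuple[int, str]]:
    """Return (offset, file name) pairs that look like the start of a descriptor."""
    hits = []
    pos = data.find(b"\x0a")  # tag byte for field 1, wire type 2 (assumed)
    while pos != -1:
        length, start = _read_varint(data, pos + 1)
        if length is not None and 0 < length <= 512:
            name = data[start:start + length]
            printable = all(0x20 <= c < 0x7F for c in name)
            if len(name) == length and printable and name.endswith(b".proto"):
                hits.append((pos, name.decode("ascii")))
        pos = data.find(b"\x0a", pos + 1)
    return hits
```

Each hit is only a candidate. Binaries built without embedded descriptors will produce none, as the comment above notes.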
- How Turborepo is porting from Go to Rust
On optional.. this was a regression in proto that is somewhat helped by https://github.com/protocolbuffers/protobuf/blob/main/docs/f... ; I have no idea whether protobuf for rust has started taking advantage of this.
JSON is awful in every way.
Recent versions of proto3 have added back the “optional” keyword, which can be used on any field. See: https://github.com/protocolbuffers/protobuf/blob/main/docs/f...
What are some alternatives?
FlatBuffers - FlatBuffers: Memory Efficient Serialization Library
SBE - Simple Binary Encoding (SBE) - High Performance Message Codec
MessagePack - MessagePack implementation for C and C++ / msgpack.org[C/C++]
cereal - A C++11 library for serialization
Apache Parquet - Apache Parquet
Bond - Bond is a cross-platform framework for working with schematized data. It supports cross-language de/serialization and powerful generic mechanisms for efficiently manipulating data. Bond is broadly used at Microsoft in high scale services.
Protobuf.NET - Protocol Buffers library for idiomatic .NET
Boost.Serialization - Boost.org serialization module
Cap'n Proto - Cap'n Proto serialization/RPC system - core tools and C++ library
protostuff - Java serialization library, proto compiler, code generator
MessagePack for C# (.NET, .NET Core, Unity, Xamarin) - Extremely Fast MessagePack Serializer for C#(.NET, .NET Core, Unity, Xamarin). / msgpack.org[C#]