Apache Avro
hudi
| Apache Avro | hudi |
---|---|---|
Mentions | 22 | 20 |
Stars | 2,764 | 5,053 |
Growth | 1.4% | 2.0% |
Activity | 9.7 | 9.9 |
Last commit | about 9 hours ago | 7 days ago |
Language | Java | Java |
License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Apache Avro
-
Open Table Formats Such as Apache Iceberg Are Inevitable for Analytical Data
Apache Avro [1] is one, but it has been largely replaced by Parquet [2], which is a hybrid row/columnar format
[1] https://avro.apache.org/
-
Generating Avro Schemas from Go types
The most common format for describing schema in this scenario is Apache Avro.
-
How do you update an existing avro schema using apache avro SchemaBuilder?
I am testing a new schema registry which loads and retrieves different kinds of Avro schemas. In the process of testing, I need to create a bunch of different types of Avro schemas. As it involves a lot of permutations, I decided to create the schemas programmatically. I am using the Apache Avro SchemaBuilder to do so.
- The state of Apache Avro in Rust
- How people generate examples for multiple programming languages?
-
gRPC on the client side
Other serialization alternatives have a schema validation option: e.g., Avro, Kryo and Protocol Buffers. Interestingly enough, gRPC uses Protobuf to offer RPC across distributed components:
-
Understanding Azure Event Hubs Capture
Apache Avro is a data serialization system. For more information, visit Apache Avro.
-
tl;dr of Data Contracts
Once things like JSON became more popular, Apache Avro appeared. You can define Avro schema files, from which classes can then be generated for Python, Java, C, Ruby, etc.
-
In One Minute : Hadoop
Avro, a data serialization system based on JSON schemas.
-
Events: Fat or Thin?
Supporting multiple versions of an event schema is a solved problem. Apache Avro with a published schema hash in a message header is one solution.
https://avro.apache.org/
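The "schema hash in a message header" idea mentioned above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Avro's actual API: the `User` schemas, the `schema-hash` header name, the in-memory `REGISTRY`, and the helper functions are all hypothetical, the message body stays as plain Python data rather than Avro binary, and the hash is taken over sorted-key JSON rather than over Avro's Parsing Canonical Form.

```python
import hashlib
import json

# Hypothetical Avro-style record schema, written as JSON-compatible data.
USER_V1 = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "email", "type": "string"},
    ],
}

# A second version that adds an optional field with a default value.
USER_V2 = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "email", "type": "string"},
        {"name": "nickname", "type": ["null", "string"], "default": None},
    ],
}


def schema_hash(schema: dict) -> str:
    """Hash a deterministic JSON rendering of the schema.

    Simplification: real Avro fingerprints are computed over the
    Parsing Canonical Form, not over sorted-key JSON.
    """
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Published schemas keyed by hash (a stand-in for a schema registry).
REGISTRY = {schema_hash(s): s for s in (USER_V1, USER_V2)}


def make_message(schema: dict, payload: dict) -> dict:
    """Attach the writer schema's hash as a header (header name is made up)."""
    return {"headers": {"schema-hash": schema_hash(schema)}, "body": payload}


def writer_schema_for(message: dict) -> dict:
    """Look up the schema the producer used, based on the header."""
    return REGISTRY[message["headers"]["schema-hash"]]
```

A consumer would call `writer_schema_for(message)` before decoding, so messages written with either schema version can share one topic as long as both versions have been published.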
hudi
-
Getting Started with Flink SQL, Apache Iceberg and DynamoDB Catalog
Apache Iceberg is one of the three main lakehouse table formats; the other two are Apache Hudi and Delta Lake.
-
The "Big Three's" Data Storage Offerings
Structured, semi-structured and unstructured data can be stored in one single format, a lakehouse storage format like Delta, Iceberg or Hudi (assuming the workloads don't require low-latency SLAs such as subsecond).
-
Data-eng related highlights from the latest Thoughtworks Tech Radar
Apache Hudi
- For those of you with Lakehouse Architectures, how do you handle duplicate records?
-
AWS ACID data lakehouse
Try Apache Hudi, it is fully integrated with AWS and offers almost everything that you requested.
-
Data n00b looking for guidance on how to setup data lake/warehouse
The corresponding Kafka topics have 30d retention, and I intend to have an S3 sink connector for long-term storage (open to other ideas here too; I noticed there's a Hudi connector also).
- apache/hudi: Upserts, Deletes And Incremental Processing on Big Data.
- Big Data file formats
-
How-to-Guide: Contributing to Open Source
Apache Hudi
-
What do you use for Data versioning?
You could have a look at Apache Hudi - especially if you're running your Data Pipelines using Spark or Flink.
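Several of the mentions above refer to upserts and duplicate records. As a rough illustration of the general idea only (this is not Hudi's API, and the `upsert` function, the `id` key field, and the `updated_at` ordering field are hypothetical names), an upsert keeps one row per record key and lets the row with the newest ordering value win:

```python
from typing import Dict, Iterable


def upsert(
    table: Dict[str, dict],
    batch: Iterable[dict],
    key_field: str = "id",
    ordering_field: str = "updated_at",
) -> None:
    """Merge a batch into `table`, keeping exactly one row per key.

    For each key, the row with the largest ordering value wins, so
    re-delivered or late-arriving records do not create duplicates.
    """
    for row in batch:
        key = row[key_field]
        current = table.get(key)
        # Insert new keys; overwrite only when the incoming row is not older.
        if current is None or row[ordering_field] >= current[ordering_field]:
            table[key] = row


# Usage: the second batch repeats key "a" and carries a stale row for "b".
table: Dict[str, dict] = {}
upsert(table, [
    {"id": "a", "updated_at": 1, "value": "old"},
    {"id": "b", "updated_at": 1, "value": "kept"},
])
upsert(table, [
    {"id": "a", "updated_at": 2, "value": "new"},
    {"id": "a", "updated_at": 2, "value": "new"},
    {"id": "b", "updated_at": 0, "value": "stale"},
])
```

Here the table lives in a Python dict for brevity; a real lakehouse table format applies comparable merge semantics to files in object storage.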
What are some alternatives?
Protobuf - Protocol Buffers - Google's data interchange format
iceberg - Apache Iceberg
SBE - Simple Binary Encoding (SBE) - High Performance Message Codec
kudu - Mirror of Apache Kudu
Apache Thrift - Apache Thrift
Trino - Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
debezium - Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ.
Apache Parquet - Apache Parquet
pinot - Apache Pinot - A realtime distributed OLAP datastore
gRPC - The C based gRPC (C++, Python, Ruby, Objective-C, PHP, C#)
delta - An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs