Metric | Apache Orc | FlatBuffers
---|---|---
Mentions | 4 | 48
Stars | 654 | 22,062
Growth | 0.6% | 0.6%
Activity | 9.5 | 8.7
Latest commit | 8 days ago | 7 days ago
Language | Java | C++
License | Apache License 2.0 | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Apache Orc
- Java Serialization with Protocol Buffers
The information can be stored in a database or as files, serialized in a standard format and with a schema agreed with your Data Engineering team. Depending on your information and requirements, it can be as simple as CSV, XML or JSON, or Big Data formats such as Parquet, Avro, ORC, Arrow, or message serialization formats like Protocol Buffers, FlatBuffers, MessagePack, Thrift, or Cap'n Proto.
- Personal data of 120,000 Russian servicemen fighting in Ukraine made public
- AWS EMR Cost Optimization Guide
Data formatting is another place to make gains. When dealing with huge amounts of data, finding the data you need can take up a significant amount of your compute time. Apache Parquet and Apache ORC are columnar data formats optimized for analytics that pre-aggregate metadata about columns. If your EMR queries are column-intensive, such as sum, max, or count aggregations, you can see significant speed improvements by converting data such as CSV files into one of these columnar formats.
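To make the point about pre-aggregated column metadata concrete, here is a minimal sketch in plain Python. It is a toy in-memory model, not the actual ORC or Parquet file layout, and every name in it (`ColumnChunk`, `to_columnar`, `column_sum`, `column_max`, the sample columns) is hypothetical. It only shows why an aggregate over one column can be answered from per-chunk statistics without reading row values.

```python
from dataclasses import dataclass


@dataclass
class ColumnChunk:
    """One column's values for a block of rows, plus precomputed statistics."""
    values: list
    minimum: float
    maximum: float
    total: float
    count: int


def build_chunk(values):
    # Statistics are computed once, when the chunk is written.
    return ColumnChunk(list(values), min(values), max(values), sum(values), len(values))


def to_columnar(rows, chunk_size):
    """Pivot a list of row dicts into per-column lists of chunks."""
    columns = {}
    for start in range(0, len(rows), chunk_size):
        block = rows[start:start + chunk_size]
        for name in block[0]:
            chunk = build_chunk([row[name] for row in block])
            columns.setdefault(name, []).append(chunk)
    return columns


def column_sum(columns, name):
    # Answered from chunk metadata alone; the stored values are not scanned.
    return sum(chunk.total for chunk in columns[name])


def column_max(columns, name):
    return max(chunk.maximum for chunk in columns[name])


# Hypothetical sample data standing in for rows parsed from a CSV file.
rows = [
    {"bytes": 120, "latency_ms": 8},
    {"bytes": 300, "latency_ms": 15},
    {"bytes": 90, "latency_ms": 4},
    {"bytes": 510, "latency_ms": 22},
]
columns = to_columnar(rows, chunk_size=2)
print(column_sum(columns, "bytes"))
print(column_max(columns, "latency_ms"))
```

In a row-oriented file such as CSV, the same query would have to parse every field of every row. Real columnar formats add encoding, compression, and on-disk indexes on top of this basic idea.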
- Apache Hudi - The Streaming Data Lake Platform
The following stack captures layers of software components that make up Hudi, with each layer depending on and drawing strength from the layer below. Typically, data lake users write data out once using an open file format like Apache Parquet/ORC stored on top of extremely scalable cloud storage or distributed file systems. Hudi provides a self-managing data plane to ingest, transform and manage this data, in a way that unlocks incremental data processing on them.
FlatBuffers
- FlatBuffers – an efficient cross platform serialization library for many langs
- Cap'n Proto 1.0
I don't work at Cloudflare but follow their work and occasionally work on performance sensitive projects.
If I had to guess, they looked at the landscape a bit like I do and regarded Cap'n Proto, flatbuffers, SBE, etc. as being in one category apart from other data formats like Avro, protobuf, and the like.
So once you're committed to record'ish shaped (rather than columnar like Parquet) data that has an upfront parse time of zero (nominally, there could be marshalling if you transmogrify the field values on read), the list gets pretty short.
https://capnproto.org/news/2014-06-17-capnproto-flatbuffers-... goes into some of the trade-offs here.
Cap'n Proto was originally made for https://sandstorm.io/. That work (which Kenton has presumably done at Cloudflare since he's been employed there) eventually turned into Cloudflare workers.
Another consideration: https://github.com/google/flatbuffers/issues/2#issuecomment-...
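The "upfront parse time of zero" idea in the comment above can be illustrated with a small sketch using Python's standard `struct` module. This is not the FlatBuffers or Cap'n Proto wire format, which use offset tables and pointers; the fixed record layout, the field names, and the helper functions below are all assumptions made for illustration. The point is only that a reader can fetch one field at a known offset without decoding the rest of the message.

```python
import struct

# Hypothetical fixed layout, little-endian with no padding:
# uint32 id, float64 temperature, uint16 flags.
RECORD = struct.Struct("<IdH")
TEMP_OFFSET = 4    # the temperature follows the 4-byte id
FLAGS_OFFSET = 12  # the flags follow the 8-byte temperature


def encode(record_id, temperature, flags):
    """Pack one record into its 14-byte binary form."""
    return RECORD.pack(record_id, temperature, flags)


def read_temperature(buf, base=0):
    # Decodes 8 bytes at a known offset; the other fields are never touched.
    return struct.unpack_from("<d", buf, base + TEMP_OFFSET)[0]


def read_flags(buf, base=0):
    return struct.unpack_from("<H", buf, base + FLAGS_OFFSET)[0]


# Two records back to back, viewed without copying the bytes.
buf = memoryview(encode(7, 21.5, 3) + encode(8, 19.25, 0))
second = RECORD.size  # byte offset where the second record starts

print(read_temperature(buf, second))
print(read_flags(buf))
```

Formats like protobuf or Avro, by contrast, generally need a decode pass over the message before individual fields are available, which is the distinction the comment draws.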
- Anyone has experience with reverse engineering flatbuffers?
Much more in the discussion of this particular issue on GitHub: flatbuffers: Reverse engineering #4258
- Flatty - flat message buffers with direct mapping to Rust types without packing/unpacking
Related but not Rust-specific: FlatBuffers, Cap'n Proto.
- flatbuffers - FlatBuffers: Memory Efficient Serialization Library
- How do AAA studios make update-compatible save systems?
If json files are a concern because of space, you can always look into something like protobuffers or flatbuffers. But whatever you use, you should try to find a solution where you don't have to think about the actual serialization/deserialization of your objects, and can just concentrate on the data.
- QuickBuffers 1.1 released
- Choosing a protocol for communication between multiple microcontrollers
Or, as an alternative to protobuffers, there's also flatbuffers, which is lighter weight and needs less memory: https://google.github.io/flatbuffers/
- FlatBuffers: FlatBuffers
- Is using Flatbuffers to parse sensor data a bad application of Flatbuffers?
As the title suggests, I am considering using Flatbuffers as a way to parse sensor data that has been stored in local data files. The project language is Python.
What are some alternatives?
Protobuf - Protocol Buffers - Google's data interchange format
Apache Parquet - Apache Parquet
MessagePack - MessagePack implementation for C and C++ / msgpack.org[C/C++]
Apache Avro - Apache Avro is a data serialization system.
MessagePack - MessagePack serializer implementation for Java / msgpack.org[Java]
hudi - Upserts, Deletes And Incremental Processing on Big Data.
Cap'n Proto - Cap'n Proto serialization/RPC system - core tools and C++ library
Apache Thrift - Apache Thrift
cereal - A C++11 library for serialization
tape - A lightning fast, transactional, file-based FIFO for Android and Java.
Kryo - Java binary serialization and cloning: fast, efficient, automatic