Apache Thrift
Apache Orc
Metric | Apache Thrift | Apache Orc |
---|---|---|
Mentions | 10 | 4 |
Stars | 10,143 | 654 |
Growth | 0.6% | 0.9% |
Activity | 9.0 | 9.4 |
Latest commit | about 22 hours ago | 1 day ago |
Language | C++ | Java |
License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Apache Thrift
- Symfony in microservice architecture - Episode I : Symfony and Golang communication through gRPC
There are various notable implementations of RPC like Apache Thrift and gRPC.
- What is gRPC popularity? I believe not very popular. And subreddit is small. Why is that?
- Fresh – The next-gen web framework
> That's just your choice of how to build your app, right? You could've avoided this by rendering templates on the server and sending static HTML to the client, keeping the business logic on the server.
No, that's a requirement in most business cases; my comment stated 'complex and dynamic web apps'. Re-rendering the whole page every time the user checks a box or clicks a button is (a) terrible UX, (b) hard to track the state between page refreshes, (c) wrong practice and (d) bad performance.
> Here's just one of ten-thousand other battle-tested options you can use: https://github.com/apache/thrift/
Sure, I should set up a complex and huge dependency for just one of the many problems I highlighted. What a great idea.
- Ask HN: Who Wants to Collaborate?
- Deadline Budget Propagation for Baseplate.py
Thus, we released Baseplate.py v2.1 with deadline propagation. Each request between Baseplate services has an associated THeader, which includes relevant information for Baseplate to fulfill its functionality, such as tracing request timings. We added a “Deadline-Budget” field to this header that propagates the remaining timeout so that information is available to the following request, and this timeout continues to get updated with every new request made. With this update, we save production costs by allowing resources to work on requests awaiting a response, and gain overall improved latency.
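The propagation described in this excerpt can be sketched roughly as follows. This is an illustrative sketch only, not Baseplate.py's implementation: the `Deadline-Budget` field name comes from the excerpt, while the helper names, the plain-dict headers, and the integer-millisecond encoding are assumptions made for the example.

```python
import time

HEADER = "Deadline-Budget"  # field name from the excerpt; integer milliseconds is an assumption


class DeadlineExceeded(Exception):
    """Raised when no budget is left for a downstream call."""


def _now_ms():
    # Monotonic clock in whole milliseconds, so budgets are unaffected by wall-clock jumps.
    return time.monotonic_ns() // 1_000_000


def start_deadline(incoming_headers, default_timeout_ms, now_ms=_now_ms):
    """Return the absolute deadline (ms) for the current request (hypothetical helper)."""
    budget = incoming_headers.get(HEADER)
    timeout_ms = default_timeout_ms
    if budget is not None:
        # Never wait longer than the caller is still willing to wait.
        timeout_ms = min(default_timeout_ms, int(budget))
    return now_ms() + timeout_ms


def outgoing_headers(deadline_ms, now_ms=_now_ms):
    """Build headers for a downstream call that carry the remaining budget."""
    remaining = deadline_ms - now_ms()
    if remaining <= 0:
        raise DeadlineExceeded("budget exhausted before downstream call")
    return {HEADER: str(remaining)}
```

Each hop calls `start_deadline` once when a request arrives and `outgoing_headers` before every downstream call, so the budget shrinks as time passes and a service can stop early instead of working on a request whose caller has already given up.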
- If someone ever asks you why you use Apollo, show them this screenshot.
Here’s an example of the Thrift changelog. Knock yourself out. Or you can get your sense of productivity by actually doing something of value.
- parquet2 0.3.0, with native support to read async
The biggest addition is native async reading via futures::AsyncRead and futures::AsyncSeek, which required a lot of (to be merged) changes upstream (changes to thrift rust compiler and parquet-format-rs). I placed those changes on a temporary crate until things are released there.
- proposal: expression to create pointer to simple types #45624
- Can you share your experience with race conditions in production?
We were sharing instances of a Thrift TDeserializer across threads. We knew TProtocol was not thread-safe, but the TDeserializer constructor accepts a TProtocolFactory, so we naively assumed the deserialize method would use that to create a new instance of TProtocol for each invocation, but unfortunately, the TDeserializer constructor immediately creates TProtocol and stores it in a member variable, so TDeserializer is not actually thread-safe.
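The failure mode in this excerpt, and one common way around it, can be illustrated without Thrift. The sketch below uses hypothetical stand-in classes (`Protocol`, `Deserializer`) that only mimic the shape described above, where the constructor calls the factory once and stores the result; none of these names are the Thrift API. The workaround shown, one instance per thread via `threading.local`, is one possible option and not necessarily what the commenter did.

```python
import threading


class Protocol:
    """Hypothetical stand-in for a stateful protocol object that is not thread-safe."""

    def __init__(self):
        self.buffer = None

    def read(self, data):
        # Mutable state: two threads calling read() on one instance could interleave here.
        self.buffer = data
        return self.buffer.decode("utf-8")


class Deserializer:
    """Mimics the shape in the excerpt: the factory is called once, in the constructor."""

    def __init__(self, protocol_factory):
        self._protocol = protocol_factory()  # created eagerly and stored, not per call

    def deserialize(self, data):
        return self._protocol.read(data)


_local = threading.local()


def thread_deserializer():
    """Return a Deserializer owned by the calling thread, creating it on first use."""
    if not hasattr(_local, "deserializer"):
        _local.deserializer = Deserializer(Protocol)
    return _local.deserializer
```

Because each thread gets its own `Deserializer`, and therefore its own `Protocol`, no mutable state is shared, at the cost of one extra instance per thread.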
Apache Orc
- Java Serialization with Protocol Buffers
The information can be stored in a database or as files, serialized in a standard format and with a schema agreed with your Data Engineering team. Depending on your information and requirements, it can be as simple as CSV, XML or JSON, or Big Data formats such as Parquet, Avro, ORC, Arrow, or message serialization formats like Protocol Buffers, FlatBuffers, MessagePack, Thrift, or Cap'n Proto.
- Personal data of 120,000 Russian servicemen fighting in Ukraine made public
- AWS EMR Cost Optimization Guide
Data formatting is another place to make gains. When dealing with huge amounts of data, finding the data you need can take up a significant amount of your compute time. Apache Parquet and Apache ORC are columnar data formats optimized for analytics that pre-aggregate metadata about columns. If your EMR queries column intensive data like sum, max, or count, you can see significant speed improvements by reformatting data like CSVs into one of these columnar formats.
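To illustrate why per-column metadata speeds up queries like `max` or `count`, here is a toy sketch. It is not the Parquet or ORC file layout or API: the `Chunk` structure and the function names are hypothetical, and real formats keep such statistics at several levels of the file. The idea is only that statistics computed once at write time let a reader answer some queries, or skip whole chunks, without touching the values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    """One block of a single column plus statistics computed at write time."""
    values: List[int]
    min_value: int
    max_value: int
    total: int
    count: int


def write_chunks(values, chunk_size):
    """Split a column into chunks and pre-aggregate statistics for each one."""
    chunks = []
    for i in range(0, len(values), chunk_size):
        part = values[i:i + chunk_size]
        chunks.append(Chunk(part, min(part), max(part), sum(part), len(part)))
    return chunks


def column_max(chunks):
    """Answer max() from metadata alone, without reading any values."""
    return max(c.max_value for c in chunks)


def count_greater_than(chunks, threshold):
    """Count values > threshold; return (matches, number of chunks actually scanned)."""
    matched = 0
    chunks_read = 0
    for c in chunks:
        if c.max_value <= threshold:
            continue  # no value in this chunk can match: skip it
        if c.min_value > threshold:
            matched += c.count  # every value matches: metadata is enough
            continue
        chunks_read += 1
        matched += sum(1 for v in c.values if v > threshold)
    return matched, chunks_read
```

With a row-oriented CSV, the same queries would have to parse every row, which is the compute cost the excerpt is pointing at.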
- Apache Hudi - The Streaming Data Lake Platform
The following stack captures layers of software components that make up Hudi, with each layer depending on and drawing strength from the layer below. Typically, data lake users write data out once using an open file format like Apache Parquet/ORC stored on top of extremely scalable cloud storage or distributed file systems. Hudi provides a self-managing data plane to ingest, transform and manage this data, in a way that unlocks incremental data processing on them.
What are some alternatives?
gRPC - The C based gRPC (C++, Python, Ruby, Objective-C, PHP, C#)
Protobuf - Protocol Buffers - Google's data interchange format
ZeroMQ - ZeroMQ core engine in C++, implements ZMTP/3.1
Apache Parquet - Apache Parquet
Cap'n Proto - Cap'n Proto serialization/RPC system - core tools and C++ library
Apache Avro - Apache Avro is a data serialization system.
hudi - Upserts, Deletes And Incremental Processing on Big Data.
tape - A lightning fast, transactional, file-based FIFO for Android and Java.
debezium - Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ.