Apache Orc vs Apache Thrift

Project	Apache Orc	Apache Thrift
Mentions	4	10
Stars	654	10,143
Growth	0.9%	0.6%
Activity	9.4	9.0
Latest Commit	4 days ago	3 days ago
Language	Java	C++
License	Apache License 2.0	Apache License 2.0

Mentions - the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

Apache Orc

Posts with mentions or reviews of Apache Orc. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-11-01.

Java Serialization with Protocol Buffers
6 projects | dev.to | 1 Nov 2022

The information can be stored in a database or as files, serialized in a standard format and with a schema agreed with your Data Engineering team. Depending on your information and requirements, it can be as simple as CSV, XML or JSON, or Big Data formats such as Parquet, Avro, ORC, Arrow, or message serialization formats like Protocol Buffers, FlatBuffers, MessagePack, Thrift, or Cap'n Proto.

Personal data of 120,000 Russian servicemen fighting in Ukraine made public
2 projects | /r/worldnews | 1 Mar 2022

AWS EMR Cost Optimization Guide
1 project | dev.to | 14 Dec 2021

Data formatting is another place to make gains. When dealing with huge amounts of data, finding the data you need can take up a significant amount of your compute time. Apache Parquet and Apache ORC are columnar data formats optimized for analytics that pre-aggregate metadata about columns. If your EMR queries column intensive data like sum, max, or count, you can see significant speed improvements by reformatting data like CSVs into one of these columnar formats.
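
To make the "pre-aggregate metadata about columns" point concrete, here is a minimal sketch in plain Python. It is not the on-disk layout of ORC or Parquet; the names `ColumnStats`, `build_chunks`, and `chunks_possibly_matching` are hypothetical and exist only for illustration. The idea it shows is that per-chunk statistics are computed once at write time, so sum, max, and count queries can be answered from the statistics alone, and chunks that cannot match a filter can be skipped.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ColumnStats:
    """Summary kept for one chunk of a column (hypothetical layout)."""
    count: int
    minimum: float
    maximum: float
    total: float


def build_chunks(values: Sequence[float], chunk_size: int) -> List[ColumnStats]:
    """Split a column into chunks and compute each chunk's stats once, at write time."""
    chunks = []
    for start in range(0, len(values), chunk_size):
        part = values[start:start + chunk_size]
        chunks.append(ColumnStats(len(part), min(part), max(part), sum(part)))
    return chunks


def column_sum(chunks: List[ColumnStats]) -> float:
    """Answer SUM from the stored totals without reading any row values."""
    return sum(c.total for c in chunks)


def column_max(chunks: List[ColumnStats]) -> float:
    """Answer MAX from the stored per-chunk maxima."""
    return max(c.maximum for c in chunks)


def column_count(chunks: List[ColumnStats]) -> int:
    """Answer COUNT from the stored per-chunk counts."""
    return sum(c.count for c in chunks)


def chunks_possibly_matching(chunks: List[ColumnStats], lower_bound: float) -> List[int]:
    """Indices of chunks that may hold a value >= lower_bound; the rest can be skipped."""
    return [i for i, c in enumerate(chunks) if c.maximum >= lower_bound]
```

A row-oriented CSV has no such summaries, so the same aggregates require scanning every row, which is where the speed difference described above comes from.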

Apache Hudi - The Streaming Data Lake Platform
8 projects | dev.to | 27 Jul 2021

The following stack captures layers of software components that make up Hudi, with each layer depending on and drawing strength from the layer below. Typically, data lake users write data out once using an open file format like Apache Parquet/ORC stored on top of extremely scalable cloud storage or distributed file systems. Hudi provides a self-managing data plane to ingest, transform and manage this data, in a way that unlocks incremental data processing on them.

Apache Thrift

Posts with mentions or reviews of Apache Thrift. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-08-20.

Symfony in microservice architecture - Episode I : Symfony and Golang communication through gRPC
7 projects | dev.to | 20 Aug 2022

There are various notable implementations of RPC like Apache Thrift and gRPC.

What is gRPC popularity? I believe not very popular. And subreddit is small. Why is that?
2 projects | /r/grpc | 26 Jul 2022

Fresh – The next-gen web framework
21 projects | news.ycombinator.com | 12 Jun 2022

> That's just your choice of how to build your app, right? You could've avoided this by rendering templates on the server and sending static HTML to the client, keeping the business logic on the server.
No, that's a requirement in most business cases; my comment stated 'complex and dynamic web apps'. Re-rendering the whole page every time the user checks a box or clicks a button (a) is terrible UX, (b) makes state hard to track between page refreshes, (c) is wrong practice and (d) performs badly.
> Here's just one of ten-thousand other battle-tested options you can use: https://github.com/apache/thrift/
Sure, I should set up a complex and huge dependency for just one of the many problems I highlighted. What a great idea

Ask HN: Who Wants to Collaborate?
58 projects | news.ycombinator.com | 1 Jan 2022

Deadline Budget Propagation for Baseplate.py
3 projects | /r/RedditEng | 27 Sep 2021

Thus, we released Baseplate.py v2.1 with deadline propagation. Each request between Baseplate services has an associated THeader, which includes relevant information for Baseplate to fulfill its functionality, such as tracing request timings. We added a “Deadline-Budget” field to this header that propagates the remaining timeout so that information is available to the following request, and this timeout continues to get updated with every new request made. With this update, we save production costs by allowing resources to work on requests awaiting a response, and gain overall improved latency.
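
The propagation idea in the excerpt above can be sketched in a few lines of Python. This is not Baseplate.py's implementation or API: only the header name "Deadline-Budget" comes from the post, while the `Deadline` class, the `deadline_from_headers` helper, and the choice of milliseconds as the header unit are assumptions made for illustration. Each service reads the remaining budget from the incoming header, tracks how much time is left, and forwards the reduced budget on every outgoing request.

```python
import time
from typing import Callable, Dict

# Header name taken from the post; the millisecond encoding is an assumption.
DEADLINE_HEADER = "Deadline-Budget"


class DeadlineExceeded(Exception):
    """Raised when no time budget is left for a downstream call."""


class Deadline:
    """Tracks the time remaining for the current request (hypothetical helper)."""

    def __init__(self, budget_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._expires_at = clock() + budget_seconds

    def remaining(self) -> float:
        """Seconds left before the deadline, never negative."""
        return max(0.0, self._expires_at - self._clock())

    def outgoing_headers(self) -> Dict[str, str]:
        """Headers for the next hop, carrying only the budget that is still left."""
        remaining = self.remaining()
        if remaining <= 0:
            raise DeadlineExceeded("no budget left; skip the downstream call")
        return {DEADLINE_HEADER: str(round(remaining * 1000))}


def deadline_from_headers(
    headers: Dict[str, str],
    default_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> Deadline:
    """Build a deadline from an incoming request, falling back to a local default."""
    raw = headers.get(DEADLINE_HEADER)
    budget = int(raw) / 1000 if raw is not None else default_seconds
    return Deadline(budget, clock)
```

Because the budget shrinks at every hop, a service deep in the call chain can stop working on a request whose caller has already timed out, which is the resource saving the post describes.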

If someone ever asks you why you use Apollo, show them this screenshot.
1 project | /r/apolloapp | 23 Sep 2021

Here’s an example of the Thrift changelog. Knock yourself out. Or you can get your sense of productivity by actually doing something of value.

parquet2 0.3.0, with native support to read async
3 projects | /r/rust | 9 Aug 2021

The biggest addition is native async reading via futures::AsyncRead and futures::AsyncSeek, which required a lot of (to be merged) changes upstream (changes to thrift rust compiler and parquet-format-rs). I placed those changes on a temporary crate until things are released there.

proposal: expression to create pointer to simple types #45624
3 projects | /r/golang | 18 Apr 2021

Can you share your experience with race conditions in production?
1 project | /r/java | 25 Jan 2021

We were sharing instances of a Thrift TDeserializer across threads. We knew TProtocol was not thread-safe, but the TDeserializer constructor accepts a TProtocolFactory, so we naively assumed the deserialize method would use that to create a new instance of TProtocol for each invocation, but unfortunately, the TDeserializer constructor immediately creates TProtocol and stores it in a member variable, so TDeserializer is not actually thread-safe.
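
The pitfall in the excerpt above, and one common way around it, can be sketched without the Thrift library. The code below is Python, not Thrift's Java API: `Protocol` and `Deserializer` are hypothetical stand-ins that only mirror the behaviour described (a factory is passed in, but the protocol object is created once in the constructor and then reused). The workaround shown, one deserializer per thread via `threading.local`, is an assumption about how such a bug might be avoided, not the fix the commenter reports using.

```python
import threading


class Protocol:
    """Stand-in for a stateful, non-thread-safe protocol object (hypothetical)."""

    def __init__(self) -> None:
        self.buffer = b""

    def read_int(self, payload: bytes) -> int:
        # Mutable state: two threads sharing this object could overwrite each other.
        self.buffer = payload
        return int.from_bytes(self.buffer, "big")


class Deserializer:
    """Mirrors the pitfall: the factory is called once, in the constructor."""

    def __init__(self, protocol_factory) -> None:
        # Created eagerly and reused by every deserialize() call.
        self.protocol = protocol_factory()

    def deserialize(self, payload: bytes) -> int:
        return self.protocol.read_int(payload)


# Per-thread storage: each thread sees its own attributes on this object.
_local = threading.local()


def thread_local_deserializer() -> Deserializer:
    """Return a deserializer owned by the calling thread, creating it on first use."""
    if not hasattr(_local, "deserializer"):
        _local.deserializer = Deserializer(Protocol)
    return _local.deserializer
```

Calling `thread_local_deserializer().deserialize(payload)` from worker threads gives each thread its own protocol state, at the cost of one instance per thread; creating a fresh deserializer per call is the simpler alternative when construction is cheap.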

What are some alternatives?

When comparing Apache Orc and Apache Thrift you can also consider the following projects:

Protobuf - Protocol Buffers - Google's data interchange format

gRPC - The C based gRPC (C++, Python, Ruby, Objective-C, PHP, C#)

Apache Parquet - Apache Parquet

ZeroMQ - ZeroMQ core engine in C++, implements ZMTP/3.1

Apache Avro - Apache Avro is a data serialization system.

Cap'n Proto - Cap'n Proto serialization/RPC system - core tools and C++ library

hudi - Upserts, Deletes And Incremental Processing on Big Data.

tape - A lightning fast, transactional, file-based FIFO for Android and Java.

debezium - Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ.
