Apache Impala Alternatives

Similar projects and alternatives to Apache Impala

bitcoinbook

347 22,525 9.9 HTML Apache Impala VS bitcoinbook

Mastering Bitcoin 3rd Edition - Programming the Open Blockchain
simdjson

63 18,337 9.2 C++ Apache Impala VS simdjson

Parsing gigabytes of JSON per second : used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks
Apache Hive

14 5,320 9.6 Java Apache Impala VS Apache Hive

Apache Hive
seed_rl

8 760 0.0 Python Apache Impala VS seed_rl

Discontinued SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference. Implements IMPALA and R2D2 algorithms in TF2 with SEED's architecture.
ibis

22 4,041 10.0 Python Apache Impala VS ibis

the portable Python dataframe library
bloomfilter

1 11 0.0 Java Apache Impala VS bloomfilter

BloomFilter implementation in Java that uses Murmur3 for fast hashing (by prasanthj)
machin

2 381 1.8 Python Apache Impala VS machin

Reinforcement learning library (framework) designed for PyTorch, implements DQN, DDPG, A2C, PPO, SAC, MADDPG, A3C, APEX, IMPALA ...

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a better Apache Impala alternative or higher similarity.

Apache Impala reviews and mentions

Posts with mentions or reviews of Apache Impala. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2021-10-03.

Word-Aligned Bloom Filters
5 projects | news.ycombinator.com | 3 Oct 2021

> whether this would really work out in most workloads
> just because it keeps the cache-lines hotter and less likely to be evicted.
Okay, so the problem of keeping a bloom filter in cache is real, but the real force evicting it from the cache is the next row-group you read, plus all the other work you have to do when you implement this in a database product.
So the two things I work with, Apache Hive and Apache Impala, switched to a blocked bloom filter at different points in time.
Hive BloomKFilter - https://github.com/apache/hive/blob/master/storage-api/src/j...
Impala/Kudu one - https://github.com/apache/impala/blob/master/be/src/kudu/uti...
The C++ one also has an AVX specialization, while the Java one relies on the JVM to vectorize it (which it does not always do) - https://github.com/apache/impala/blob/master/be/src/kudu/uti...
We ran a lot of trivial benchmarks, plus several benchmarks where the shuffle-join (not sort-merge; this is just a partitioned hash join) generates a bloom filter (a semijoin) before sending rows out. The 1-cache-line version won out once the bloom filter went slightly over the 1 million + 5% rate [1].
The regular bloom filter slowed from 38ns to 108ns as it grew from 1k to 1m items, while the BloomK stayed at 27ns despite making room for a thousand times more items in the bloom. The bloom-1 (the 64-bit version) was ~2x faster at 16ns per op, but underperformed on accuracy (it was worse at filtering out items).
[1] - https://github.com/prasanthj/bloomfilter/tree/master/benchma...
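To make the "1-cache line" idea above concrete, here is a minimal Python sketch of a blocked bloom filter. It is not the Hive BloomKFilter or the Impala/Kudu code: the class name, the BLAKE2b hash, the double-hashing scheme, and the 512-bit block size are all assumptions chosen for illustration. The only point it tries to show is the layout: one hash picks a block, and every probe bit for a key then lands inside that single block, so a lookup touches one cache line instead of k scattered ones.

```python
import hashlib

BLOCK_BITS = 512  # assumed block size: one 64-byte cache line


class BlockedBloomFilter:
    """Illustrative blocked bloom filter (hypothetical, not Hive's or Impala's)."""

    def __init__(self, num_blocks: int, num_hashes: int = 8):
        if num_blocks <= 0 or not 0 < num_hashes <= BLOCK_BITS:
            raise ValueError("num_blocks and num_hashes must be positive")
        self.num_blocks = num_blocks
        self.num_hashes = num_hashes
        # Each block is modelled as one Python int used as a 512-bit bitmap.
        self.blocks = [0] * num_blocks

    def _probe(self, key: bytes):
        # Assumption: BLAKE2b stands in for whatever hash a real engine uses.
        digest = hashlib.blake2b(key, digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1  # odd stride: distinct bits
        block = h1 % self.num_blocks  # pick exactly one block for this key
        mask = 0
        for i in range(self.num_hashes):
            # Double hashing, confined to the bits of the chosen block.
            mask |= 1 << (((h1 >> 32) + i * h2) % BLOCK_BITS)
        return block, mask

    def add(self, key: bytes) -> None:
        block, mask = self._probe(key)
        self.blocks[block] |= mask

    def might_contain(self, key: bytes) -> bool:
        block, mask = self._probe(key)
        return (self.blocks[block] & mask) == mask


if __name__ == "__main__":
    # Semijoin-style use: build from the small side, filter the big side.
    bf = BlockedBloomFilter(num_blocks=64)
    for i in range(1000):
        bf.add(f"key-{i}".encode())
    survivors = [k for k in (b"key-7", b"missing") if bf.might_contain(k)]
    print(survivors)
```

A Python int does not reproduce real cache behaviour, so this says nothing about the timings quoted above. It only shows why a lookup in this layout reads a single block no matter how many items the filter holds.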