Apache Impala vs ibis

Apache Impala

Apache Impala (by apache)

impala


impala.apache.org


ibis

the portable Python dataframe library (by ibis-project)

Python impala Pandas Database Clickhouse Postgresql Sqlite MySQL datafusion SQL Pyspark Dask duckdb Bigquery pyarrow Mssql polars Snowflake trino Sqlalchemy


ibis-project.org


Project          Apache Impala         ibis
Mentions         1                     23
Stars            1,079                 4,208
Growth           1.8%                  10.9%
Activity         9.7                   10.0
Latest Commit    5 days ago            1 day ago
Language         C++                   Python
License          Apache License 2.0    Apache License 2.0

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars: the number of stars a project has on GitHub. Growth: month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

Apache Impala

Posts with mentions or reviews of Apache Impala. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2021-10-03.

Word-Aligned Bloom Filters
5 projects | news.ycombinator.com | 3 Oct 2021

> whether this would really work out in most workloads
> just because it keeps the cache-lines hotter and less likely to be evicted.
Okay, so the cache problem for bloom filters is real, but what actually evicts the filter's cache lines is the next row group you read, plus everything else you have to do when you implement this in a database product.
So the two things I work with, Apache Hive and Apache Impala, switched to a blocked bloom filter at different points in time.
Hive BloomKFilter - https://github.com/apache/hive/blob/master/storage-api/src/j...
Impala/Kudu one - https://github.com/apache/impala/blob/master/be/src/kudu/uti...
The C++ one also has an AVX specialization, while the Java one relies on the JVM to do it (not always) - https://github.com/apache/impala/blob/master/be/src/kudu/uti...
We ran a lot of trivial benchmarks, plus several where the shuffle-join (not sort-merge; this is just a partitioned hash join) builds a bloom filter (a semijoin) before sending rows out, and the one-cache-line version won once the bloom filter grew slightly past the 1 million items + 5% rate configuration [1].
The regular bloom filter slowed from 38 ns to 108 ns as it grew from 1k to 1M items, while BloomK stayed at 27 ns even as its capacity grew a thousandfold. Bloom-1 (the 64-bit version) underperformed on accuracy: it was ~2x faster at 16 ns per op, but worse at filtering out items.
[1] - https://github.com/prasanthj/bloomfilter/tree/master/benchma...
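The "one cache line" idea from the comment above can be sketched in a few lines: a blocked Bloom filter hashes each key to a single block sized like a cache line and sets all k bits inside that block, so a lookup reads one line instead of k scattered ones. This is a minimal Python sketch under stated assumptions (64-byte lines, k = 8, BLAKE2b from `hashlib` as the hash); the class `BlockedBloomFilter` is hypothetical and is not Hive's BloomKFilter or Impala's C++/AVX code. Python will not reproduce the timings; it only shows the memory layout.

```python
from __future__ import annotations

import hashlib

BLOCK_BITS = 512  # assumption: 64-byte cache line -> 512 bits per block


class BlockedBloomFilter:
    """Hypothetical toy blocked Bloom filter: each key touches exactly one block."""

    def __init__(self, num_blocks: int, k: int = 8) -> None:
        # 8 bytes pick the block, 2 bytes per bit offset: 8 + 2k <= 64-byte digest.
        if not 1 <= k <= 28:
            raise ValueError("k must be between 1 and 28")
        self.num_blocks = num_blocks
        self.k = k
        # Each block is a Python int used as a 512-bit bitset.
        self.blocks = [0] * num_blocks

    def _block_and_mask(self, key: bytes) -> tuple[int, int]:
        digest = hashlib.blake2b(key, digest_size=64).digest()
        # First 8 bytes choose the single block ("cache line") for this key.
        block = int.from_bytes(digest[:8], "little") % self.num_blocks
        # Next 2*k bytes give k bit offsets, all inside that same block.
        mask = 0
        for i in range(self.k):
            chunk = digest[8 + 2 * i : 10 + 2 * i]
            mask |= 1 << (int.from_bytes(chunk, "little") % BLOCK_BITS)
        return block, mask

    def add(self, key: bytes) -> None:
        block, mask = self._block_and_mask(key)
        self.blocks[block] |= mask

    def might_contain(self, key: bytes) -> bool:
        # False means definitely absent; True may be a false positive.
        block, mask = self._block_and_mask(key)
        return (self.blocks[block] & mask) == mask


bf = BlockedBloomFilter(num_blocks=1024)
for i in range(1000):
    bf.add(f"user-{i}".encode())
print(bf.might_contain(b"user-42"))  # True: Bloom filters have no false negatives
```

Because every bit for a key lives in one block, both `add` and `might_contain` read or write a single block, which is the locality property the comment credits for BloomK's flat lookup cost; the trade-off is a somewhat higher false-positive rate than a classic filter of the same size.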

ibis

Posts with mentions or reviews of ibis. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-03-17.

Show HN: Hashquery, a Python library for defining reusable analysis
1 project | news.ycombinator.com | 23 Apr 2024

I really don't understand the appeal of dbt vs a proper programming language. The templating approach leads to massive spaghetti. I look forward to trying out something like Ibis [0]
0: https://ibis-project.org/
This Week In Python
5 projects | dev.to | 17 Mar 2024

ibis – portable Python dataframe library
Ibis: The portable Python dataframe library
1 project | news.ycombinator.com | 13 Mar 2024

1 project | news.ycombinator.com | 22 Feb 2024
FLaNK Stack 26 February 2024
50 projects | dev.to | 26 Feb 2024
Quarto
5 projects | news.ycombinator.com | 14 Feb 2024

The main benefit is that you get a Python (or R, Julia or Rust) interpreter, so you can evaluate code. A good example of the value of this is the Ibis docs, which use Quarto: https://ibis-project.org/
Polars – A bird's eye view of Polars
4 projects | news.ycombinator.com | 13 Feb 2024

I've found polars quite intuitive, though for Python I lean more towards [ibis](https://ibis-project.org/). The interface is nearly identical, but ibis has the benefit of building SQL queries before pulling any actual data (like dbplyr), whereas polars requires the data to be in memory (at least for relational databases, though correct me if I'm wrong).
This to me seems like a good argument for only using ibis, but I'm happy to be convinced otherwise.
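The "build the SQL first, pull data later" behavior described above is deferred execution. A toy illustration follows, assuming Python's built-in `sqlite3` as the database; the `LazyTable` class is hypothetical and is not ibis's API. It only composes SQL text as you chain calls and touches the database when `execute()` is called.

```python
from __future__ import annotations

import sqlite3
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LazyTable:
    """Hypothetical toy query builder: chaining builds SQL, nothing runs yet."""

    name: str
    columns: tuple[str, ...] = ("*",)
    predicates: tuple[str, ...] = ()

    def select(self, *cols: str) -> LazyTable:
        return replace(self, columns=cols)

    def filter(self, predicate: str) -> LazyTable:
        return replace(self, predicates=self.predicates + (predicate,))

    def to_sql(self) -> str:
        # Toy only: raw strings are interpolated with no quoting or escaping.
        sql = f"SELECT {', '.join(self.columns)} FROM {self.name}"
        if self.predicates:
            sql += " WHERE " + " AND ".join(f"({p})" for p in self.predicates)
        return sql

    def execute(self, con: sqlite3.Connection) -> list[tuple]:
        # Only here does the database do any work and return rows.
        return con.execute(self.to_sql()).fetchall()


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE trips (city TEXT, miles REAL)")
con.executemany("INSERT INTO trips VALUES (?, ?)",
                [("NYC", 3.5), ("SF", 12.0), ("NYC", 20.0)])

expr = LazyTable("trips").filter("miles > 10").select("city", "miles")
print(expr.to_sql())  # SELECT city, miles FROM trips WHERE (miles > 10)
print(expr.execute(con))
```

The filtering happens inside the database, so only matching rows cross into Python; an in-memory dataframe would instead have to load the whole table first.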
Ibis – Universal Interface for Data Wrangling
1 project | news.ycombinator.com | 13 Feb 2024
Vanna.ai: Chat with your SQL database
13 projects | news.ycombinator.com | 14 Jan 2024

Please add Ibis Birdbrain https://ibis-project.github.io/ibis-birdbrain/ to the list. Birdbrain is an AI-powered data bot, built on Ibis and Marvin, supporting more than 18 database backends.
See https://github.com/ibis-project/ibis and https://ibis-project.org for more details.
Ibis
1 project | news.ycombinator.com | 10 Jan 2024

What are some alternatives?

When comparing Apache Impala and ibis you can also consider the following projects:

seed_rl - SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference. Implements IMPALA and R2D2 algorithms in TF2 with SEED's architecture.

snowflake-connector-python - Snowflake Connector for Python

Apache Impala vs seed_rl
ibis vs snowflake-connector-python

Compare Apache Impala vs ibis and see what their differences are.
