iceberg vs hiveberg

iceberg

Apache Iceberg (by apache)

iceberg.apache.org

hiveberg

Demonstration of a Hive Input Format for Iceberg (by ExpediaGroup)

Hive iceberg data-lake

DISCONTINUED

Project          iceberg               hiveberg
Mentions         18                    1
Stars            5,540                 21
Growth           2.1%                  -
Activity         9.9                   10.0
Latest commit    3 days ago            about 3 years ago
Language         Java                  Java
License          Apache License 2.0    Apache License 2.0

Mentions: the total number of mentions we have tracked, plus the number of user-suggested alternatives.
Stars: the number of stars a project has on GitHub. Growth: month-over-month growth in stars.
Activity: a relative number indicating how actively a project is being developed. Recent commits are weighted more heavily than older ones.
For example, an activity of 9.0 indicates that a project is among the top 10% of the most actively developed projects we track.

iceberg

Posts with mentions or reviews of iceberg. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-07-06.

Iceberg won the table format war: But not in the way you thought it might
2 projects | /r/dataengineering | 6 Jul 2023
Lakehouse using AWS Athena on Iceberg Concerns
1 project | /r/dataengineering | 28 May 2023
apache/iceberg: Apache Iceberg
1 project | /r/devopsish | 13 Feb 2023
What are the main things I need to know to be hired as a Java developer?
4 projects | /r/java | 1 Feb 2023
Have you used Athena Iceberg for small(-ish) data?
1 project | /r/aws | 28 Jan 2023
Is Data Lakehouse a threat to Snowflake?
1 project | /r/dataengineering | 24 Jan 2023
Snowflake vs databricks cloud/labor cost
1 project | /r/snowflake | 6 Dec 2022

This is interesting, imo.
Setting the Table: Benchmarking Open Table Formats
2 projects | /r/dataengineering | 1 Dec 2022
Spark Dynamic Partition Overwrite Mode Replaces Existing Data
1 project | /r/apachespark | 20 Nov 2022

If you're using Iceberg as your table format, it had bugs with MERGE INTO with non-nullable columns up until September: https://github.com/apache/iceberg/pull/5679
How to migrate delta tables to iceberg?
1 project | /r/dataengineering | 5 Oct 2022

yeah, this as a capability is a WIP and discussion point in the iceberg community - https://github.com/apache/iceberg/pull/5331

hiveberg

Posts with mentions or reviews of hiveberg. We have used some of these posts to build our list of alternatives and similar projects.

The necessity of Hive if using Iceberg?
1 project | /r/dataengineering | 8 Nov 2021

No direct experience using Hive and Iceberg, but I do know that Expedia created their own library to handle the interaction. The GitHub page for it has a note saying this functionality has been ported into Iceberg itself. From my understanding, this is more for people already using Hive as a metastore. But if you are starting from scratch without Hive, Iceberg can work just fine with Spark directly.

What are some alternatives?

When comparing iceberg and hiveberg you can also consider the following projects:

kudu - Mirror of Apache Kudu

nessie - Nessie: Transactional Catalog for Data Lakes with Git-like semantics

hudi - Upserts, Deletes And Incremental Processing on Big Data.

starrocks - StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries. InfoWorld’s 2023 BOSSIE Award for best open source software.

Apache Avro - Apache Avro is a data serialization system.

Apache Hive - Apache Hive

debezium - Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ.

Apache Drill - Apache Drill is a distributed MPP query layer for self describing data

RocksDB - A library that provides an embeddable, persistent key-value store for fast storage.

doris - Apache Doris is an easy-to-use, high performance and unified analytics database.

delta - An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

Dask - Parallel computing with task scheduling