Metric | iceberg | hiveberg
---|---|---
Mentions | 18 | 1
Stars | 5,540 | 21
Growth | 2.1% | -
Activity | 9.9 | 10.0
Last commit | 3 days ago | about 3 years ago
Language | Java | Java
License | Apache License 2.0 | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
iceberg
- Iceberg won the table format war: But not in the way you thought it might
- Lakehouse using AWS Athena on Iceberg Concerns
- apache/iceberg: Apache Iceberg
- What are the main things I need to know to be hired as a Java developer?
- Have you used Athena Iceberg for small(-ish) data?
- Is Data Lakehouse a threat to Snowflake?
- Snowflake vs databricks cloud/labor cost
  This is interesting, imo.
- Setting the Table: Benchmarking Open Table Formats
- Spark Dynamic Partition Overwrite Mode Replaces Existing Data
  If you're using Iceberg as your table format, it had bugs with MERGE INTO with non-nullable columns up until September: https://github.com/apache/iceberg/pull/5679
- How to migrate delta tables to iceberg?
  yeah, this as a capability is a WIP and discussion point in the iceberg community - https://github.com/apache/iceberg/pull/5331
hiveberg
- The necessity of Hive if using Iceberg?
  No direct experience using Hive and Iceberg, but I do know that Expedia created their own library to handle the interaction. The GitHub page for it has a note that this functionality has been ported into Iceberg itself. From my understanding, this is more for people already using Hive as a Metastore. But if you are starting from scratch without Hive, Iceberg can work just fine with Spark directly.
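As a rough illustration of the "Iceberg with Spark directly" point in the comment above, the sketch below builds the Spark properties for an Iceberg catalog that keeps its metadata on the filesystem instead of in a Hive Metastore. The property names and class names are the ones used in Iceberg's Spark documentation; the catalog name `local`, the warehouse path, and the helper `iceberg_spark_conf` are assumptions made up for this example.

```python
def iceberg_spark_conf(catalog_name: str, warehouse_path: str) -> dict:
    """Build Spark properties for an Iceberg catalog that needs no Hive Metastore.

    `catalog_name` and `warehouse_path` are hypothetical values chosen by the caller.
    """
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        # Iceberg's SQL extensions (for example, stored procedures via CALL).
        "spark.sql.extensions": (
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
        ),
        # Register a Spark catalog backed by Iceberg.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # "hadoop" selects the filesystem-based catalog; "hive" would use a Metastore.
        f"{prefix}.type": "hadoop",
        # Directory where table data and metadata are stored.
        f"{prefix}.warehouse": warehouse_path,
    }


conf = iceberg_spark_conf("local", "/tmp/iceberg-warehouse")

# Print the properties in the form accepted by spark-sql / spark-submit.
for key, value in conf.items():
    print(f"--conf {key}={value}")
```

These pairs could be passed on the command line as shown, or set one by one with `SparkSession.builder.config(key, value)`, assuming the matching Iceberg Spark runtime jar is on the classpath.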
What are some alternatives?
kudu - Mirror of Apache Kudu
nessie - Nessie: Transactional Catalog for Data Lakes with Git-like semantics
hudi - Upserts, Deletes And Incremental Processing on Big Data.
starrocks - StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries. InfoWorld’s 2023 BOSSIE Award for best open source software.
Apache Avro - Apache Avro is a data serialization system.
Apache Hive - Apache Hive
debezium - Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ.
Apache Drill - Apache Drill is a distributed MPP query layer for self describing data
RocksDB - A library that provides an embeddable, persistent key-value store for fast storage.
doris - Apache Doris is an easy-to-use, high performance and unified analytics database.
delta - An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Dask - Parallel computing with task scheduling