Apache Hive Alternatives
Similar projects and alternatives to Apache Hive
-
ObjectBox Java (Kotlin, Android)
Fast lightweight Java Database for storing and syncing objects in Mobile & IoT
-
HikariCP
光 HikariCP・A solid, high-performance, JDBC connection pool at last.
-
superset
Apache Superset is a Data Visualization and Data Exploration Platform
-
Presto
The official home of the Presto distributed SQL query engine for big data
-
Apache Spark
Apache Spark - A unified analytics engine for large-scale data processing
-
Spring Data JPA
Simplifies the development of creating a JPA-based data access layer.
-
Airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
-
beam
Apache Beam is a unified programming model for Batch and Streaming data processing.
-
materialize
The Fastest Way to Build the Fastest Data Products. Build data-intensive applications and services in SQL — without pipelines or caches — using materialized views that are always up-to-date. (by MaterializeInc)
-
cockroach
CockroachDB - the open source, cloud-native distributed SQL database.
-
cube.js
📊 Cube — Headless Business Intelligence for Building Data Applications
Apache Hive reviews and mentions
- Visionary French entrepreneur, David Gurle, launches new venture – Hive
-
DeWitt Clause, or Can You Benchmark %DATABASE% and Get Away With It
Apache Drill, Druid, Flink, Hive, Kafka, Spark
-
Apache Spark, Hive, and Spring Boot — Testing Guide
In this article, I'm showing you how to create a Spring Boot app that loads data from Apache Hive via Apache Spark to the Aerospike Database. More than that, I'm giving you a recipe for writing integration tests for such scenarios that can be run either locally or during the CI pipeline execution. The code examples are taken from this repository.
- Apache Hive in the vein!
-
Jinja2 not formatting my text correctly. Any advice?
ListItem(name='Apache Hive', website='https://hive.apache.org/', category='Interactive Query', short_description='Apache Hive is a data warehouse software project built on top of Apache Hadoop for providing data query and analysis. Hive gives an SQL-like interface to query data stored in various databases and file systems that integrate with Hadoop.'),
-
Understanding SQL Dialects
Apache Hive takes in a specific SQL dialect and converts it to map-reduce.
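As a rough illustration of what "converting SQL to map-reduce" means, here is a minimal conceptual sketch of how a query such as `SELECT category, COUNT(*) FROM items GROUP BY category` could be split into map, shuffle, and reduce steps. The table, the column name, and every function name below are hypothetical; this is not Hive's planner or generated code, only the general shape of the idea.

```python
from collections import defaultdict

# Hypothetical input rows standing in for a table called "items".
ROWS = [
    {"category": "books"},
    {"category": "games"},
    {"category": "books"},
    {"category": "music"},
    {"category": "books"},
]


def map_phase(row):
    """Emit one (group key, 1) pair per input row."""
    yield row["category"], 1


def shuffle(pairs):
    """Group all emitted values by key, as the shuffle step would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse the values for one key into a single COUNT(*) result."""
    return key, sum(values)


def run_group_by_count(rows):
    """Run map, shuffle, and reduce in sequence on an in-memory list."""
    pairs = (pair for row in rows for pair in map_phase(row))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_group_by_count(ROWS))
```

In a real cluster the map and reduce steps would run on different machines and the shuffle would move data over the network; the sketch only shows how the grouping key from the SQL becomes the key of the intermediate pairs.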
-
The Data Engineer Roadmap 🗺
Apache Hive
-
Open Source SQL Parsers
Apache Calcite is a popular parser/optimizer that is used in popular databases and query engines like Apache Hive, BlazingSQL and many others.
-
Word-Aligned Bloom Filters
> whether this would really work out in most workloads
> just because it keeps the cache-lines hotter and less likely to be evicted.
Okay, so keeping cache for a bloom filter problem is real - but the real force evicting memory out of the cache line is the next row-group you read + all the other stuff you have to do when you implement this in a database product.
So the two things I work with, Apache Hive and Apache Impala switched to a blocked bloom filter at different points in time.
Hive BloomKFilter - https://github.com/apache/hive/blob/master/storage-api/src/j...
Impala/Kudu one - https://github.com/apache/impala/blob/master/be/src/kudu/uti...
The C++ one also has an AVX specialization, while the Java one relies on the JVM to do it (not always) - https://github.com/apache/impala/blob/master/be/src/kudu/uti...
We ran a lot of trivial benchmarks and several benchmarks where the shuffle-join (not sort-merge, this is just a partitioned hash join) generates a bloom filter (a semijoin) before sending rows out and the 1-cache line version won out when the bloom filter went slightly over the 1 Million + 5% rate [1].
The regular bloom filter went from (38ns -> 108ns for 1k -> 1m items), while the BloomK stuck at (27ns) despite making room for a million times more items in the bloom. The bloom-1 (which is the 64bit version) underperformed on accuracy (was ~2x faster at 16ns per op, but worse at filtering out items).
[1] - https://github.com/prasanthj/bloomfilter/tree/master/benchma...
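To make the "blocked" idea in the comment above concrete, here is a minimal sketch of a blocked Bloom filter in which every key sets and tests bits inside a single 64-byte block (eight 64-bit words), so a lookup touches one cache line. It is an illustration under assumptions and not the Hive BloomKFilter or the Impala/Kudu code: the class name, the use of BLAKE2b, and the double-hashing scheme are all choices made for this example, and pure Python will not reproduce the timings quoted in the comment.

```python
import hashlib


class BlockedBloomFilter:
    """Toy blocked Bloom filter: each key touches exactly one 64-byte block."""

    WORDS_PER_BLOCK = 8                      # 8 x 64-bit words = 64 bytes
    BITS_PER_BLOCK = 64 * WORDS_PER_BLOCK    # 512 bits per block

    def __init__(self, num_blocks, num_hashes=6):
        if num_blocks <= 0 or num_hashes <= 0:
            raise ValueError("num_blocks and num_hashes must be positive")
        self.num_blocks = num_blocks
        self.num_hashes = num_hashes
        # Flat list of 64-bit words; block b owns words [8*b, 8*b + 8).
        self.words = [0] * (num_blocks * self.WORDS_PER_BLOCK)

    def _positions(self, key):
        """Yield (word index, bit mask) pairs, all inside one block."""
        digest = hashlib.blake2b(key, digest_size=24).digest()
        block_hash = int.from_bytes(digest[0:8], "little")
        h1 = int.from_bytes(digest[8:16], "little")
        h2 = int.from_bytes(digest[16:24], "little") | 1  # odd step
        base = (block_hash % self.num_blocks) * self.WORDS_PER_BLOCK
        for i in range(self.num_hashes):
            # Double hashing picks the i-th bit within the chosen block.
            bit = (h1 + i * h2) % self.BITS_PER_BLOCK
            yield base + bit // 64, 1 << (bit % 64)

    def add(self, key):
        for index, mask in self._positions(key):
            self.words[index] |= mask

    def might_contain(self, key):
        # False means definitely absent; True means possibly present.
        return all(self.words[index] & mask
                   for index, mask in self._positions(key))


if __name__ == "__main__":
    bloom = BlockedBloomFilter(num_blocks=1024)
    bloom.add(b"row-42")
    print(bloom.might_contain(b"row-42"))
```

The trade-off matches the one the comment describes: confining all probes to one block keeps memory access local, at the cost of a somewhat higher false-positive rate than a classic Bloom filter of the same total size, because keys landing in the same block compete for fewer bits.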
-
5 Best Big Data Frameworks You Can Learn in 2021
Both Fortune 500 and small companies are looking for competent people who can derive useful insight from their huge piles of data, and that's where Big Data frameworks like Apache Hadoop, Apache Spark, Flink, Storm, and Hive can help.
-
how to get into data eng pt.2
How to flesh this idea out more? Start ingesting different types of events and putting them in different tables. Definitely try an OLAP solution (Hive? Most of my experience sits with closed source projects that only a company can afford) and think about column-optimised file formats. Get the raw data into an OLAP solution and do aggregations into another system based on the OLAP solution.
-
Keep yourself up to date in data engineering
Thanks for the advice. Issues, RFCs, release notes, changelogs, blog posts, GitHub branch comparison https://github.com/apache/hive/compare/master...release-3.1.3-rc0 It's tough to keep up with everything.
Stats
apache/hive is an open source project licensed under Apache License 2.0 which is an OSI approved license.