Top 23 Hive Open-Source Projects

cube.js

86 17,135 9.9 Rust

📊 Cube — The Semantic Layer for Building Data Applications

Project mention: MQL – Client and Server to query your DB in natural language | news.ycombinator.com | 2024-04-07

I should have clarified. There's a large number of apps that are:
1. taking info strictly from SQL (e.g. information_schema, query history)
2. taking a user input / question
3. writing SQL to answer that question
An app like this is what I call "text-to-sql". Totally agree a better system would pull in additional documentation (which is what we're doing), but I'd no longer consider it "text-to-sql". In our case, we're not even directly writing SQL, but rather generating semantic layer queries (i.e. https://cube.dev/).
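To make the distinction concrete, here is a minimal sketch of generating a semantic layer query instead of SQL. It assumes the JSON query shape accepted by Cube's REST API (`measures`, `dimensions`, `timeDimensions`, `filters`). The helper `build_cube_query` and the `Orders.*` members are hypothetical, and the step that parses the user's question into these arguments is left out.

```python
import json
from typing import Optional


def build_cube_query(
    measures: list[str],
    dimensions: Optional[list[str]] = None,
    time_dimension: Optional[str] = None,
    granularity: Optional[str] = None,
    date_range: Optional[tuple[str, str]] = None,
    equals_filters: Optional[dict[str, list[str]]] = None,
) -> dict:
    """Hypothetical helper: map an already-parsed user intent onto a
    Cube-style query object instead of raw SQL."""
    query: dict = {"measures": list(measures)}
    if dimensions:
        query["dimensions"] = list(dimensions)
    if time_dimension:
        time_dim: dict = {"dimension": time_dimension}
        if granularity:
            time_dim["granularity"] = granularity
        if date_range:
            time_dim["dateRange"] = list(date_range)
        query["timeDimensions"] = [time_dim]
    if equals_filters:
        query["filters"] = [
            {"member": member, "operator": "equals", "values": list(values)}
            for member, values in equals_filters.items()
        ]
    return query


# "How many completed orders did we get per month in Q1 2024?"
# Cube and member names are illustrative, not taken from a real data model.
query = build_cube_query(
    measures=["Orders.count"],
    time_dimension="Orders.createdAt",
    granularity="month",
    date_range=("2024-01-01", "2024-03-31"),
    equals_filters={"Orders.status": ["completed"]},
)
print(json.dumps(query, indent=2))
```

One appeal of this design is that the generator can only choose members the semantic layer already defines, so it cannot invent a table or a join; turning the query into SQL is left to Cube.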

APIJSON

0 16,643 8.4 Java

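🏆 Zero-code, full-featured, highly secure ORM library 🚀 Zero-code backend APIs and documentation; the frontend (client) customizes the data and structure of the returned JSON. 🏆 A JSON Transmission Protocol and an ORM Library 🚀 provides APIs and Docs without writing any code.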
Presto

14 15,591 9.9 Java

The official home of the Presto distributed SQL query engine for big data

Project mention: Multi-Database Support in DuckDB | news.ycombinator.com | 2024-01-28

We have some of this functionality in Presto (https://github.com/prestodb/presto), but it takes fair bit of work to implement it for all the different backends.

doris

42 11,314 10.0 Java

Apache Doris is an easy-to-use, high performance and unified analytics database.

Project mention: Variant in Apache Doris 2.1.0: a new data type 8 times faster than JSON for semi-structured data analysis | dev.to | 2024-03-27

As an open-source real-time data warehouse, Apache Doris provides semi-structured data processing capabilities, and the newly released version 2.1.0 takes a further step in this direction. Before V2.1, Apache Doris stored semi-structured data as JSON files. However, during query execution, parsing the JSON data on the fly leads to high CPU and I/O consumption as well as high query latency, especially when the dataset is huge and complicated. Moreover, the lack of a predefined schema leaves the query optimizer nothing to work with.
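The cost described above comes from parsing JSON text again on every query. The sketch below illustrates the general remedy in plain Python: parse each document once at load time and split it into per-path columns with an inferred type, so later queries read a column instead of re-parsing text. It is a simplified illustration of columnar shredding of semi-structured data, not Doris's Variant implementation; the functions and the type-inference rule are assumptions.

```python
import json
from typing import Any


def flatten(obj: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested JSON objects into dotted paths: {"a": {"b": 1}} -> {"a.b": 1}.
    Arrays and scalars are kept as leaf values."""
    if not isinstance(obj, dict):
        return {prefix: obj}
    out: dict[str, Any] = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        out.update(flatten(value, path))
    return out


def shred(json_lines: list[str]) -> dict[str, list[Any]]:
    """Parse every document exactly once and store each path as its own column.
    Rows that lack a path get None, so all columns have the same length."""
    rows = [flatten(json.loads(line)) for line in json_lines]
    paths = sorted({path for row in rows for path in row})
    return {path: [row.get(path) for row in rows] for path in paths}


def column_type(values: list[Any]) -> str:
    """Naive inference: one consistent Python type, otherwise fall back to raw JSON."""
    kinds = {type(v).__name__ for v in values if v is not None}
    return kinds.pop() if len(kinds) == 1 else "json"


docs = [
    '{"user": {"id": 1, "country": "DE"}, "clicks": 3}',
    '{"user": {"id": 2}, "clicks": 5}',
    '{"user": {"id": 3, "country": "US"}, "clicks": "n/a"}',
]
columns = shred(docs)

# A query now scans one typed column instead of re-parsing every document.
total_clicks = sum(v for v in columns["clicks"] if isinstance(v, int))
```

Paths whose values disagree on type (here `clicks`) fall back to `"json"`, which mirrors the general trade-off: consistent paths get fast typed storage, inconsistent ones keep the flexible but slower representation.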

Trino

44 9,552 10.0 Java

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Project mention: Trino: Fast distributed SQL query engine for big data analytics | news.ycombinator.com | 2024-03-19

sqlglot

56 5,441 9.9 Python

Python SQL Parser and Transpiler

Project mention: The Future of MySQL is PostgreSQL: an extension for the MySQL wire protocol | news.ycombinator.com | 2024-04-26

This is probably referring to "zero changes to your driver code" and not "zero changes to the SQL you send over this driver".
Translating between SQL dialects is notoriously hard, and attempts to do so [1] work in 95% of cases, but the last 5% would require 5x the amount of work. That's because a "SQL dialect" also includes weird edge cases such as type inference for things like COALESCE(5, FALSE) and emulation of system catalogs (pg_catalog, information_schema).
[1] https://github.com/tobymao/sqlglot
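The `COALESCE(5, FALSE)` case is hard because each dialect resolves the result type differently: MySQL treats `FALSE` as the integer `0`, so the call simply returns `5`, while PostgreSQL rejects it because `integer` and `boolean` have no common type. The toy model below encodes only those two rules to show why a translator needs per-dialect type logic rather than textual rewriting. It is a deliberately simplified assumption, not how sqlglot models types, and the names are hypothetical.

```python
class TypeMismatch(Exception):
    """Raised when a dialect cannot find a common type for COALESCE arguments."""


def literal_type(value: object) -> str:
    # Check bool before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    raise ValueError(f"unsupported literal: {value!r}")


def coalesce_type(dialect: str, *args: object) -> str:
    """Toy result-type resolution for COALESCE over integer/boolean literals."""
    types = {literal_type(arg) for arg in args}
    if dialect == "mysql":
        # MySQL: TRUE and FALSE are aliases for 1 and 0, so everything is an integer.
        return "integer"
    if dialect == "postgres":
        # PostgreSQL: integer and boolean have no common type for COALESCE.
        if len(types) == 1:
            return types.pop()
        raise TypeMismatch(f"COALESCE cannot match types: {sorted(types)}")
    raise ValueError(f"unknown dialect: {dialect!r}")


print(coalesce_type("mysql", 5, False))  # integer
try:
    coalesce_type("postgres", 5, False)
except TypeMismatch as err:
    print(err)  # COALESCE cannot match types: ['boolean', 'integer']
```

The same source text is valid in one dialect and an error in the other, so a faithful translation has to either insert an explicit cast or reject the query; no string substitution can decide that.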

Apache Hive

14 5,326 9.6 Java

Apache Hive
Hive

10 3,874 5.0 Dart

Lightweight and blazing-fast key-value database written in pure Dart (by isar)
linkis

2 3,227 9.5 Java

Apache Linkis builds a computation middleware layer to facilitate connection, governance and orchestration between the upper applications and the underlying data engines.
kyuubi

1 1,928 9.8 Scala

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
Apache Drill

9 1,894 8.1 Java

Apache Drill is a distributed MPP query layer for self-describing data (by apache)

Project mention: Git Query Language (GQL) Aggregation Functions, Groups, Alias | /r/ProgrammingLanguages | 2023-06-30

Also, are you familiar with Apache Drill? The idea is to put an SQL interpreter in front of any kind of database, just like you are doing for git here.
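The idea can be illustrated in miniature with Python's built-in `sqlite3`: infer a schema from self-describing JSON records, load them into an in-memory table, and let a SQL engine do the rest. This is only an analogy; Drill queries files and data stores in place at scale without a separate load step. The helper `query_json_records` is hypothetical and assumes flat records with scalar values.

```python
import json
import sqlite3
from typing import Any


def query_json_records(records: list[dict[str, Any]], sql: str, table: str = "data") -> list[tuple]:
    """Infer columns from the union of record keys, load into in-memory SQLite, run sql.
    Assumes flat records with scalar values and keys without double quotes."""
    columns = sorted({key for rec in records for key in rec})
    conn = sqlite3.connect(":memory:")
    try:
        # No declared column types: SQLite's dynamic typing accepts whatever each record holds.
        col_defs = ", ".join(f'"{c}"' for c in columns)
        conn.execute(f'CREATE TABLE "{table}" ({col_defs})')
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(
            f'INSERT INTO "{table}" VALUES ({placeholders})',
            # Missing keys become NULL, so records need not share a schema.
            [tuple(rec.get(c) for c in columns) for rec in records],
        )
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


lines = [
    '{"repo": "presto", "lang": "Java", "stars": 15591}',
    '{"repo": "sqlglot", "lang": "Python", "stars": 5441}',
    '{"repo": "trino", "lang": "Java", "stars": 9552}',
]
records = [json.loads(line) for line in lines]
rows = query_json_records(records, "SELECT lang, SUM(stars) FROM data GROUP BY lang ORDER BY lang")
print(rows)  # [('Java', 25143), ('Python', 5441)]
```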

querybook

2 1,737 8.6 TypeScript

Querybook is a Big Data Querying UI, combining collocated table metadata and a simple notebook interface.
PyHive

1 1,665 3.5 Python

Python interface to Hive and Presto. 🐝
yauaa

2 728 9.7 Java

Yet Another UserAgent Analyzer
WeDataSphere

7 633 5.0

WeDataSphere is a financial-grade, one-stop big data platform suite.
jupysql

8 598 9.3 Python

Better SQL in Jupyter. 📊

Project mention: Show HN: JupySQL – a SQL client for Jupyter (ipython-SQL successor) | news.ycombinator.com | 2023-12-06

Hey, HN community!
We're stoked to launch JupySQL today! JupySQL is an open-source library that brings a modern SQL experience to Jupyter. JupySQL is compatible with all major databases, such as Snowflake, Redshift, PostgreSQL, MySQL, MariaDB, DuckDB, SQL Server, ClickHouse, Trino, and more!
To get started, check out our tutorial: https://jupysql.ploomber.io/en/latest/quick-start.html
SQL is the de facto language for data analysis; however, analysis often requires a mix of SQL and Python. JupySQL bridges this gap, allowing users to execute SQL queries seamlessly in Jupyter and continue their analysis in Python. Add %%sql to the top of your cell and start writing SQL.
Here are some of JupySQL's main features:
- Syntax highlighting

mlcraft

16 467 9.0 JavaScript

Synmetrix – open source semantic layer / Boost your LLM precision

Project mention: Show HN: Synmetrix – Open-Source Platform for Data and Metrics Management | news.ycombinator.com | 2024-02-28

MovieLab

1 385 10.0 Dart

An open source movie tracker and movie finder.
hive

2 323 9.7 C++

Fast. Scalable. Powerful. The Blockchain for Web3 (by openhive-network)

Project mention: Welcome to r/DBuzzWorld - READ This to Get Started! | /r/dbuzzworld | 2023-05-16

helicalinsight

1 282 0.0 Java

Helical Insight software is the world’s first open-source Business Intelligence framework, which helps you make sense of your data and make well-informed decisions.
waggle-dance

1 258 7.7 Java

Hive federation service. Enables disparate tables to be concurrently accessed across multiple Hive deployments.
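A federation service of this kind sits in front of several metastores and decides, per request, which one owns a given database. The sketch below shows one plausible routing rule, a database-name prefix per secondary metastore. The classes, prefixes and metastore URIs are hypothetical, and Waggle Dance's actual configuration and routing modes may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    metastore_uri: str
    database: str  # database name as known to the target metastore


class PrefixRouter:
    """Route a federated database name to the metastore that owns it.
    Names without a known prefix go to the primary metastore."""

    def __init__(self, primary_uri: str, prefixed: dict[str, str]):
        self.primary_uri = primary_uri
        # Check longer prefixes first so overlapping prefixes resolve to the most specific one.
        self.prefixed = sorted(prefixed.items(), key=lambda kv: len(kv[0]), reverse=True)

    def resolve(self, federated_db: str) -> Route:
        for prefix, uri in self.prefixed:
            if federated_db.startswith(prefix):
                # Strip the prefix: the remote metastore knows the database by its own name.
                return Route(uri, federated_db[len(prefix):])
        return Route(self.primary_uri, federated_db)


# Hypothetical hosts; 9083 is the usual Hive metastore Thrift port.
router = PrefixRouter(
    primary_uri="thrift://primary-metastore:9083",
    prefixed={"analytics_": "thrift://analytics-metastore:9083"},
)
route = router.resolve("analytics_sales")
print(route)
```

Prefixing keeps database names unique across deployments, so two metastores can each have a `sales` database without clashing in the federated view.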
dataCompare

1 234 3.7 Java

Big data comparison and data profiling platform: low-code data comparison and data profiling
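Data comparison and profiling usually come down to two operations: per-column statistics (null and distinct counts) and a key-based diff between two tables. The plain-Python sketch below shows both on lists of dicts. It illustrates the general technique only, not dataCompare's implementation; the function names are hypothetical and values are assumed to be hashable.

```python
from typing import Any

Row = dict[str, Any]


def profile(rows: list[Row]) -> dict[str, dict[str, int]]:
    """Per-column row count, null count and distinct non-null count."""
    columns = sorted({col for row in rows for col in row})
    stats: dict[str, dict[str, int]] = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        non_null = [v for v in values if v is not None]
        stats[col] = {
            "count": len(values),
            "nulls": len(values) - len(non_null),
            "distinct": len(set(non_null)),
        }
    return stats


def compare(left: list[Row], right: list[Row], key: str) -> dict[str, list[Any]]:
    """Key-based diff: keys only in left, only in right, and keys whose rows differ."""
    left_by_key = {row[key]: row for row in left}
    right_by_key = {row[key]: row for row in right}
    common = left_by_key.keys() & right_by_key.keys()
    return {
        "only_left": sorted(left_by_key.keys() - right_by_key.keys()),
        "only_right": sorted(right_by_key.keys() - left_by_key.keys()),
        "changed": sorted(k for k in common if left_by_key[k] != right_by_key[k]),
    }


source = [{"id": 1, "city": "Berlin"}, {"id": 2, "city": None}, {"id": 3, "city": "Paris"}]
target = [{"id": 1, "city": "Berlin"}, {"id": 3, "city": "Lyon"}, {"id": 4, "city": "Rome"}]
print(profile(source))
print(compare(source, target, "id"))
```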
bitalarm

2 190 2.4 Dart

An app to keep track of different cryptocurrencies, written in Dart and Flutter

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Hive related posts

Trino: Fast distributed SQL query engine for big data analytics
1 project | news.ycombinator.com | 19 Mar 2024
Show HN: Synmetrix – Open-Source Platform for Data and Metrics Management
2 projects | news.ycombinator.com | 28 Feb 2024
Show HN: Synmetrix – Open Semantic Layer
1 project | news.ycombinator.com | 26 Feb 2024
Game analytic power: how we process more than 1 billion events per day
1 project | dev.to | 24 Nov 2023
Your Thoughts on OLAPs Clickhouse vs Apache Druid vs Starrocks in 2023/2024
2 projects | /r/dataengineering | 16 Nov 2023
Hexagonal Grids
6 projects | news.ycombinator.com | 20 Oct 2023
Log Analysis: Elasticsearch VS Apache Doris
1 project | dev.to | 16 Oct 2023

Index

What are some of the best open-source Hive projects? This list will help you:

Rank	Project	Stars
1	cube.js	17,135
2	APIJSON	16,643
3	Presto	15,591
4	doris	11,314
5	Trino	9,552
6	sqlglot	5,441
7	Apache Hive	5,326
8	Hive	3,874
9	linkis	3,227
10	kyuubi	1,928
11	Apache Drill	1,894
12	querybook	1,737
13	PyHive	1,665
14	yauaa	728
15	WeDataSphere	633
16	jupysql	598
17	mlcraft	467
18	MovieLab	385
19	hive	323
20	helicalinsight	282
21	waggle-dance	258
22	dataCompare	234
23	bitalarm	190