ArangoDB VS Apache HBase

Compare ArangoDB vs Apache HBase and see how they differ.

ArangoDB

🥑 ArangoDB is a native multi-model database with flexible data models for documents, graphs, and key-values. Build high performance applications using a convenient SQL-like query language or JavaScript extensions. (by arangodb)
                 ArangoDB                                   Apache HBase
Mentions         17                                         10
Stars            13,333                                     5,113
Growth           0.4%                                       0.9%
Activity         9.9                                        9.6
Latest commit    7 days ago                                 1 day ago
Language         C++                                        Java
License          GNU General Public License v3.0 or later   Apache License 2.0
The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

ArangoDB

Posts with mentions or reviews of ArangoDB. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-07-11.
  • Ask HN: When is pure functional programming beneficial?
    2 projects | news.ycombinator.com | 11 Jul 2023
    ... or working in an environment or on a problem for which functional patterns apply.

    Suppose you are writing a "CRUD" app that writes to a relational database, how do you apply functional programming to that? The whole point of an application like that is that it makes side effects.

    In some cases you can break those problems down into functional pieces. Consider Python drivers for a product like

    https://www.arangodb.com/

    One major problem is that you want drivers that work synchronously and asynchronously, the structure of the average api call is something like

       def query(parameters):
  • Graph Databases vs Relational Databases: What and why?
    6 projects | dev.to | 29 Mar 2023
    First, you need to choose a specific graph database platform to work with, such as Neo4j, OrientDB, JanusGraph, Arangodb or Amazon Neptune. Once you have selected a platform, you can then start working with graph data using the platform's query language.
  • PRQL a simple, powerful, pipelined SQL replacement
    19 projects | news.ycombinator.com | 29 Dec 2022
    Some databases like ArangoDB (https://www.arangodb.com/) allow you to use Javascript instead of SQL.

    However, using a type-unsafe, turing-complete language introduces type unsafety and turing-complete problems to the query layer; the usual problems we know and love, such as infinite loops, runtime type errors, exceptions, and the like.

    Personally, I'm looking forward to a WASM runtime for databases -- so we can run webassembly on the database. This COULD be carefully designed to be statically checked and, possibly, make it really hard to write runaway loops.

  • What Is Going on with Neo4j?
    1 project | news.ycombinator.com | 8 Dec 2022
    When it comes to graphdb's, my favorite is still ArangoDB, definitely worth checking out if you are looking for alternatives.

    https://www.arangodb.com

  • Ask HN: Why are we so fragmented in databases options?
    2 projects | news.ycombinator.com | 26 Oct 2022
    Personally my favorite db for pet projects is

    https://www.arangodb.com/

    I think you hear very little about it because ADB users see it as a "secret weapon" to crush their competitors with. I've done large ontology work (MESH and other health ontologies) and IoT work (keep several years of sensor readings for sensors in my house) and workflow systems (select interesting HN articles or jobs I want to apply to) and it has never let me down. I haven't run a real instance serving customers in the cloud though.

    For the last few years every eng manager I have worked with has been a fan of

    https://www.postgresql.org/

    In the early 2000s I thought it overpromised and underdelivered and called it CrashGreSlow, but after MySQL got bought by Oracle the pgsql team has worked hard to improve it, and I think it is great today. It supports all kinds of advanced features such as stored procs, full-text search, JSON equivalent fields, etc.

  • Have you ever used ArangoDb? Why? Why not?
    1 project | dev.to | 25 Aug 2022
    Hi! I recently came across ArangoDb and used in some POCs, but I really want to know if someone here already used it in a Real World environment or even if chose to not use in a production environment. So... have you ever used ArangoDb? Why? Why not?
  • System Design: The complete course
    31 projects | dev.to | 16 Aug 2022
    For mutual friends, we can build a social graph for every user. Each node in the graph will represent a user and a directional edge will represent followers and followees. After that, we can traverse the followers of a user to find and suggest a mutual friend. This would require a graph database such as Neo4j and ArangoDB.
  • Database of Databases
    6 projects | dev.to | 23 Jun 2022
    ArangoDB
  • Using graphQL+gRPC+Golang to Create a Bike Rental Microservices, with persistence on ArangoDB.
    4 projects | dev.to | 2 Jun 2022
    This is a NoSQL database built for high availability and high scalability, a perfect fit for implementing persistence in microservices. ArangoDB is an open source native multi-model database that supports graph, document and key-value data models, allowing users to freely combine all data models in a single query. Dive deeper into this database and its features here.
  • Database consideration for a chat application with social graph
    1 project | /r/Database | 22 May 2022
    Pay attention to multi model databases, e.g. https://github.com/arangodb/arangodb
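
The mutual-friend idea from the "System Design: The complete course" excerpt above can be sketched in a few lines of plain Python. This is only an in-memory illustration under stated assumptions, not how ArangoDB or Neo4j implement traversals: the `follows` dictionary, the `suggest_friends` function, and the sample user names are all hypothetical.

```python
from collections import Counter


def suggest_friends(follows, user, limit=3):
    """Rank accounts `user` does not follow yet by how many of the
    accounts `user` already follows also follow them."""
    already = follows.get(user, set())
    counts = Counter()
    for friend in already:
        # Walk one more hop along the directed "follows" edges.
        for candidate in follows.get(friend, set()):
            if candidate != user and candidate not in already:
                counts[candidate] += 1
    # Highest overlap first; ties are broken by name so the order is stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


# Hypothetical sample graph: each key follows the users in its set.
follows = {
    "alice": {"bob", "carol"},
    "bob": {"carol", "dave"},
    "carol": {"dave", "erin"},
    "dave": set(),
    "erin": set(),
}

suggestions = suggest_friends(follows, "alice")
print(suggestions)
```

A graph database would typically express the same two-hop walk as a traversal query rather than nested loops over a dictionary held in memory.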

Apache HBase

Posts with mentions or reviews of Apache HBase. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-02-28.
  • How to choose the right type of database
    15 projects | dev.to | 28 Feb 2024
    HBase and Cassandra: Both cater to non-structured Big Data. Cassandra is geared towards scenarios requiring high availability with eventual consistency, while HBase offers strong consistency and is better suited for read-heavy applications where data consistency is paramount.
  • When to Use a NoSQL Database
    4 projects | dev.to | 21 Jul 2023
    NoSQL databases are non-relational databases with flexible schema designed for high performance at a massive scale. Unlike traditional relational databases, which use tables and predefined schemas, NoSQL databases use a variety of data models. There are 4 main types of NoSQL databases - document, graph, key-value, and column-oriented databases. NoSQL databases generally are well-suited for unstructured data, large-scale applications, and agile development processes. The most popular examples of NoSQL databases are MongoDB (document), Memgraph (graph), Redis (key-value store) and Apache HBase (column-oriented).
  • YouTube System Design
    2 projects | /r/softwarearchitecture | 5 Feb 2023
    ### YouTube

    The popular implementations of an on-demand video streaming service are the following:

    - YouTube
    - Netflix
    - Vimeo
    - TikTok

    ---

    #### Requirements

    - The user (**client**) can upload video files
    - The user can stream video content
    - The user can search for videos based on the video title

    ---

    #### Data storage

    ##### Database schema

    - The primary entities are the videos, the users, and the comments tables
    - The relationship between the users and the videos is 1-to-many
    - The relationship between the users and the comments table is 1-to-many
    - The relationship between the videos and the comments table is 1-to-many

    ---

    ##### Type of data store

    - The wide-column data store ([LSM](https://en.wikipedia.org/wiki/Log-structured_merge-tree) tree-based) such as [Apache HBase](https://hbase.apache.org/) is used to persist thumbnail images for clumping the files together, fault-tolerance, and replication
    - A cache server such as Redis is used to store the metadata of popular video content
    - Message queue such as Apache Kafka is used for the asynchronous processing (encoding) of videos
    - A relational database such as MySQL stores the metadata of the users and the videos
    - The video files are stored in a managed object storage such as AWS S3
    - Lucene-based inverted-index data store such as Apache Solr is used to persist the video index data to provide search functionality

    ---

    #### High-level design

    - Popular video content is streamed from CDN
    - Video encoding (**transcoding**) is the process of converting a video format to other formats (MPEG, HLS) to provide the best stream possible on multiple devices and bandwidth
    - A message queue can be configured between services for parallelism and improved fault tolerance
    - Codecs (H.264, VP9, HEVC) are compression and decompression algorithms used to reduce video file size while preserving video quality
    - The popular video streaming protocols (data transfer standard) are **MPEG-DASH** (Moving Pictures Experts Group - Dynamic Adaptive Streaming over HTTP), **Apple HLS** (HTTP Live Streaming), **Microsoft Smooth Streaming**, and **Adobe HDS** (HTTP Dynamic Streaming)

    ---

    #### Video upload workflow

    1. The user (**client**) executes a DNS query to identify the server
    2. The client makes an HTTP connection to the load balancer
    3. The video upload requests are rate limited to prevent malicious clients
    4. The load balancer delegates the client's request to an API server (**web server**) with free capacity
    5. The web server delegates the client's request to an app server that handles the API endpoint
    6. The ID of the uploaded video is stored on the message queue for asynchronous processing of the video file
    7. The title and description (**metadata**) of the video are stored in the metadata database
    8. The app server queries the object store service to generate a pre-signed URL for storing the raw video file
    9. The client uploads the raw video file directly to the object store using the pre-signed URL to save the system network bandwidth
    10. The transcoding servers query the message queue using the publish-subscribe pattern to get notified on uploaded videos
    11. The transcoding server fetches the raw video file by querying the raw object store
    12. The transcoding server transcodes the raw video file into multiple codecs and stores the transcoded content on the transcoded object store
    13. The thumbnail server generates on average five thumbnail images for each video file and stores the generated images on the thumbnail store
    14. The transcoding server persists the ID of the transcoded video on the message queue for further processing
    15. The upload handler service queries the message queue through the publish-subscribe pattern to get notified on transcoded video files
    16. The upload handler service updates the metadata database with metadata of transcoded video files
    17. The upload handler service queries the notification service to notify the client of the video processing status
    18. The database can be partitioned through [consistent hashing](https://systemdesign.one/consistent-hashing-explained/) (key = user ID or video ID)
    19. [Block matching](https://en.wikipedia.org/wiki/Block-matching_algorithm) or [Phase correlation](https://en.wikipedia.org/wiki/Phase_correlation) algorithms can be used to detect the duplicate video content
    20. The web server (API server) must be kept stateless for scaling out through replication
    21. The video file is stored in multiple resolutions and formats in order to support multiple devices and bandwidth
    22. The video can be split into smaller chunks by the client before upload to support the resume of broken uploads
    23. Watermarking and encryption can be used to protect video content
    24. The data centers are added to improve latency and data recovery at the expense of increased maintenance workflows
    25. Dead letter queue can be used to improve fault tolerance and error handling
    26. Chaos engineering is used to identify the failures on networks, servers, and applications
    27. Load testing and chaos engineering are used to improve fault tolerance
    28. [RAID](https://en.wikipedia.org/wiki/RAID) configuration improves the hardware throughput
    29. The data store is partitioned to spread the writes and reads at the expense of difficult joins, transactions, and fat client
    30. Federation and sharding are used to scale out the database
    31. The write requests are redirected to the leader and the read requests are redirected to the followers of the database
    32. [Vitess](https://vitess.io/) is a storage middleware for scaling out MySQL
    33. Vitess redirects the read requests that require fresh data to the leader (For example, update user profile operation)
    34. Vitess uses a lock server (Apache Zookeeper) for automatic sharding and leader election on the database layer
    35. Vitess supports RPC-based joins, indexing, and transactions on SQL database
    36. Vitess allows to offload of partitioning logic from the application and improves database queries by caching
  • In One Minute : Hadoop
    10 projects | dev.to | 21 Nov 2022
    HBase, A scalable, distributed database that supports structured data storage for large tables.
  • SQL or a graph database to build a social network with recommender?
    1 project | news.ycombinator.com | 18 Aug 2022
  • What’s the Database Plus concept and what challenges can it solve?
    5 projects | dev.to | 10 May 2022
    Today, it is normal for enterprises to leverage diversified databases. In my market of expertise, China, in the Internet industry, MySQL together with data sharding middleware is the go to architecture, with GreenPlum, HBase, Elasticsearch, Clickhouse and other big data ecosystems being auxiliary computing engine for analytical data. At the same time, some legacy systems (such as SQLServer legacy from .NET transformation, or Oracle legacy from outsourcing) can still be found in use. In the financial industry, Oracle or DB2 is still heavily used as the core transaction system. New business is migrating to MySQL or PostgreSQL. In addition to transactional databases, analytical databases are increasingly diversified as well.
  • Fully featured Repository Pattern with Typescript and native PostgreSQL driver
    5 projects | dev.to | 20 Mar 2022
    For this type of system PostgreSQL is not the best solution, for a number of reasons such as the lack of replication out of the box. We also strictly must not have vendor lock-in, and therefore did not take modern SQL databases like Amazon Aurora either. In the end the choice was made in favor of Cassandra. For this article, where we will be talking about a low-level implementation of the Repository Pattern, this is not important; in your case it can be any unpopular database, HBase for example.
  • Non-relational data models
    2 projects | dev.to | 30 Nov 2021
    Apache HBase
  • The Data Engineer Roadmap 🗺
    11 projects | dev.to | 19 Oct 2021
    Wide column: Apache Cassandra, Apache HBase
  • Paper review: Simple Testing in Distributed Systems
    3 projects | dev.to | 31 May 2021
    The authors performed an analysis of critical failures of the five distributed systems: Cassandra, HBase, HDFS, MapReduce, and Redis.
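
The "YouTube System Design" excerpt above mentions partitioning the database through consistent hashing, keyed by user ID or video ID. The following is a minimal sketch of that technique, assuming MD5 as the hash and 100 virtual nodes per shard. The `HashRing` class, the `db-N` shard names, and the `video-N` keys are hypothetical and are not taken from the post or from HBase.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring that maps keys to shard names."""

    def __init__(self, nodes, vnodes=100):
        if not nodes:
            raise ValueError("at least one node is required")
        points = []
        for node in nodes:
            # Several virtual points per node even out the key distribution.
            for i in range(vnodes):
                points.append((self._hash(f"{node}#{i}"), node))
        points.sort()
        self._hashes = [h for h, _ in points]
        self._owners = [n for _, n in points]

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key):
        # First ring point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect_right(self._hashes, self._hash(key))
        return self._owners[idx % len(self._owners)]


keys = [f"video-{i}" for i in range(1000)]
ring = HashRing(["db-0", "db-1", "db-2"])
before = {k: ring.node_for(k) for k in keys}

# Adding a shard keeps every existing ring point, so a key either stays
# where it was or moves to the new shard.
bigger = HashRing(["db-0", "db-1", "db-2", "db-3"])
moved = [k for k in keys if bigger.node_for(k) != before[k]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

With plain modulo hashing, changing the shard count would remap most keys, which is the reason the excerpt reaches for consistent hashing.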

What are some alternatives?

When comparing ArangoDB and Apache HBase you can also consider the following projects:

MongoDB - The MongoDB Database

Druid - Apache Druid: a high performance real-time analytics database.

Neo4j - Graphs for Everyone

Scylla - NoSQL data store using the seastar framework, compatible with Apache Cassandra

indradb - A graph database written in rust

Hypertable - A flexible database focused on performance and scalability

skytable - Skytable is a modern scalable NoSQL database with BlueQL, designed for performance, scalability and flexibility. Skytable gives you spaces, models, data types, complex collections and more to build powerful experiences

Redis - Redis is an in-memory database that persists on disk. The data model is key-value, but many different kinds of values are supported: Strings, Lists, Sets, Sorted Sets, Hashes, Streams, HyperLogLogs, Bitmaps.

Apache Arrow - Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing

RavenDB - ACID Document Database

Apache Cassandra - Mirror of Apache Cassandra