Solution for hash-map with >100M values

This page summarizes the projects mentioned and recommended in the original post on /r/java

  1. Chronicle Map

    Replicate your Key Value Store across your network, with consistency, persistence and performance.

    I've wrangled data sets in the ~600 GB range using nothing but plain old Java and a few beefy boxes. This can all be kept in memory, but you have to go off-heap. You can use Chronicle Map and Chronicle Values to model this data and work with it off-heap in a way that's still very clean and object-oriented. 128 GB of RAM is cheap these days, whether you're in the cloud or not.

  3. MapDB

    MapDB provides concurrent Maps, Sets and Queues backed by disk storage or off-heap memory. It is a fast and easy-to-use embedded Java database engine.

    I have had good results with MapDB.

  4. Oak

    A Scalable Concurrent Key-Value Map for Big Data Analytics (by yahoo)

    Consider using a database (e.g. embedded H2, Redis) with an on-heap cache (e.g. Caffeine). Since you say the keys follow a Zipfian distribution, the cache should absorb most of the requests. For an off-heap hash table, you might try Oak, as it is likely a faster implementation.

  5. java-concurrent-hash-trie-map

    Java port of a concurrent trie hash map implementation from the Scala collections library

  6. lasher

    Lasher is an embeddable key-value store written in Java.

    Do you need to update the data after the initial load? If not, then I would suggest using my PalDB fork; otherwise you could try my lasher library. It's at an early stage, but the first results are very promising: I tested it with 10-100M elements and the performance was similar to Java's HashMap.

  7. SmoothieMap

    A gulp of low latency Java

    Try https://github.com/TimeAndSpaceIO/SmoothieMap
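
The "go off-heap" advice under Chronicle Map can be illustrated with nothing but the JDK. The sketch below is not Chronicle Map's API or implementation; it is a hypothetical, fixed-capacity, open-addressing map from long keys to long values whose slots live in a direct ByteBuffer, so the entries are not objects the garbage collector has to trace. The class name OffHeapLongMap, the 24-byte slot layout and the hash constant are all assumptions made for illustration. A single direct buffer is limited to 2 GB, so a real map with more than 100M entries would need several segments, resizing and deletion, which the libraries listed above already provide.

```java
import java.nio.ByteBuffer;

/** Illustrative only: fixed-capacity long-to-long map stored in off-heap memory. */
final class OffHeapLongMap {
    // Assumed slot layout: 8-byte "used" flag, 8-byte key, 8-byte value.
    private static final int SLOT_BYTES = 24;

    private final ByteBuffer buf;   // direct buffer, allocated outside the Java heap
    private final int capacity;     // number of slots, must be a power of two
    private int size;

    OffHeapLongMap(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        this.capacity = capacity;
        // allocateDirect returns zero-filled memory, so every slot starts unused.
        this.buf = ByteBuffer.allocateDirect(Math.multiplyExact(capacity, SLOT_BYTES));
    }

    /** Spreads the key bits, then masks down to a slot index. */
    private int slotFor(long key) {
        long h = key * 0x9E3779B97F4A7C15L;
        return (int) (h >>> 32) & (capacity - 1);
    }

    /** Inserts or overwrites; linear probing resolves collisions. */
    void put(long key, long value) {
        int i = slotFor(key);
        for (int probes = 0; probes < capacity; probes++) {
            int off = i * SLOT_BYTES;
            boolean used = buf.getLong(off) != 0L;
            if (!used || buf.getLong(off + 8) == key) {
                if (!used) {
                    buf.putLong(off, 1L);
                    buf.putLong(off + 8, key);
                    size++;
                }
                buf.putLong(off + 16, value);
                return;
            }
            i = (i + 1) & (capacity - 1);
        }
        throw new IllegalStateException("map is full");
    }

    /** Returns the stored value, or def if the key is absent. */
    long getOrDefault(long key, long def) {
        int i = slotFor(key);
        for (int probes = 0; probes < capacity; probes++) {
            int off = i * SLOT_BYTES;
            if (buf.getLong(off) == 0L) return def;          // empty slot: key not present
            if (buf.getLong(off + 8) == key) return buf.getLong(off + 16);
            i = (i + 1) & (capacity - 1);
        }
        return def;
    }

    int size() { return size; }

    public static void main(String[] args) {
        OffHeapLongMap map = new OffHeapLongMap(1 << 16);
        for (long k = 0; k < 50_000; k++) {
            map.put(k, k * k);
        }
        System.out.println(map.size() + " entries, value for 1234 = " + map.getOrDefault(1234L, -1L));
    }
}
```

At 24 bytes per slot, 100M entries at a load factor around 0.75 would need roughly 3.2 GB of native memory, which is why the comment above stresses RAM rather than heap size.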

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
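
The suggestion in the Oak entry, a slower store fronted by an on-heap cache that absorbs the hot keys, can be sketched as follows. This is a minimal illustration under stated assumptions, not Caffeine's API: the cache is a plain access-ordered LinkedHashMap with LRU eviction, the backing store is a hypothetical LongFunction standing in for an H2 or Redis lookup, and the skewed key generator in main is a crude stand-in for a Zipfian distribution. The class is not thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;
import java.util.function.LongFunction;

/** Illustrative only: read-through LRU cache in front of a slower key-value lookup. */
final class CachedStore {
    private final LongFunction<String> backingStore; // hypothetical stand-in for a database call
    private final Map<Long, String> cache;
    private long hits;
    private long misses;

    CachedStore(final int maxEntries, LongFunction<String> backingStore) {
        this.backingStore = backingStore;
        // accessOrder = true makes iteration order least-recently-used first,
        // so removeEldestEntry evicts the LRU entry once the cache is over capacity.
        this.cache = new LinkedHashMap<Long, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, String> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Serves from the cache when possible, otherwise loads from the backing store. */
    String get(long key) {
        String value = cache.get(key);
        if (value != null) {
            hits++;
            return value;
        }
        misses++;
        value = backingStore.apply(key);
        if (value != null) {
            cache.put(key, value);
        }
        return value;
    }

    long hits() { return hits; }

    long misses() { return misses; }

    public static void main(String[] args) {
        CachedStore store = new CachedStore(10_000, key -> "value-" + key);
        Random rnd = new Random(42);
        for (int i = 0; i < 1_000_000; i++) {
            // Raising a uniform sample to a high power concentrates requests on small keys.
            long key = (long) (1_000_000 * Math.pow(rnd.nextDouble(), 8));
            store.get(key);
        }
        double hitRate = store.hits() / (double) (store.hits() + store.misses());
        System.out.println("cache hit rate: " + hitRate);
    }
}
```

The more skewed the key distribution, the smaller the cache can be relative to the full data set while still answering most requests from the heap.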


