Solution for hash-map with >100M values

This page summarizes the projects mentioned and recommended in the original post on /r/java

  • Chronicle Map

    Replicate your Key Value Store across your network, with consistency, persistence and performance.

  • I've wrangled data sets in the ~600 GB range using nothing but plain old Java and a few beefy boxes. This can all be kept in memory, but you have to go off-heap. You can use Chronicle Map and Chronicle Values to model this data and work with it off-heap in a way that's still very clean and object-oriented. 128 GB of RAM is cheap these days, whether you're in the cloud or not.
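
To make "going off-heap" concrete, here is a minimal sketch that uses only the JDK. It is not Chronicle Map's API: `OffHeapLongMap` is a hypothetical class that keeps `long` keys and values in a direct `ByteBuffer` with linear probing, so the entries live outside the garbage-collected heap. It assumes a fixed capacity, no deletes, and single-threaded use.

```java
import java.nio.ByteBuffer;

/** Hypothetical fixed-capacity long -> long map backed by off-heap memory. */
final class OffHeapLongMap {
    // Slot layout: 1 "used" flag byte, 8 key bytes, 8 value bytes.
    private static final int SLOT_BYTES = 17;

    private final ByteBuffer buf;
    private final int capacity; // number of slots, always a power of two
    private int size;

    OffHeapLongMap(int capacityPowerOfTwo) {
        if (Integer.bitCount(capacityPowerOfTwo) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        this.capacity = capacityPowerOfTwo;
        // allocateDirect returns zero-filled memory, so every slot starts out empty.
        this.buf = ByteBuffer.allocateDirect(Math.multiplyExact(capacity, SLOT_BYTES));
    }

    private int indexFor(long key) {
        long h = key * 0x9E3779B97F4A7C15L; // multiplicative mixing of the key bits
        return (int) (h >>> 32) & (capacity - 1);
    }

    void put(long key, long value) {
        int slot = indexFor(key);
        while (true) {
            int off = slot * SLOT_BYTES;
            if (buf.get(off) == 0) { // empty slot: insert here
                // Keep one slot free so that probing always terminates.
                if (size == capacity - 1) throw new IllegalStateException("map is full");
                buf.put(off, (byte) 1);
                buf.putLong(off + 1, key);
                buf.putLong(off + 9, value);
                size++;
                return;
            }
            if (buf.getLong(off + 1) == key) { // existing key: overwrite the value
                buf.putLong(off + 9, value);
                return;
            }
            slot = (slot + 1) & (capacity - 1); // linear probing
        }
    }

    long getOrDefault(long key, long defaultValue) {
        int slot = indexFor(key);
        while (true) {
            int off = slot * SLOT_BYTES;
            if (buf.get(off) == 0) return defaultValue;
            if (buf.getLong(off + 1) == key) return buf.getLong(off + 9);
            slot = (slot + 1) & (capacity - 1);
        }
    }

    int size() {
        return size;
    }
}
```

A single direct `ByteBuffer` is capped at 2 GB, and the JVM limits total direct memory (`-XX:MaxDirectMemorySize`), so a sketch like this does not reach the data sizes discussed in the comment above. Resizing, variable-length values, persistence and concurrency are the parts a library such as Chronicle Map handles for you.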

  • MapDB

    MapDB provides concurrent Maps, Sets and Queues backed by disk storage or off-heap-memory. It is a fast and easy to use embedded Java database engine.

  • I have had good results with MapDB.

  • Oak

    A Scalable Concurrent Key-Value Map for Big Data Analytics (by yahoo)

  • Consider using a database (e.g. H2 embedded, Redis) with an on-heap cache (e.g. Caffeine). Since you say it is a Zipfian distribution, the cache should absorb most of the requests. For an off-heap hashtable, you might try Oak as it is likely a faster implementation.
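
The cache-in-front-of-a-store idea from the comment above can be sketched with the JDK alone. This is a stand-in, not Caffeine or H2: `CachedLookup` is a hypothetical class, the backing store is any function you pass in, and eviction is plain LRU via `LinkedHashMap`, whereas Caffeine uses a more sophisticated policy. Under a skewed (Zipfian) key distribution, a small cache like this would be expected to serve most reads without touching the slower store.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongFunction;

/** Hypothetical read-through LRU cache in front of a slower key-value store. */
final class CachedLookup<V> {
    private final LongFunction<V> backingStore; // e.g. a database query (assumption)
    private final Map<Long, V> cache;
    private long hits;
    private long misses;

    CachedLookup(int maxEntries, LongFunction<V> backingStore) {
        this.backingStore = backingStore;
        // accessOrder = true makes iteration order least-recently-used first.
        this.cache = new LinkedHashMap<Long, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, V> eldest) {
                return size() > maxEntries; // evict once the bound is exceeded
            }
        };
    }

    /** Returns the cached value, loading it from the backing store on a miss. */
    synchronized V get(long key) {
        V value = cache.get(key);
        if (value != null) {
            hits++;
            return value;
        }
        misses++;
        value = backingStore.apply(key);
        if (value != null) {
            cache.put(key, value); // absent keys are not cached
        }
        return value;
    }

    synchronized long hits() {
        return hits;
    }

    synchronized long misses() {
        return misses;
    }
}
```

The `hits()` and `misses()` counters are there so you can measure the hit rate on your own key distribution before deciding how large the on-heap cache needs to be.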

  • java-concurrent-hash-trie-map

    Java port of a concurrent trie hash map implementation from the Scala collections library

  • lasher

    Lasher is an embeddable key-value store written in Java.

  • Do you need to update the data after the initial load? If not, then I would suggest using my PalDB fork; otherwise you could try my lasher library. It's at an early stage, but the first results are very promising: I was testing it with 10-100M elements and the performance was similar to Java's HashMap.

  • SmoothieMap

    A gulp of low latency Java

  • Try https://github.com/TimeAndSpaceIO/SmoothieMap

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.


Related posts