Solution for hash-map with >100M values

This page summarizes the projects mentioned and recommended in the original post on reddit.com/r/java.

  • Chronicle Map

    Replicate your Key Value Store across your network, with consistency, persistence and performance.

    I've wrangled data sets in the ~600 GB range using nothing but plain old Java and a few beefy boxes. It can all be kept in memory, but you have to go off-heap. You can use Chronicle Map and Chronicle Values to model this data and work with it off-heap in a way that's still very clean and object-oriented. 128 GB of RAM is cheap these days, whether you're in the cloud or not.

  • MapDB

    MapDB provides concurrent Maps, Sets and Queues backed by disk storage or off-heap-memory. It is a fast and easy to use embedded Java database engine.

    I have had good results with MapDB.

  • Oak

    A Scalable Concurrent Key-Value Map for Big Data Analytics (by Yahoo)

    Consider using a database (e.g. embedded H2, Redis) with an on-heap cache (e.g. Caffeine). Since you say the access pattern follows a Zipfian distribution, the cache should absorb most of the requests. For an off-heap hash table, you might try Oak, as it is likely a faster implementation.

  • java-concurrent-hash-trie-map

    Java port of a concurrent trie hash map implementation from the Scala collections library

  • lasher

    Lasher is an embeddable key-value store written in Java.

    Do you need to update the data after the initial load? If not, I would suggest using my PalDB fork; otherwise, you could try my lasher library. It's at an early stage, but the first results are very promising: I tested it with 10-100M elements and the performance was similar to Java's HashMap.

  • SmoothieMap

    A gulp of low latency Java

    Try https://github.com/TimeAndSpaceIO/SmoothieMap
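The Chronicle Map comment says this much data can stay in RAM if it goes off-heap. As a rough illustration of what "off-heap" means (this is not Chronicle Map's API), here is a minimal sketch assuming fixed-size `long` keys and values: a hypothetical `OffHeapLongMap` that keeps its open-addressing table in a direct `ByteBuffer`, so the entries are not scanned by the garbage collector and carry no per-entry object headers.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical OffHeapLongMap: a fixed-capacity long -> long hash map using open
// addressing with linear probing. Slots live in a direct ByteBuffer, i.e. outside
// the garbage-collected heap. Not thread-safe; no resizing or removal; key 0 is
// reserved to mark an empty slot (direct buffers start zero-filled).
final class OffHeapLongMap {
    private static final int SLOT_BYTES = 16; // 8-byte key followed by 8-byte value

    private final ByteBuffer slots;
    private final int capacity; // always a power of two
    private final int mask;
    private int size;

    OffHeapLongMap(int expectedEntries) {
        // Size the table to at least twice the expected entries (load factor <= 0.5).
        long cap = 2;
        while (cap < 2L * expectedEntries) cap <<= 1;
        if (cap * SLOT_BYTES > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("too large for a single ByteBuffer");
        }
        this.capacity = (int) cap;
        this.mask = capacity - 1;
        this.slots = ByteBuffer.allocateDirect(capacity * SLOT_BYTES).order(ByteOrder.nativeOrder());
    }

    // MurmurHash3 64-bit finalizer: spreads sequential keys across the table.
    private static long mix(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    void put(long key, long value) {
        if (key == 0) throw new IllegalArgumentException("key 0 is reserved");
        int i = (int) (mix(key) & mask);
        while (true) {
            int off = i * SLOT_BYTES;
            long k = slots.getLong(off);
            if (k == key) {                 // existing key: overwrite its value
                slots.putLong(off + 8, value);
                return;
            }
            if (k == 0) {                   // empty slot: insert here
                if (size >= capacity / 2) throw new IllegalStateException("table full");
                slots.putLong(off, key);
                slots.putLong(off + 8, value);
                size++;
                return;
            }
            i = (i + 1) & mask;             // collision: probe the next slot
        }
    }

    long getOrDefault(long key, long defaultValue) {
        if (key == 0) return defaultValue;
        int i = (int) (mix(key) & mask);
        while (true) {
            int off = i * SLOT_BYTES;
            long k = slots.getLong(off);
            if (k == key) return slots.getLong(off + 8);
            if (k == 0) return defaultValue; // reached an empty slot: key is absent
            i = (i + 1) & mask;
        }
    }

    int size() { return size; }
}
```

A single `ByteBuffer` is indexed by `int`, so this sketch's capacity check caps it at 2^26 slots (roughly 33.5 million entries at half load). Getting past 100M entries means sharding the table across several buffers, which is the kind of bookkeeping Chronicle Map, MapDB or Oak take care of.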
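The H2/Redis-plus-Caffeine suggestion relies on skew: under a Zipfian distribution a small set of hot keys receives most of the requests, so a small on-heap cache can serve most lookups without touching the database. A minimal single-threaded sketch under these assumptions: a hypothetical `ReadThroughLruCache` built on `LinkedHashMap` in access order (Caffeine itself uses a different, more sophisticated eviction policy), with the database replaced by a plain function, plus a small simulation that replays Zipf-distributed keys.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;
import java.util.function.LongFunction;

// Hypothetical ReadThroughLruCache: a bounded on-heap LRU cache in front of a slower
// backing store. A stand-in for Caffeine; the backing store is just a function here
// (in practice, e.g., a lookup against H2 or Redis). Not thread-safe.
final class ReadThroughLruCache<V> {
    private final LongFunction<V> backingStore;
    private final LinkedHashMap<Long, V> lru;
    private long hits;
    private long misses;

    ReadThroughLruCache(int maxEntries, LongFunction<V> backingStore) {
        this.backingStore = backingStore;
        // accessOrder = true: iteration runs from least to most recently accessed,
        // so the "eldest" entry is the least recently used one.
        this.lru = new LinkedHashMap<Long, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, V> eldest) {
                return size() > maxEntries; // evict once the bound is exceeded
            }
        };
    }

    V get(long key) {
        V value = lru.get(key);
        if (value != null) {
            hits++;
            return value;
        }
        misses++;
        value = backingStore.apply(key); // slow path: ask the "database"
        lru.put(key, value);
        return value;
    }

    double hitRate() {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }

    // Replays Zipf-distributed requests (P(rank r) proportional to 1/r) against a
    // cache and returns the observed hit rate. The fake store returns key * 2.
    static double simulateZipf(int keys, int cacheSize, int requests, long seed) {
        double[] cdf = new double[keys];
        double sum = 0;
        for (int r = 0; r < keys; r++) {
            sum += 1.0 / (r + 1);
            cdf[r] = sum;
        }
        Random rnd = new Random(seed);
        ReadThroughLruCache<Long> cache = new ReadThroughLruCache<>(cacheSize, key -> key * 2);
        for (int i = 0; i < requests; i++) {
            double u = rnd.nextDouble() * sum;
            int idx = Arrays.binarySearch(cdf, u);
            if (idx < 0) idx = -idx - 1; // insertion point = first cdf entry > u
            cache.get(Math.min(idx, keys - 1));
        }
        return cache.hitRate();
    }
}
```

For example, `ReadThroughLruCache.simulateZipf(100_000, 1_000, 200_000, 42L)` measures a cache that holds only 1% of the keys. Under uniform traffic such a cache would hit only about 1% of the time; under Zipfian traffic the hot keys stay resident and the hit rate is far higher, which is the effect the comment is counting on.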
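The lasher/PalDB comment turns on whether the data changes after the initial load. If it does not, the store can be built once and then only read. A minimal sketch of that write-once idea, assuming `long` keys and values (a hypothetical `ReadOnlyLongTable`, not PalDB's file format): sort the keys once into a primitive array, keep the values aligned, and answer lookups with binary search.

```java
import java.util.Arrays;

// Hypothetical ReadOnlyLongTable: an immutable long -> long table built once from a
// bulk load. Keys are sorted into a primitive array and looked up by binary search.
// This is NOT PalDB's on-disk format; it only illustrates the write-once idea.
final class ReadOnlyLongTable {
    private final long[] keys;   // sorted ascending, no duplicates
    private final long[] values; // values[i] belongs to keys[i]

    ReadOnlyLongTable(long[] rawKeys, long[] rawValues) {
        if (rawKeys.length != rawValues.length) {
            throw new IllegalArgumentException("keys and values differ in length");
        }
        keys = rawKeys.clone();
        values = rawValues.clone();
        heapSort(keys, values); // one-off O(n log n) cost at load time
        for (int i = 1; i < keys.length; i++) {
            if (keys[i] == keys[i - 1]) throw new IllegalArgumentException("duplicate key " + keys[i]);
        }
    }

    long getOrDefault(long key, long defaultValue) {
        int idx = Arrays.binarySearch(keys, key);
        return idx >= 0 ? values[idx] : defaultValue;
    }

    int size() { return keys.length; }

    // In-place heapsort on keys that mirrors every swap into values,
    // so the two arrays stay aligned without boxing.
    private static void heapSort(long[] k, long[] v) {
        int n = k.length;
        for (int i = n / 2 - 1; i >= 0; i--) siftDown(k, v, i, n);
        for (int end = n - 1; end > 0; end--) {
            swap(k, v, 0, end);
            siftDown(k, v, 0, end);
        }
    }

    private static void siftDown(long[] k, long[] v, int i, int n) {
        while (true) {
            long left = 2L * i + 1; // long arithmetic avoids int overflow on huge arrays
            if (left >= n) return;
            int child = (int) left;
            if (child + 1 < n && k[child + 1] > k[child]) child++;
            if (k[i] >= k[child]) return;
            swap(k, v, i, child);
            i = child;
        }
    }

    private static void swap(long[] k, long[] v, int a, int b) {
        long tk = k[a]; k[a] = k[b]; k[b] = tk;
        long tv = v[a]; v[a] = v[b]; v[b] = tv;
    }
}
```

At 100M entries the two arrays take 100M × 8 bytes × 2 = 1.6 GB of heap with no per-entry objects. Lookups cost O(log n) comparisons instead of a hash table's expected O(1), which is the price of the compact, immutable layout.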
