Our great sponsors
-
Chronicle Map
Replicate your Key Value Store across your network, with consistency, persistance and performance.
I've wrangled data sets in the ~600gb range using nothing but plain old Java and a few beefy boxes. This can all be kept in memory, but you have to go off-heap. You can use Chronicle Map and Chronicle Values to model this data and work with it off-heap in a way that's still very clean and object oriented. 128gb of RAM is cheap these days, whether you're in the cloud or not.
-
MapDB
MapDB provides concurrent Maps, Sets and Queues backed by disk storage or off-heap-memory. It is a fast and easy to use embedded Java database engine.
I have had good results with mapdb
-
InfluxDB
Access the most powerful time series database as a service. Ingest, store, & analyze all types of time series data in a fully-managed, purpose-built database. Keep data forever with low-cost storage and superior data compression.
-
Consider using an database (e.g. H2 embedded, redis) with an on-heap cache (e.g. Caffeine). Since you say it is a Zipfian distribution, the cache should absorb most of the requests. For an off-heap hashtable, you might try Oak as it is likely a faster implementation.
-
java-concurrent-hash-trie-map
Java port of a concurrent trie hash map implementation from the Scala collections library
-
Do you need to update the data after initial load? If not, then I would suggest using my Paldb fork , otherwise you could try my lasher library. It's in early stage but first results are very promising, I was testing it with 10-100M elements and the performance was similar to java hashmap.
-
Try https://github.com/TimeAndSpaceIO/SmoothieMap