java-concurrent-hash-trie-map
Java port of a concurrent trie hash map implementation from the Scala collections library.
I've wrangled data sets in the ~600 GB range using nothing but plain old Java and a few beefy boxes. It can all be kept in memory, but you have to go off-heap. You can use Chronicle Map and Chronicle Values to model the data and work with it off-heap in a way that's still clean and object-oriented. 128 GB of RAM is cheap these days, whether you're in the cloud or not.
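To make the off-heap idea concrete without any third-party dependency, here is a minimal sketch of a fixed-capacity long-to-long open-addressed table backed by a direct ByteBuffer, so the entries live outside the Java heap. This is only an illustration of the technique, not Chronicle Map's API; the class and method names are invented for this example.

```java
import java.nio.ByteBuffer;

// Sketch only: fixed-capacity off-heap long->long map with linear probing.
// Key 0 is reserved as the "empty slot" marker, so it cannot be stored.
final class OffHeapLongMap {
    private static final int SLOT_BYTES = 16;   // 8-byte key + 8-byte value
    private final int capacity;                 // power of two, fixed at creation
    private final ByteBuffer buf;               // direct buffer = off-heap storage

    OffHeapLongMap(int capacity) {
        if (Integer.bitCount(capacity) != 1)
            throw new IllegalArgumentException("capacity must be a power of two");
        this.capacity = capacity;
        this.buf = ByteBuffer.allocateDirect(capacity * SLOT_BYTES);
    }

    private int slot(long key) {
        long h = key * 0x9E3779B97F4A7C15L;     // cheap multiplicative mix
        return (int) ((h ^ (h >>> 32)) & (capacity - 1));
    }

    void put(long key, long value) {
        if (key == 0) throw new IllegalArgumentException("key 0 is reserved");
        int i = slot(key);
        for (int probes = 0; probes < capacity; probes++) {
            long k = buf.getLong(i * SLOT_BYTES);
            if (k == 0 || k == key) {           // empty slot, or overwrite existing key
                buf.putLong(i * SLOT_BYTES, key);
                buf.putLong(i * SLOT_BYTES + 8, value);
                return;
            }
            i = (i + 1) & (capacity - 1);       // linear probe to next slot
        }
        throw new IllegalStateException("map is full");
    }

    Long get(long key) {
        int i = slot(key);
        for (int probes = 0; probes < capacity; probes++) {
            long k = buf.getLong(i * SLOT_BYTES);
            if (k == 0) return null;            // reached an empty slot: key absent
            if (k == key) return buf.getLong(i * SLOT_BYTES + 8);
            i = (i + 1) & (capacity - 1);
        }
        return null;
    }
}
```

A real library like Chronicle Map adds persistence, concurrency, resizing, and typed values on top of this basic layout; the point here is only that the entries never touch the garbage-collected heap.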
I have had good results with MapDB.
Consider using a database (e.g. embedded H2, Redis) with an on-heap cache (e.g. Caffeine). Since you say the keys follow a Zipfian distribution, the cache should absorb most of the requests. For an off-heap hash table, you might try Oak, which is likely a faster implementation.
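The cache-in-front-of-a-store pattern described above can be sketched with a plain LinkedHashMap LRU standing in for Caffeine. The backing-store lookup is a hypothetical placeholder (in practice it would be a query against H2, Redis, etc.); under a Zipfian key distribution the hot head of the distribution stays resident in the cache.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongFunction;

// Sketch: a small on-heap LRU cache in front of a slower backing store.
// LinkedHashMap in access order stands in for Caffeine for illustration.
final class CachedStore<V> {
    private final Map<Long, V> cache;
    private final LongFunction<V> store;   // hypothetical store lookup (e.g. H2/Redis)

    CachedStore(int maxEntries, LongFunction<V> store) {
        this.store = store;
        // access-order LinkedHashMap + removeEldestEntry = a simple LRU cache
        this.cache = new LinkedHashMap<Long, V>(16, 0.75f, true) {
            @Override protected boolean removeEldestEntry(Map.Entry<Long, V> e) {
                return size() > maxEntries;
            }
        };
    }

    V get(long key) {
        V v = cache.get(key);
        if (v == null) {                   // cache miss: fall through to the store
            v = store.apply(key);
            cache.put(key, v);
        }
        return v;
    }
}
```

Caffeine replaces the LinkedHashMap with a much better eviction policy (W-TinyLFU) and thread-safe loading, but the control flow is the same: only misses hit the database.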
Do you need to update the data after the initial load? If not, I would suggest using my PalDB fork; otherwise, you could try my lasher library. It's at an early stage, but the first results are very promising: testing with 10-100M elements, performance was similar to Java's HashMap.
Try https://github.com/TimeAndSpaceIO/SmoothieMap