Solution for hash-map with >100M values

This page summarizes the projects mentioned and recommended in the original post on /r/java.

  • Chronicle Map

    Replicate your Key Value Store across your network, with consistency, persistence and performance.

    I've wrangled data sets in the ~600 GB range using nothing but plain old Java and a few beefy boxes. This can all be kept in memory, but you have to go off-heap. You can use Chronicle Map and Chronicle Values to model this data and work with it off-heap in a way that's still very clean and object-oriented. 128 GB of RAM is cheap these days, whether you're in the cloud or not.

  • MapDB

    MapDB provides concurrent Maps, Sets and Queues backed by disk storage or off-heap-memory. It is a fast and easy to use embedded Java database engine.

    I have had good results with MapDB.

  • Oak

    A Scalable Concurrent Key-Value Map for Big Data Analytics (by yahoo)

    Consider using a database (e.g. embedded H2, Redis) with an on-heap cache (e.g. Caffeine). Since you say it is a Zipfian distribution, the cache should absorb most of the requests. For an off-heap hashtable, you might try Oak, as it is likely a faster implementation.

  • java-concurrent-hash-trie-map

    Java port of a concurrent trie hash map implementation from the Scala collections library

  • lasher

    Lasher is an embeddable key-value store written in Java.

    Do you need to update the data after the initial load? If not, then I would suggest using my PalDB fork; otherwise you could try my lasher library. It's at an early stage, but first results are very promising: I tested it with 10-100M elements and the performance was similar to Java's HashMap.

  • SmoothieMap

    A gulp of low latency Java

    Try https://github.com/TimeAndSpaceIO/SmoothieMap
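
Several of the suggestions above come down to the same idea: keep the entries outside the Java heap so the garbage collector never has to scan them. The following is a minimal sketch of that idea using only the JDK, assuming `long` keys and `long` values and a fixed capacity. `OffHeapLongMap` is a hypothetical class written for illustration; it is not the API of Chronicle Map, MapDB, Oak, or any other library listed here, and it omits resizing, deletion, and thread safety.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Illustrative only: a fixed-capacity long -> long hash map whose entries
 * live in a direct (off-heap) ByteBuffer. Hypothetical class, not a library API.
 */
final class OffHeapLongMap {
    // Slot layout: 1 "used" flag byte, 8 key bytes, 8 value bytes.
    private static final int ENTRY_BYTES = 17;

    private final ByteBuffer buf;
    private final int slots; // always a power of two
    private int size;

    OffHeapLongMap(int expectedEntries) {
        // Aim for a load factor of at most 0.5, rounded up to a power of two.
        long n = 16;
        while (n < 2L * expectedEntries) n <<= 1;
        long bytes = n * ENTRY_BYTES;
        if (bytes > Integer.MAX_VALUE) {
            // A single ByteBuffer is int-indexed; a real store would use several segments.
            throw new IllegalArgumentException("too large for one ByteBuffer");
        }
        slots = (int) n;
        // Direct buffers are zero-filled, so every slot starts with flag == 0 (empty).
        buf = ByteBuffer.allocateDirect((int) bytes).order(ByteOrder.nativeOrder());
    }

    private int slotFor(long key) {
        long h = key * 0x9E3779B97F4A7C15L; // multiplicative hash to spread the bits
        return (int) (h >>> 32) & (slots - 1);
    }

    void put(long key, long value) {
        int i = slotFor(key);
        while (true) {
            int off = i * ENTRY_BYTES;
            if (buf.get(off) == 0) { // empty slot: claim it
                if (size >= slots - 1) throw new IllegalStateException("map is full");
                buf.put(off, (byte) 1);
                buf.putLong(off + 1, key);
                buf.putLong(off + 9, value);
                size++;
                return;
            }
            if (buf.getLong(off + 1) == key) { // existing key: overwrite the value
                buf.putLong(off + 9, value);
                return;
            }
            i = (i + 1) & (slots - 1); // linear probing
        }
    }

    long get(long key, long missing) {
        int i = slotFor(key);
        while (true) {
            int off = i * ENTRY_BYTES;
            if (buf.get(off) == 0) return missing; // hit an empty slot: key is absent
            if (buf.getLong(off + 1) == key) return buf.getLong(off + 9);
            i = (i + 1) & (slots - 1);
        }
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        OffHeapLongMap map = new OffHeapLongMap(100_000);
        for (long k = 0; k < 100_000; k++) map.put(k, k * 3);
        System.out.println(map.get(12_345L, -1L));
        System.out.println(map.size());
    }
}
```

At 17 bytes per slot and a load factor of 0.5, 100M entries would need roughly 4.5 GB, which is more than one `ByteBuffer` can address. That is one reason the libraries above (segmented, memory-mapped, or both) are likely a better starting point than a hand-rolled table.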



