ConcurrentHashMap

Java thread safety problems can bring down your Java EE application and the Java EE container fairly easily. One of most common problems I have observed when troubleshooting Java EE performance problems is infinite looping triggered from the non-thread safe HashMap get() and put() operations.

There seem to be three different synchronized Map implementations in the Java API:

Hashtable

Collections.synchronizedMap(Map)

ConcurrentHashMap

ConcurrentHashMap in Java is introduced as an alternative of Hashtable in Java 1.5 as part of Java concurrency package. Prior to Java 1.5 if you need a Map implementation, which can be safely used in a concurrent and multi-threaded Java program, than, you only have Hashtable or synchronized Map because HashMap is not thread-safe. With ConcurrentHashMap, now you have better choice; because, not only it can be safely used in concurrent multi-threaded environment but also provides better performance over Hashtable and synchronizedMap. ConcurrentHashMap performs better than earlier two because it only locks a portion of Map, instead of whole Map, which is the case with Hashtable and synchronized Map.
ConcurrentHashMap main motive is to allow concurrent access to the map. HashTable offers synchronized access to the map but the entire map is locked to perform any sort of operation.

In ConcurrentHashMap, the basic strategy is to subdivide the table among segments so that the lock is applied only on a segment rather than the entire table. Each segment manages its own internal hash table in size 2^x >=(capacity/no. of segments).

Locking is applied only for updates. In case of of retrievals, it allows full concurrency, retrievals reflect the results of the most recently completed update operations.

If we assume four threads are going to concurrently work on a map of initial capacity 32, we would want the table to be partitioned into four segments, each managing a hash table of capacity 8.
ConcurrentHashMap Data Structure would be:

Segment

The collection maintains a list of 16 segments by default, each of which is used to guard (or lock on) a single bucket of the map. This effectively means that 16 threads can modify the collection at a single time (as long as they’re all working on different buckets). This level of concurrency can be increased using the optional concurrencyLevel constructor argument.

public ConcurrentHashMap(int initialCapacity,float loadFactor, int concurrencyLevel)

The maximum size it can go up to is 2¹⁶. Greater the concurrency level, greater would be the no. of segments and lesser the size of hash table that the segment manages. Using a significantly higher value than you need can waste space and time, and a significantly lower value can lead to thread contention.

Put Entry

New Entry

ConcurrentHashMap doesn't allow NULL values. The key cannot be NULL, it is used to locate the segment index and then the actual bucket within the segment. The key's hash is re-hashed to strengthen the hash value to defend against poor quality hash functions. This is important as the hash will be used to locate both segment and hash table index. The upper bits will be used to locate the segment index and the lower bits to locate the table index within the segment. This is why re-hashing function is different from that of HashMap as the hash bits need to be spread better to regularize both segment and index locations.

Put If Absent

ConcurrentMap offers new method putIfAbsent() which is similar to put except that the value will not be overridden if key already exists. This is equivalent to

if (!map.containsKey(key))
    return map.put(key, value);
else
   return map.get(key);

except that the action is performed atomically.

containsKey is not synchronized so the above code may cause unexpected behavior. Lets look into the below scenario:

Thread A is trying to remove a key, calls remove(key), acquires lock.

Thread B tries to execute above code, calls containsKey(key). The key exists as Thread A has not yet released the lock so the old value is returned.

Thread A removes the key and releases the lock.

The above scenario proves that the code is not atomic as it returned a key which it thinks exists but actually doesn't.
Also, performance wise it is slow, containsKey has to locate the segment and traverse through the table to find the key. Method put needs to do the same traversing to locate the bucket and put the value. The new method putIfAbsent is a bit faster as it avoids double traversing. It goes through the same internal flow as put, avoids overriding the old value if key already exists and simply returns the old value.

Summary

Now we know What is ConcurrentHashMap in Java and when to use ConcurrentHashMap, it’s time to know and revise some important points about Concurrent HashMap in Java.

1. ConcurrentHashMap allows concurrent read and thread-safe update operation.

2. During update operation, ConcurrentHashMap only lock a portion of Map instead of whole Map.

3. Concurrent update is achieved by internally dividing Map into small portion which is defined by concurrency level.

4. Choose concurrency level carefully as a significant higher number can be waste of time and space and lower number may introduce thread contention in case writers over number concurrency level.

5. All operations of ConcurrentHashMap are thread-safe.

6. Since ConcurrentHashMap implementation doesn't lock whole Map, there is chance of read overlapping with update operations like put() and remove(). In that case result returned by get() method will reflect most recently completed operation from there start.

7. Iterator returned by ConcurrentHashMap is weekly consistent, fail safe and never throw ConcurrentModificationException. In Java.

8. ConcurrentHashMap doesn't allow null as key or value.

9. You can use ConcurrentHashMap in place of Hashtable but with caution as CHM doesn't lock whole Map.

10. During putAll() and clear() operations, concurrent read may only reflect insertion or deletion of some entries.

Search This Blog

Knowledge Cafe

Observability Done Right: Best Practices and Anti-Patterns for Effective System Monitoring