What is String Deduplication

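This short article covers the basics of string deduplication in Java. String deduplication is a JVM feature that reduces heap usage by letting String objects with identical contents share a single backing character array.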


Deduplication is not applied to the String objects themselves. It is applied only to their backing character arrays. Deduplicating the String objects would not be safe, because an application may depend on an object's identity: for example, it might use the object for synchronization.
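The identity argument above can be illustrated with a minimal sketch. The class and method names here (StringIdentityDemo, distinctObjects, locksAreIndependent) are made up for this example. It shows only that two equal strings can still be distinct objects with separate monitors, which is why the JVM leaves the String objects alone.

```java
public class StringIdentityDemo {
    // True when the two references point at different String objects.
    static boolean distinctObjects(String a, String b) {
        return a != b;
    }

    // While holding a's monitor, the current thread does not hold b's monitor
    // (as long as a and b are different objects).
    static boolean locksAreIndependent(String a, String b) {
        synchronized (a) {
            return !Thread.holdsLock(b);
        }
    }

    public static void main(String[] args) {
        // new String(...) forces two separate objects with the same contents.
        String a = new String("hello");
        String b = new String("hello");

        System.out.println("equal contents:    " + a.equals(b));
        System.out.println("distinct objects:  " + distinctObjects(a, b));
        System.out.println("independent locks: " + locksAreIndependent(a, b));
    }
}
```

If the JVM merged a and b into one object, code that synchronizes on them would suddenly share a lock. Sharing only the backing array avoids that.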

This feature was introduced in Java 8 Update 20 (8u20) and is available in all later versions.

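When it was introduced, string deduplication worked only with the G1 garbage collector, and it is off by default: it has to be switched on with the -XX:+UseStringDeduplication JVM flag. Newer JDK releases (JDK 18 onward) extend support to other collectors.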
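To see whether the feature is actually switched on in a running JVM, the option values can be read at runtime. This is a small sketch that assumes a HotSpot-based JVM, where the com.sun.management.HotSpotDiagnosticMXBean interface is available. The class name DedupFlagCheck and the helper vmOption are made up for this example.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class DedupFlagCheck {
    // Reads a HotSpot VM option by name.
    // Returns "unknown" if this JVM does not expose the option.
    static String vmOption(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean.getVMOption(name).getValue();
        } catch (RuntimeException e) {
            return "unknown";
        }
    }

    public static void main(String[] args) {
        // Launch with:
        //   java -XX:+UseG1GC -XX:+UseStringDeduplication DedupFlagCheck
        System.out.println("UseG1GC = " + vmOption("UseG1GC"));
        System.out.println("UseStringDeduplication = " + vmOption("UseStringDeduplication"));
    }
}
```

Run without the flags, the second line should normally report false, since deduplication is disabled by default.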

Deduplication is performed as part of garbage collection. While marking the live objects in the heap, the collector also checks whether each String object is eligible for deduplication. If it is, a reference to the object is placed in a deduplication queue for later processing.

A separate background thread processes this queue. It tries to deduplicate each object in the queue and then removes the object from the queue, whether or not it was deduplicated.

As we already know, the character array contains the actual data of a String object, and the String object points to that character array. These arrays are tracked in a hashtable keyed by the hash code of the string data.

The deduplication thread looks up this hashtable to check whether it already holds a character array with the same data. It first compares hash codes; if they match, it compares the two arrays character by character. If the contents are identical, the String object is adjusted to point to the existing character array and its reference to the original array is released. The original array then becomes eligible for garbage collection and is eventually reclaimed by the GC.
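The queue and hashtable steps described above can be sketched as a toy model in plain Java. This is not the JVM's actual implementation, which lives inside the garbage collector. The class DedupSketch, its Str stand-in for java.lang.String, and all method names are hypothetical and only mirror the steps from the text.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

public class DedupSketch {
    // Hypothetical stand-in for java.lang.String: an object pointing at a backing array.
    static final class Str {
        char[] value;
        Str(String s) { this.value = s.toCharArray(); }
    }

    // Known arrays, bucketed by the hash code of their contents.
    private final Map<Integer, List<char[]>> table = new HashMap<>();
    // Candidates found during marking, waiting for the background pass.
    private final Queue<Str> queue = new ArrayDeque<>();

    void enqueue(Str s) { queue.add(s); }

    // Drains the queue; returns how many objects were redirected to a shared array.
    int processQueue() {
        int deduplicated = 0;
        Str s;
        while ((s = queue.poll()) != null) { // removed from the queue either way
            if (deduplicate(s)) deduplicated++;
        }
        return deduplicated;
    }

    private boolean deduplicate(Str s) {
        // Step 1: narrow the search by hash code.
        List<char[]> bucket =
                table.computeIfAbsent(Arrays.hashCode(s.value), k -> new ArrayList<>());
        for (char[] known : bucket) {
            if (known == s.value) return false;      // already sharing this array
            // Step 2: confirm with a character-by-character comparison.
            if (Arrays.equals(known, s.value)) {
                s.value = known;                     // old array is now unreferenced
                return true;
            }
        }
        bucket.add(s.value);                         // first array with this content
        return false;
    }

    public static void main(String[] args) {
        DedupSketch dedup = new DedupSketch();
        Str a = new Str("hello");
        Str b = new Str("hello");
        dedup.enqueue(a);
        dedup.enqueue(b);
        System.out.println("deduplicated: " + dedup.processQueue());
        System.out.println("shared array: " + (a.value == b.value));
        System.out.println("same object:  " + (a == b));
    }
}
```

In this model a and b remain two separate objects, but after the pass both point at the same array, and the array that b originally owned has no remaining references.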
