How the Garbage Collector Works in Java

This article covers how the garbage collector works in depth, including the GC algorithm, the types of memory in a Java application, and the generational strategy of the garbage collector.

Before going deeper into how the garbage collector works, we first need to understand the basics of memory in a Java application.

Types of Memory in a Java Application

Let’s look at the different types of memory available to a Java application.

These are the 3 types of memory in Java:

Stack
Heap
Metaspace

Stack: Each thread has its own stack. The stack holds the method frames of that thread, which contain local primitive values and references to objects. The objects themselves are stored in the heap.

Heap: There is only one heap for the entire Java application, and it is shared among all the threads.

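Metaspace: It contains the metadata related to the loaded classes and methods, for example which methods are still run as bytecode and which have been compiled to native machine code. Static variables belong to a class rather than to a thread, so in this respect the class data plays a role similar to the stack: it is a starting point from which objects can be reached. For a static variable of an object type, the object itself is stored in the heap and only the reference is held with the class. (In HotSpot since JDK 7, the static fields themselves are stored on the heap together with the java.lang.Class object rather than inside the metadata area. For understanding garbage collection, it is enough to treat them as references reachable through the loaded class, which this article refers to as the metaspace.)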

Variables are popped from the stack as soon as they go out of scope. Static variables, in contrast, stay available for as long as their class is loaded, which usually means for the lifetime of the application, since they can be accessed at any time. Consequently, an object in the heap that is referenced by a static variable will not be collected by the garbage collector for as long as that reference exists.

A Java application has multiple stacks (one per thread), and each stack can be accessed only by the thread that owns it. The metaspace, in contrast, is accessible to all the threads and classes.

Since all the classes and threads have access to the metaspace, static variables are accessible from any part of the code and from any thread in the application.
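The difference between per-thread and shared memory can be illustrated with a small sketch. The class name MemoryAreasDemo and its run method are hypothetical names chosen for this example; the sketch only assumes that a local variable is private to the thread that declares it, while a static field and a heap object are visible to every thread.

```java
import java.util.concurrent.atomic.AtomicInteger;

class MemoryAreasDemo {
    // Static field: one copy per loaded class, visible to every thread.
    static final AtomicInteger sharedCounter = new AtomicInteger();

    // Runs two threads; returns {sharedTotal, localTotalOfThread0, localTotalOfThread1}.
    static int[] run(int increments) {
        sharedCounter.set(0);
        int[] localTotals = new int[2]; // the array object lives on the heap
        Thread[] workers = new Thread[2];
        for (int i = 0; i < workers.length; i++) {
            final int id = i;
            workers[i] = new Thread(() -> {
                int local = 0; // local primitive: private to this thread's stack
                for (int n = 0; n < increments; n++) {
                    local++;                         // touched by this thread only
                    sharedCounter.incrementAndGet(); // touched by both threads
                }
                localTotals[id] = local; // publish the result through a heap object
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join(); // after join, the workers' writes are visible here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return new int[] { sharedCounter.get(), localTotals[0], localTotals[1] };
    }

    public static void main(String[] args) {
        int[] r = run(1000);
        System.out.println("shared=" + r[0] + " local0=" + r[1] + " local1=" + r[2]);
    }
}
```

Each thread counts to 1000 in its own local variable, while the static counter ends at 2000 because both threads update the same field.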

Metaspace was added in Java 8, so it is present in Java 8 and above. Java 7 and earlier versions had PermGen instead, which no longer exists in later versions; it was replaced by the metaspace.

So, there is no need to manage the metaspace manually, as it remains for as long as the application is running. The stack, on the other hand, lives only as long as its thread: it is created when a new thread is created and is discarded when that thread ends. Also, when a method returns, the variables it stored on the stack are popped. So, the stack needs no explicit memory management either.

The heap is the only area that requires proper management. If it is not managed properly, our application will not work correctly and may even crash abruptly, for example with an OutOfMemoryError.

To manage heap memory, Java provides the garbage collector, which manages the heap on its own. The developer doesn’t need to release memory explicitly after use; Java’s garbage collector does that work on our behalf.

What Does the Garbage Collector Do?

The garbage collector frees memory in the heap when the objects occupying it can no longer be used by the application. To do that, it checks which objects are eligible to be collected as garbage.

Eligible Objects for Garbage Collection

All the objects that are unreachable from the stack and the metaspace are eligible for garbage collection.

Please note that the criterion is being unreachable, not being unreferenced. It is not accurate to say that only unreferenced objects are eligible for garbage collection.

An object that is still referenced by other objects can be eligible for garbage collection. Let’s understand this with the following examples:

Example 1: We have a list of books containing 3 objects of type Book, so the list points to 3 Book objects in the heap. Now suppose we lose the reference to this list. The list is then eligible for collection, and so are the Book objects. The Book objects are still referenced by the list object, but they are eligible because they are unreachable from the stack and the metaspace.

Example 2: Objects that form a circular reference are eligible for collection if none of them is reachable from the stack or the metaspace, for example: obj1 -> obj2 -> obj3 -> obj1. Here, all 3 are garbage objects.

Hence, every object that is unreachable from the stack and the metaspace is eligible for collection.
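The two examples can be modelled with a toy object graph. This is only a sketch of the reachability idea, not how the JVM stores objects: the Node class, the node names, and the choice of a single root (liveList) are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class ReachabilityDemo {
    // Hypothetical stand-in for a heap object: a name plus outgoing references.
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Collects every node that can be reached by following references from the roots.
    static Set<Node> reachable(List<Node> roots) {
        Set<Node> seen = new HashSet<>();
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (seen.add(n)) {          // true only the first time we see n
                pending.addAll(n.refs);
            }
        }
        return seen;
    }

    // Builds both examples and returns the names of the unreachable (garbage) nodes.
    static Set<String> garbageNames() {
        Node liveList = new Node("liveList"); // still referenced from a root
        Node liveBook = new Node("liveBook");
        liveList.refs.add(liveBook);

        Node lostList = new Node("lostList"); // Example 1: reference to the list is lost
        Node book1 = new Node("book1");
        Node book2 = new Node("book2");
        Node book3 = new Node("book3");
        lostList.refs.addAll(Arrays.asList(book1, book2, book3));

        Node obj1 = new Node("obj1");         // Example 2: obj1 -> obj2 -> obj3 -> obj1
        Node obj2 = new Node("obj2");
        Node obj3 = new Node("obj3");
        obj1.refs.add(obj2);
        obj2.refs.add(obj3);
        obj3.refs.add(obj1);

        List<Node> heap = Arrays.asList(liveList, liveBook, lostList,
                book1, book2, book3, obj1, obj2, obj3);
        Set<Node> live = reachable(Arrays.asList(liveList)); // the only root

        Set<String> garbage = new TreeSet<>();
        for (Node n : heap) {
            if (!live.contains(n)) garbage.add(n.name);
        }
        return garbage;
    }

    public static void main(String[] args) {
        // prints [book1, book2, book3, lostList, obj1, obj2, obj3]
        System.out.println(garbageNames());
    }
}
```

The books are garbage even though lostList still references them, and the cycle is garbage even though every object in it is referenced, because nothing connects them to a root.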

Garbage Collection Algorithm

Impact on our application while GC is running:

– GC needs CPU resources when it runs, so fewer CPU resources are available to our application during that time.

Instead of looking for all the objects that need to be removed from the heap, GC looks for the objects that need to be retained. For this, GC uses the “mark and sweep” algorithm.

This algorithm works in two steps:

Marking
Sweeping

Marking: In this step, GC marks all the objects that are reachable from the stack and the metaspace. In the basic form of the algorithm, this is a “stop the world” event: all the application threads are paused until the marking is complete.

Our application is affected whenever this step runs, since it stops all the threads. So, this step should take as little time as possible. To achieve that, the heap is divided into different generations, which we will discuss in the next section of this article.

Sweeping: After marking, GC scans the whole heap and frees the objects that were not marked. Reclaiming the memory of such objects is called sweeping. The marked (live) objects are then moved so that they occupy a contiguous region of memory.

Ques: What are the benefits of a contiguous memory location?

Ans: It prevents the heap from becoming fragmented over time, which makes it easier and quicker for the virtual machine to find and allocate memory for a new object.
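The two steps can be sketched as a toy simulation over a small array of slots. This is an illustrative model under stated assumptions, not the JVM's implementation: the Obj class, the slot array, and the choice of A and F as roots are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MarkSweepCompactDemo {
    // Hypothetical heap object with a mark bit and outgoing references.
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    final Obj[] slots; // toy heap: null means the slot is free
    int markVisits;    // how many objects the mark phase visited

    MarkSweepCompactDemo(Obj... initial) {
        slots = Arrays.copyOf(initial, initial.length);
    }

    // Mark: walk from a root and flag everything reachable; garbage is never visited.
    void mark(Obj o) {
        if (o == null || o.marked) return;
        o.marked = true;
        markVisits++;
        for (Obj r : o.refs) mark(r);
    }

    // Sweep and compact: drop unmarked objects and slide marked ones to the front.
    void sweepAndCompact() {
        int next = 0;
        for (int i = 0; i < slots.length; i++) {
            Obj o = slots[i];
            if (o != null && o.marked) {
                o.marked = false;  // reset the mark bit for the next cycle
                slots[next++] = o; // next <= i, so nothing live is overwritten
            }
        }
        Arrays.fill(slots, next, slots.length, null); // one contiguous free region
    }

    String layout() {
        StringBuilder sb = new StringBuilder();
        for (Obj o : slots) sb.append(o == null ? "." : o.name).append(' ');
        return sb.toString().trim();
    }

    static MarkSweepCompactDemo runExample() {
        Obj a = new Obj("A"), b = new Obj("B"), c = new Obj("C");
        Obj d = new Obj("D"), e = new Obj("E"), f = new Obj("F");
        a.refs.add(d); // A -> D (both live)
        b.refs.add(c); // B -> C (referenced, but unreachable)
        MarkSweepCompactDemo heap = new MarkSweepCompactDemo(a, b, c, d, e, f);
        heap.mark(a);  // roots in this sketch: A and F
        heap.mark(f);
        heap.sweepAndCompact();
        return heap;
    }

    public static void main(String[] args) {
        MarkSweepCompactDemo heap = runExample();
        System.out.println(heap.layout());                          // A D F . . .
        System.out.println("marked objects: " + heap.markVisits);   // marked objects: 3
    }
}
```

Only the 3 live objects are visited during marking, and after compaction the free slots form a single block at the end, so a new object can be placed without searching for a gap.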

NOTE: Garbage collection is very fast when most of the heap is garbage, because the marking phase visits only live objects and therefore finishes quickly.

What Is a Generational Garbage Collector?

Before looking at the generational garbage collector and how it works, we need to note the following 2 points:

The garbage collector’s marking step is fast when most of the objects are garbage.
It has been observed that most Java objects don’t live for long, and an object that does survive for a while is likely to live for a long time.

If there are many live objects when GC runs (that is, there is very little garbage), marking takes longer, and because of the ‘stop the world’ behaviour of the GC, our application will appear to hang.

To avoid such pauses, the above two points are taken into account in the GC implementation. This is why generational GC was introduced in Java.

Broadly, the heap is divided into 2 parts: the young generation and the old generation. The young generation is kept small compared to the old generation. Since most objects die soon, when GC marks the young generation, very few objects are live and most are dead (i.e. garbage). So, GC finishes marking the live objects very quickly and does not affect our application much.

Working of Garbage Collection

The young generation is further divided into 3 subparts:

eden space
s0 space (From Space)
s1 space (To Space)

All new objects are placed in the eden space. As soon as the eden space is full, a minor GC runs and all the live objects are moved to s0 with compaction (that is, into contiguous memory). After the GC completes, the eden space is empty.

When the eden space is full again, the garbage collector runs on both eden and s0, and this time all the live objects of eden and s0 are put into s1 (with compaction). Until this point, s1 was empty. After this collection, the eden space is again empty and ready to store new objects.

Next time, the live objects of eden and s1 will be moved to s0. The process continues like this, alternating between the two survivor spaces.

Ques: How does the above process make the collection faster?
Ans:
Reason 1: Since the eden space is small, it fills up frequently and GC cleans it up frequently. Also, since most of the objects will be dead, the marking step will be very fast.

Reason 2: The compaction step is also fast, because one of the survivor spaces (either s0 or s1) is always empty. GC doesn’t need to search for a contiguous location; it just copies the remaining live objects into the empty survivor space.

Drawback of the above:

Either s0 or s1 is always empty, so the two survivor spaces can never both be used to store objects at the same time. This is a minor tradeoff against the performance improvement the scheme provides.

Each time objects are moved from s0 to s1 or from s1 to s0, they get one generation older. Once an object reaches a certain age threshold, it is moved to the old generation.
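The copying and aging steps of this section can be sketched as a small simulation. It is a simplified model, not the JVM's actual algorithm: the class and method names are hypothetical, the threshold of 2 is an assumed value, liveness is passed in by name instead of being computed by marking, and an object's age is increased on every minor GC it survives. The from and to lists play the roles of s0 and s1 and swap after each collection.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class YoungGenDemo {
    static final int TENURING_THRESHOLD = 2; // assumed value for this sketch

    static final class Obj {
        final String name;
        int age; // number of minor collections survived
        Obj(String name) { this.name = name; }
        @Override public String toString() { return name; }
    }

    final List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>(); // survivor space that may hold objects
    List<Obj> to = new ArrayList<>();   // survivor space that is empty between GCs
    final List<Obj> old = new ArrayList<>();

    void allocate(String... names) {
        for (String n : names) eden.add(new Obj(n));
    }

    // Copies the live objects out of a space, then empties the whole space at once.
    private void evacuate(List<Obj> space, Set<String> liveNames) {
        for (Obj o : space) {
            if (!liveNames.contains(o.name)) continue; // dead objects are not copied
            o.age++;
            if (o.age >= TENURING_THRESHOLD) old.add(o); // promote to old generation
            else to.add(o);                              // copy to the empty survivor
        }
        space.clear();
    }

    // Minor GC: evacuate eden and the occupied survivor, then swap survivor roles.
    void minorGc(String... liveNames) {
        Set<String> live = new HashSet<>(Arrays.asList(liveNames));
        evacuate(eden, live);
        evacuate(from, live);
        List<Obj> tmp = from;
        from = to;
        to = tmp;
    }

    String summary() {
        return "eden=" + eden + " from=" + from + " to=" + to + " old=" + old;
    }

    static String runExample() {
        YoungGenDemo heap = new YoungGenDemo();
        heap.allocate("a", "b", "c");
        heap.minorGc("a", "b");      // c dies; a and b move to a survivor space
        heap.allocate("d", "e");
        heap.minorGc("a", "d");      // a reaches age 2 and is promoted; b and e die
        heap.allocate("f");
        heap.minorGc("a", "d", "f"); // d is promoted; f moves to a survivor space
        return heap.summary();
    }

    public static void main(String[] args) {
        System.out.println(runExample()); // eden=[] from=[f] to=[] old=[a, d]
    }
}
```

After every collection, eden and one survivor space are empty, so the next minor GC always has a clean space to copy into.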

When the old generation is full, a major GC takes place.

We should tune our application so as to minimise the number of major GC runs.

The GC that runs on the young generation is called minor GC, and the GC that runs on the old generation is called major GC.

Ques: Why is minor GC fast compared to major GC?
Ans: Because the young generation is small and mostly full of garbage.

Invoking the Garbage Collector Explicitly

Using the gc() method of the System class, we can suggest that the virtual machine run the garbage collector. It is not a command that immediately triggers the GC; it only notifies the VM that a collection is requested, and there is no guarantee that the GC will actually run.
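A minimal sketch of this behaviour uses a WeakReference to observe whether an object was collected after the hint. The class GcHintDemo, its static field pinned, and the method clearedAfterGcHint are hypothetical names for this example.

```java
import java.lang.ref.WeakReference;

class GcHintDemo {
    // Static field used to keep the object strongly reachable when requested.
    static Object pinned;

    // Returns true if the weakly referenced object was cleared after the GC hint.
    static boolean clearedAfterGcHint(boolean keepStrongReference) {
        Object payload = new byte[1024 * 1024];
        WeakReference<Object> ref = new WeakReference<>(payload);
        pinned = keepStrongReference ? payload : null;
        payload = null; // drop the local strong reference
        System.gc();    // a request only; the JVM is free to ignore it
        boolean cleared = (ref.get() == null);
        pinned = null;  // clean up so repeated calls start fresh
        return cleared;
    }

    public static void main(String[] args) {
        System.out.println("kept reachable, cleared?   " + clearedAfterGcHint(true));
        System.out.println("made unreachable, cleared? " + clearedAfterGcHint(false));
    }
}
```

The first line always reports false, because an object reachable from a static field is never collected. The second line often reports true on common JVMs, but that outcome is not guaranteed, which is exactly the point: System.gc() is only a suggestion.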

Types of Garbage Collector

There are 4 main types of GC:

Serial GC
Parallel GC
CMS (Concurrent Mark Sweep) GC (deprecated in JDK 9 and removed in JDK 14)
G1 GC (Garbage First GC)
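To check which collectors a running JVM is actually using, the standard java.lang.management API can be queried. The sketch below is one possible way to do it; the class name ActiveCollectors and the output format are assumptions, and the collector names reported depend on the JVM and its version.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class ActiveCollectors {
    // Builds one descriptive line per garbage collector registered in this JVM.
    static List<String> describe() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", time(ms)=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describe()) {
            System.out.println(line);
        }
    }
}
```

On HotSpot, a collector can be selected at startup with flags such as -XX:+UseSerialGC, -XX:+UseParallelGC or -XX:+UseG1GC. Running the program under different flags typically shows one bean for the young generation collector and one for the old generation collector.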

We will talk about these in detail in upcoming articles.