Improving Garbage Collection in .NET


Garbage Collection (GC) is like the unsung hero of .NET's Common Language Runtime (CLR), silently cleaning up after us so we can focus on writing great code. Most of us know it’s there, but let’s be honest, how often do we stop to think about what it’s really doing?

But as our applications grow and start to push the limits, understanding how GC works and how to optimize it can give us a real edge. In this deep dive, we’ll explore how to move beyond the basics and get the most out of .NET’s memory management magic.

Understanding .NET Garbage Collection

Basics of the .NET GC: Generations (Gen 0, Gen 1, Gen 2)

The .NET Garbage Collector (GC) is a memory management system that automatically reclaims unused objects. To optimize performance, it organizes objects into three generations:

  • Gen 0: The starting point for all objects.
  • Gen 1: Objects that survive a Gen 0 garbage collection are promoted to Gen 1.
  • Gen 2: Objects in Gen 1 that survive another garbage collection are promoted to Gen 2.

This generation-based system is built on the assumption that short-lived objects are more common than long-lived ones. By focusing on reclaiming short-lived objects first, the GC improves efficiency. Gen 2 objects, being long-lived, are collected less frequently since collecting them incurs higher overhead.
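We can watch promotion happen with GC.GetGeneration. A small sketch (exact promotion timing can vary by runtime version and GC mode, so treat the comments as the typical outcome):

```csharp
using System;

var obj = new object();
Console.WriteLine(GC.GetGeneration(obj)); // 0: fresh allocations start in Gen 0

GC.Collect(); // obj survives this collection because it is still referenced
Console.WriteLine(GC.GetGeneration(obj)); // typically 1 after surviving once

GC.Collect();
Console.WriteLine(GC.GetGeneration(obj)); // typically 2 after surviving again

GC.KeepAlive(obj); // keep obj reachable until this point
```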

Gen 0 and Gen 1 are referred to as ephemeral generations due to their short-lived nature. A segment is a contiguous block of memory allocated by the GC to store objects. The GC manages separate segments for ephemeral generations and Gen 2. The ephemeral segment size varies based on system architecture (32-bit or 64-bit) and GC type (Workstation or Server).

Default sizes for the ephemeral segment are shown in the table below.

Workstation/Server GC            32-bit    64-bit
Workstation GC                   16 MB     256 MB
Server GC                        64 MB     4 GB
Server GC (> 4 logical CPUs)     32 MB     2 GB
Server GC (> 8 logical CPUs)     16 MB     1 GB

Types of GCs: Workstation vs. Server GC

The .NET GC operates in two primary modes, configurable based on application needs:

  • Workstation GC: Optimized for single-threaded applications. Provides low-latency garbage collection to maintain responsiveness, ideal for GUI applications.
  • Server GC: Designed for multi-threaded, high-performance applications. Leverages multiple threads for garbage collection, improving throughput.

Developers can configure the GC mode to align with the application's deployment environment and performance goals.
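At runtime, you can check which mode the process actually ended up with via System.Runtime.GCSettings:

```csharp
using System;
using System.Runtime;

// Reports which GC flavor and latency mode this process is using
Console.WriteLine($"Server GC:    {GCSettings.IsServerGC}");
Console.WriteLine($"Latency mode: {GCSettings.LatencyMode}");
```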

Diagnosing Memory Issues

To observe the Garbage Collector's behavior, we can use the Microsoft Visual Studio Diagnostic Tools window or tools like dotTrace and dotMemory from JetBrains. For this demonstration, however, we'll stick with Visual Studio Diagnostic Tools. It's reliable, and hey, it's already baked into the IDE.

Steps to diagnose memory issues:

For this walkthrough, we’ve prepared a console program designed to simulate a memory leak or excessive allocations.
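A minimal sketch of such a program (the sizes and iteration count are illustrative) might look like this — a long-lived list keeps every buffer reachable, so nothing is ever reclaimed:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical leak simulator: the list roots every buffer,
// so the GC can never reclaim them and memory usage climbs steadily.
var memoryHog = new List<byte[]>();

for (int i = 0; i < 100; i++)
{
    memoryHog.Add(new byte[1024 * 1024]); // 1 MB per iteration, never released
}

Console.WriteLine($"Retained {memoryHog.Count} buffers (~{memoryHog.Count} MB)");
```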

  • Open the project in Visual Studio.
  • Attach your debugger to a running process and open the Diagnostic Tools window if it did not open automatically: Debug > Windows > Show Diagnostic Tools.
  • In the Diagnostic Tools window, ensure the Memory Usage tool is selected under the "Available tools" section. If not listed, click on the gear icon (Settings) and enable Memory Usage.

We’ll start by simulating excessive allocations in the application:

Press the Take Snapshot button at key points: before and after the suspected leak, or after a major operation. Snapshots capture the state of your application's memory at a given point in time.

After we have taken a few snapshots, we can compare them. Visual Studio will display the difference in memory usage between any two snapshots, highlighting:

  • Retained objects (potential leaks)
  • Increased allocations (indicative of inefficiency)

Inspect the details further to see which object types consume the most memory, or examine allocation call stacks to trace back to the code causing the issue. If you observe frequent or large allocations, look for unnecessary object creation, e.g., temporary objects in tight loops. Watch for patterns like:

  • Many small allocations (e.g., frequent string concatenations).
  • Large, long-lived objects that are never released.

In the case of memory leaks, look for retained objects that are not expected to persist. Common culprits include:

  • Event handlers or delegates not unsubscribed.
  • Static references or caches not cleared.
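The event-handler case is worth a concrete sketch. As long as a long-lived publisher's delegate chain references a handler, the handler (and anything it captures) stays reachable; unsubscribing breaks that reference. Here a plain multicast delegate stands in for a publisher's event:

```csharp
using System;

int calls = 0;
EventHandler? somethingHappened = null;

EventHandler handler = (sender, args) => calls++;

// Subscribe: the delegate chain now roots the handler
somethingHappened += handler;
somethingHappened?.Invoke(null, EventArgs.Empty);

// Unsubscribe: the chain no longer references the handler,
// so the GC is free to reclaim it and whatever it captured
somethingHappened -= handler;
somethingHappened?.Invoke(null, EventArgs.Empty); // no-op: chain is now empty

Console.WriteLine(calls); // 1
```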

Optimizing Object Allocation

Efficient memory management is key to building high-performing applications. In this guide, we’ll look at two powerful techniques: using the ArrayPool<T> class and leveraging the Span<T> struct.

Reducing Allocation Frequency

One major source of memory inefficiency is excessive allocation. Let’s tackle this using ArrayPool<T>, a high-performance API in .NET that maintains pools of reusable arrays. Instead of allocating a new array every time you need a buffer, you "rent" one from the pool and "return" it when you’re done.

Before refactoring:

byte[] largeArray = new byte[10 * 1024 * 1024];
for (int i = 0; i < largeArray.Length; i += 1024)
{
  largeArray[i] = 0xFF; // Simulate some usage
}
memoryHog.Add(largeArray);

A new 10 MB array is allocated in every iteration, which adds pressure to the garbage collector and risks an OutOfMemoryException.

After Refactoring:

var pool = ArrayPool<byte>.Shared;
var buffer = pool.Rent(10 * 1024 * 1024); // Rent a 10 MB buffer from the pool

Span<byte> span = buffer.AsSpan();
span.Fill(0xFF); // Use the rented buffer without allocating a new one

memoryHog.Add(buffer);

// Instead of creating a new array, we "rent" one from the shared pool.
// After usage, each buffer must be returned to the pool for reuse.
foreach (var rented in memoryHog)
{
  pool.Return(rented); // Return the buffer to the pool for reuse
}
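One caveat worth building into real code: if an exception is thrown between Rent and Return, the buffer is never returned to the pool. A try/finally keeps the pattern safe, and remember that Rent may hand back an array larger than requested, so slice to the size you asked for:

```csharp
using System;
using System.Buffers;

var pool = ArrayPool<byte>.Shared;
byte[] buffer = pool.Rent(4096); // may return an array larger than 4096 bytes

try
{
    Span<byte> span = buffer.AsSpan(0, 4096); // slice to the requested size
    span.Fill(0xFF);
    Console.WriteLine(buffer[0]); // 255
}
finally
{
    pool.Return(buffer); // runs even if the work above throws
}
```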

Async/Await and GC Optimization

Efficiently managing asynchronous code not only boosts your app’s responsiveness but also reduces unnecessary memory allocations. Let’s explore key techniques to optimize async/await usage while keeping the Garbage Collector happy.

Avoiding capturing variables in async methods or lambdas

Closures occur when an async method or lambda references variables from its containing method or scope. The compiler creates a state machine that stores these variables on the heap, causing allocations.

In this example, Task.Run captures buffer and size, and because this lambda runs asynchronously, these captured variables are stored in a closure on the heap.

for (int i = 0; i < 1000000; i++)
{
  var size = new Random().Next(1024, 1048576); // Captured variable
  var buffer = new byte[size];

  await Task.Run(() =>
  {
    // Captures 'buffer' and 'size', leading to heap allocations
    Span<byte> span = buffer.AsSpan(0, size);
    for (int j = 0; j < span.Length; j++)
    {
      span[j] = (byte)(j % 256);
    }
  });
}

We can avoid these closures by performing the work synchronously within the loop instead of passing a lambda to Task.Run, which eliminates the variable capture entirely.

for (int i = 0; i < 1000000; i++)
{
  var size = new Random().Next(1024, 1048576);
  var buffer = new byte[size];

  // Perform work directly without capturing variables
  Span<byte> span = buffer.AsSpan(0, size);
  for (int j = 0; j < span.Length; j++)
  {
    span[j] = (byte)(j % 256);
  }

  if (i % 10000 == 0)
  {
    await Task.Delay(10).ConfigureAwait(false); // Non-blocking delay
  }
}

You might have noticed the usage of ConfigureAwait(false). By default, await captures the current synchronization context so that the continuation resumes on the original context, such as the UI thread. This is often unnecessary, adds overhead, and, when combined with blocking calls, can even deadlock the UI thread. ConfigureAwait(false) tells the runtime that the continuation can run on any available thread, reducing context-switching costs.

Task/ValueTask

Choosing between Task and ValueTask can make a difference in performance-critical code. Here’s a quick guide:

  • Use Task when usability is your priority. It’s widely supported and has straightforward error-handling semantics.
  • Use ValueTask in performance-sensitive hot paths, especially in APIs whose results are often available synchronously, to avoid allocating a Task per call.

Be mindful that ValueTask comes with trade-offs, such as additional complexity when dealing with returned results or exception handling.
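A typical use case is a method that usually completes synchronously, such as a cache lookup. The names below are illustrative; the point is that on a cache hit, wrapping the result in a ValueTask avoids allocating a Task object:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

var cache = new Dictionary<string, int> { ["answer"] = 42 };

// Hot path: a cache hit returns synchronously with no Task allocation.
ValueTask<int> GetValueAsync(string key)
{
    if (cache.TryGetValue(key, out var hit))
        return new ValueTask<int>(hit);

    return new ValueTask<int>(LoadAndCacheAsync(key)); // cold path allocates a Task
}

async Task<int> LoadAndCacheAsync(string key)
{
    await Task.Delay(10); // simulate I/O
    return cache[key] = key.Length;
}

Console.WriteLine(await GetValueAsync("answer")); // 42
Console.WriteLine(await GetValueAsync("miss"));   // 4
```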

Implementing the Disposable pattern

The Disposable pattern is like a clean-up crew for your code, jumping in to clean up unmanaged resources. Think file handles, database connections, or network sockets, things that don’t clean up after themselves. Popular in environments like .NET, this pattern makes sure these resources are promptly and predictably freed, saving your application from mysterious leaks and performance hiccups.

Let’s implement the Disposable pattern in the SimulateMemoryLeakOptimized method. We are going to create a ResourceManager class whose job is to manage resources responsibly.

class ResourceManager : IDisposable
{
  private readonly ArrayPool<byte> _pool;
  private byte[] _buffer;
  private bool _disposed;

  public ResourceManager(int bufferSize)
  {
    _pool = ArrayPool<byte>.Shared;
    _buffer = _pool.Rent(bufferSize);
  }

  public void UseResource()
  {
    if (_disposed)
    {
      throw new ObjectDisposedException(nameof(ResourceManager));
    }

    Span<byte> span = _buffer.AsSpan();
    span.Fill(0xFF); // Simulate usage
    Console.WriteLine("Resource is in use.");
  }

  // The public Dispose method
  public void Dispose()
  {
    Dispose(true);
    GC.SuppressFinalize(this);
  }

  // The protected Dispose method follows the dispose pattern
  protected virtual void Dispose(bool disposing)
  {
    if (_disposed)
    {
      return;
    }

    if (disposing)
    {
      // Release managed resources if necessary
    }

    // Release unmanaged resources
    if (_buffer != null)
    {
      _pool.Return(_buffer);
      _buffer = null;
    }

    _disposed = true;
  }

  // Finalizer, only if necessary (e.g., for unmanaged resources)
  ~ResourceManager()
  {
    Dispose(false);
  }
}

_buffer holds the rented byte array. It acts as the resource being managed by the class. _disposed is a private boolean flag that tracks whether the object has been disposed of. This prevents accessing or releasing resources multiple times, which could lead to exceptions or undefined behavior. The UseResource method demonstrates how the rented resource can be safely used. It checks whether the object has been disposed of to ensure safe operation.

If disposing is true, the method is being called explicitly via Dispose rather than from the finalizer, so managed resources can safely be released as well. Regardless of how disposal was triggered, unmanaged resources (or, in this case, the pooled buffer) are released.

The finalizer is included to handle cases where Dispose is not called explicitly, but it’s avoided unless necessary. If unmanaged resources are held, the finalizer ensures cleanup. However, when the Dispose method is called explicitly, the finalizer is suppressed.
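On the consuming side, a using statement guarantees Dispose runs deterministically, even when an exception is thrown mid-block. A minimal sketch with MemoryStream standing in for our disposable resource:

```csharp
using System;
using System.IO;

var stream = new MemoryStream();

using (stream)
{
    stream.WriteByte(0xFF); // work with the resource inside the block
} // Dispose runs here, even if the block above throws

Console.WriteLine(stream.CanWrite); // False: the stream has been disposed
```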

Tuning Garbage Collection Settings

Configuring GC settings in runtimeconfig.json

The runtimeconfig.json file allows you to configure various runtime behaviors, including garbage collection. Enabling Concurrent GC improves application responsiveness by allowing GC to occur on background threads.

{
  "runtimeOptions": {
    "configProperties": {
      "System.GC.Concurrent": true
    }
  }
}

Enabling or disabling server GC for multi-core environments

The Server GC mode is designed for high-throughput, multi-threaded applications running on multi-core systems. It allocates multiple GC threads, one per core, and manages memory more aggressively to minimize pause times.

Server GC can be enabled in the runtimeconfig.json file or via environment variables.

{
  "runtimeOptions": {
    "configProperties": {
      "System.GC.Server": true
    }
  }
}

Adjusting Large Object Heap (LOH) Compaction Strategies

The Large Object Heap (LOH) stores objects of 85,000 bytes or more, such as large arrays or buffers. Over time, the LOH can become fragmented, leading to inefficient memory usage.
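You can see the threshold in action with GC.GetGeneration, since LOH objects are logically part of Gen 2:

```csharp
using System;

var small = new byte[1_000];    // ordinary allocation, starts in Gen 0
var large = new byte[100_000];  // over the ~85,000-byte threshold: goes to the LOH

Console.WriteLine(GC.GetGeneration(small)); // 0
Console.WriteLine(GC.GetGeneration(large)); // 2: the LOH reports as Gen 2
```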

By default, the LOH is not compacted during garbage collection, and there is no runtimeconfig.json switch for it. In applications where fragmentation becomes an issue, you can request a one-time compaction through GCSettings:

GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
GC.Collect(); // The next full blocking collection compacts the LOH, then the mode resets to Default

For finer control, you can manually trigger a full blocking garbage collection with compaction:

GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced, blocking: true, compacting: true);

Best Practices for Performance

Optimizing performance often comes down to making thoughtful choices about memory usage, threading, and computation. Let’s explore some practical strategies to avoid common pitfalls and enhance your application’s efficiency.

Avoid frequent boxing/unboxing in Generic types

Boxing occurs when value types (e.g., int, bool) are converted to reference types, leading to heap allocations and garbage collection overhead. To keep your code lean and efficient:

  • Use Generics: Employ generic collections like List<T> or Dictionary<TKey, TValue> to maintain type safety and avoid boxing.
  • Implement Interfaces Carefully: When structs implement interfaces, ensure methods like ToString() and Equals() are overridden to prevent boxing.

Example:

// Assigning a value type to object boxes it on the heap
object boxed = 123;       // boxing: heap allocation
int unboxed = (int)boxed; // unboxing: cast back to the value type

// Generic collections store value types directly, with no boxing
List<int> numbers = new List<int> { 123 };
int firstNumber = numbers[0];

Use ImmutableArray or ImmutableList

Immutable collections are inherently thread-safe and reduce the overhead of synchronization required for mutable collections. They are ideal for data shared across threads or read frequently with rare updates.

Example:

ImmutableList<int> immutableList = ImmutableList.Create(1, 2, 3);
ImmutableList<int> updatedList = immutableList.Add(4); // returns a new list; immutableList is unchanged

Real-world Examples

Gaming Frame Rate Optimization

Real-time gameplay in fast-paced video games often requires careful GC control to avoid frame drops. Developers may schedule GC during non-critical gameplay moments, such as loading screens or menus, ensuring smooth gameplay during action sequences.

Real-time Systems

Applications like real-time trading systems or flight control software require predictable performance. GC pauses can introduce latency spikes, making it crucial to use GC-tuned frameworks or explicit memory management.

IoT Devices

Resource-constrained IoT devices often have strict performance requirements. GC pauses must be minimized or eliminated entirely in systems that require consistent data processing, such as medical monitors or industrial sensors.

Scientific Computing and Big Data Analytics

High-performance computing tasks in physics simulations, genome sequencing, or weather modeling often require massive memory allocation. GC pauses in these systems can delay computations or result in failed tasks.

Summary

In conclusion, understanding .NET's Garbage Collection system is important for building high-performing and efficient applications. By leveraging tools, patterns, and configurations such as ArrayPool<T> and the Disposable pattern, developers can minimize memory overhead and improve responsiveness. Thoughtful adjustments to GC settings, combined with diagnostic tools, enable proactive memory management tailored to the developers’ needs. Mastering these techniques not only ensures robust software but also empowers developers to harness the full potential of .NET's memory management capabilities.
