Stopping Garbage Collection in .NET Core 3.0 (part II)

Let’s see how it’s implemented. For why it is implemented, see Part I.

using System;
using System.Diagnostics.Tracing;
using System.Runtime;

The FxCop code analyzers complain if I don’t declare this, which in turn prevents me from using unsigned numeric types in public interfaces.

[assembly: CLSCompliant(true)]

namespace LNativeMemory
{

The first piece of the puzzle is to implement an event listener. EventListener is a non-obvious class (at least for me). I don’t fully understand its lifetime semantics, but the code below seems to do the right thing.

The interesting pieces are the _active field and the Start() method. The constructor of EventListener allocates quite a lot. I don’t want those allocations to happen after calling TryStartNoGCRegion, because they would use part of the GC heap that I want to reserve for my program.

Instead, I create the listener before that call and ‘switch it on’ only when Start() is called, just after entering the NoGC region.
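To get a feel for why the listener has to exist before entering the region, the following sketch measures how many bytes the current thread allocates on the GC heap while constructing an EventListener, using GC.GetAllocatedBytesForCurrentThread(). ProbeListener and ListenerAllocationProbe are hypothetical names used only for this illustration, and the exact number depends on the runtime version, so treat it as indicative only.

```csharp
using System;
using System.Diagnostics.Tracing;

// Hypothetical, do-nothing listener used only to measure construction cost.
internal sealed class ProbeListener : EventListener
{
    protected override void OnEventWritten(EventWrittenEventArgs eventData) { }
}

public static class ListenerAllocationProbe
{
    // Returns the bytes allocated by the current thread while constructing one EventListener.
    public static long BytesAllocatedByListenerCtor()
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        using (var listener = new ProbeListener())
        {
            long after = GC.GetAllocatedBytesForCurrentThread();
            return after - before;
        }
    }
}
```

Any non-zero result is budget that would be consumed inside the NoGC region if the listener were created after TryStartNoGCRegion.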

    internal sealed class GcEventListener : EventListener
    {
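        readonly Action _action;
        EventSource _eventSource;
        // Written by the thread that starts/stops the region, but may be read on the
        // thread that dispatches events, hence volatile.
        volatile bool _active;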

        internal void Start() { _active = true; }
        internal void Stop() { _active = false; }


As described in part one, you pass a delegate at creation time, which is called when garbage collection is restarted.

        internal GcEventListener(Action action) => _action = action ?? throw new ArgumentNullException(nameof(action));


We subscribe to all the GC-related events coming from the .NET runtime, because we want to call the delegate at the exact point when garbage collection is turned on again. There is no clean way to do that (i.e. there is no runtime event we can hook up to, see here), so listening to every single GC event gives us the best chance of doing it right. It also ties us the least to any particular pattern of events, which might change in the future.
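Subscribing to "every GC event" boils down to a bitwise OR of documented keyword flags passed to EnableEvents. This sketch (GcKeywordMask is a hypothetical helper, not part of the library) just shows the arithmetic: 0x0000001 | 0x0080000 | 0x1000000 = 0x1080001.

```csharp
using System.Diagnostics.Tracing;

public static class GcKeywordMask
{
    // Keyword values from the "Garbage collection ETW events" documentation.
    private const long GC_KEYWORD = 0x0000001;
    private const long TYPE_KEYWORD = 0x0080000;
    private const long GCHEAPANDTYPENAMES_KEYWORD = 0x1000000;

    // EventKeywords is a [Flags] enum backed by a long, so keywords combine with a bitwise OR.
    public static EventKeywords Combined =>
        (EventKeywords)(GC_KEYWORD | TYPE_KEYWORD | GCHEAPANDTYPENAMES_KEYWORD);

    // True if every bit of 'keyword' is part of the combined mask.
    public static bool Includes(EventKeywords keyword) => (Combined & keyword) == keyword;
}
```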

        // from https://docs.microsoft.com/en-us/dotnet/framework/performance/garbage-collection-etw-events
        private const int GC_KEYWORD = 0x0000001;
        private const int TYPE_KEYWORD = 0x0080000;
        private const int GCHEAPANDTYPENAMES_KEYWORD = 0x1000000;

        protected override void OnEventSourceCreated(EventSource eventSource)
        {
            if (eventSource.Name.Equals("Microsoft-Windows-DotNETRuntime", StringComparison.Ordinal))
            {
                _eventSource = eventSource;
                EnableEvents(eventSource, EventLevel.Verbose, (EventKeywords)(GC_KEYWORD | GCHEAPANDTYPENAMES_KEYWORD | TYPE_KEYWORD));
            }
        }


For each event, I check if the garbage collector has exited the NoGC region. If it has, then let’s invoke the delegate.

        protected override void OnEventWritten(EventWrittenEventArgs eventData)
        {
            // Any GC event will do: we only care whether the runtime is still in the NoGC region.
            if (_active && GCSettings.LatencyMode != GCLatencyMode.NoGCRegion)
            {
                _action?.Invoke();
            }
        }
    }


Now that we have our event listener, we need to hook it up. The code below implements what I described earlier:
1. Allocate the event listener (and everything it needs) up front
2. Start the NoGC region
3. Start monitoring the runtime for the end of the NoGC region

    public static class GC2
    {
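        private static GcEventListener _evListener;

        public static bool TryStartNoGCRegion(long totalSize, Action actionWhenAllocatedMore)
        {
            // 1. Allocate the listener before entering the region, so its allocations don't eat the budget.
            _evListener = new GcEventListener(actionWhenAllocatedMore);
            // 2. Enter the NoGC region.
            var succeeded = GC.TryStartNoGCRegion(totalSize, disallowFullBlockingGC: false);
            // 3. Switch the listener on only if we are actually in the region;
            //    otherwise the next GC event would invoke the delegate spuriously.
            if (succeeded)
                _evListener.Start();

            return succeeded;
        }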


As puzzling as this might be, I provisionally believe it to be correct. Apparently, even if the GC is no longer in a NoGC region, you still need to call EndNoGCRegion if you have called TryStartNoGCRegion earlier; otherwise your next call to TryStartNoGCRegion will fail. EndNoGCRegion will throw an InvalidOperationException, but that’s OK: your next call to TryStartNoGCRegion will now succeed.

Now read the above repeatedly until you get it. Or just trust that it works somehow.
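Part of this behaviour can be checked in isolation. The sketch below wraps GC.EndNoGCRegion in a hypothetical TryEndNoGCRegion helper that reports, instead of throwing, whether there was a region to end. Entering a region and then calling the helper twice shows that the first call succeeds and the second one hits the InvalidOperationException, because the runtime is no longer in a NoGC region. It does not try to reproduce the claim about the next TryStartNoGCRegion call failing.

```csharp
using System;

public static class NoGCRegionHelpers
{
    // Hypothetical helper: ends the current NoGC region and returns true, or returns false
    // when GC.EndNoGCRegion throws because the runtime is not in a NoGC region
    // (never entered, already ended, or silently exited after exceeding the budget).
    public static bool TryEndNoGCRegion()
    {
        try
        {
            GC.EndNoGCRegion();
            return true;
        }
        catch (InvalidOperationException)
        {
            return false;
        }
    }
}
```

For example, after a successful GC.TryStartNoGCRegion(1024 * 1024), the first TryEndNoGCRegion() returns true and a second one returns false.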

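        public static void EndNoGCRegion()
        {
            // Switch the listener off first, so that whatever happens next doesn't invoke the delegate.
            _evListener?.Stop();

            try
            {
                GC.EndNoGCRegion();
            }
            catch (InvalidOperationException)
            {
                // Expected when the runtime has already left the NoGC region
                // (e.g. because more than totalSize bytes were allocated).
            }
        }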
    }


This is used as the default behavior for the delegate in the wrapper class below. The code analyzer made me aware that I shouldn’t be throwing an OOM (OutOfMemoryException) here. At first, I dismissed it, but then it hit me: it is right.

We are not running out of memory here. We simply have allocated more memory than what we declared we would. There is likely plenty of memory left on the machine. Thinking more about it, I grew ashamed of my initial reaction. Think about a support engineer getting an OOM exception at that point and trying to figure out why. So, always listen to Lint …

    public class OutOfGCHeapMemoryException : OutOfMemoryException {
        public OutOfGCHeapMemoryException(string message) : base(message) { }
        public OutOfGCHeapMemoryException(string message, Exception innerException) : base(message, innerException) { }
        public OutOfGCHeapMemoryException() : base() { }

    }


This is a utility class that implements the IDisposable pattern for this scenario. The size of the default ephemeral segment comes from here.

    public sealed class NoGCRegion: IDisposable
    {
        static readonly Action defaultErrorF = () => throw new OutOfGCHeapMemoryException();
        const int safeEphemeralSegment = 16 * 1024 * 1024;

        public NoGCRegion(int totalSize, Action actionWhenAllocatedMore)
        {
            var succeeded = GC2.TryStartNoGCRegion(totalSize, actionWhenAllocatedMore);
            if (!succeeded)
                throw new InvalidOperationException("Cannot enter NoGCRegion");
        }

        public NoGCRegion(int totalSize) : this(totalSize, defaultErrorF) { }
        public NoGCRegion() : this(safeEphemeralSegment, defaultErrorF) { }

        public void Dispose() => GC2.EndNoGCRegion();
    }
}

Stopping Garbage Collection in .NET Core 3.0 (part I)

For how all of the below is implemented, see Part II.

Code at https://github.com/lucabol/LNativeMemory/tree/master/LNativeMemory

Scenario

You have an application, or a particular code path of your application, that cannot take the pauses that GC creates. Typical examples are real-time systems, tick-by-tick financial apps, embedded systems, etc.

Disclaimer

For any normal kind of application, YOU DON’T NEED TO DO THIS. You are likely to make your application run slower or blow up memory. If you have a hot path in your application (e.g. you are creating an editor with IntelliSense), use the GC latency modes.

Use the code below only under extreme circumstances, as it is untested, error-prone and wacky. You are probably better off waiting for an official way of doing it (i.e. when this is implemented).
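For the hot-path case, the usual pattern is to switch GCSettings.LatencyMode for the duration of the hot path and restore the previous mode in a finally block. A minimal sketch, assuming SustainedLowLatency is the mode you want (LatencyScope and RunWithLowLatency are hypothetical names); how much it actually helps depends on your GC configuration (workstation or server, background GC on or off).

```csharp
using System;
using System.Runtime;

public static class LatencyScope
{
    // Runs 'hotPath' with GCLatencyMode.SustainedLowLatency and restores the
    // previous mode afterwards, even if 'hotPath' throws.
    public static void RunWithLowLatency(Action hotPath)
    {
        if (hotPath == null) throw new ArgumentNullException(nameof(hotPath));

        GCLatencyMode previous = GCSettings.LatencyMode;
        try
        {
            GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;
            hotPath();
        }
        finally
        {
            GCSettings.LatencyMode = previous;
        }
    }
}
```

Unlike a NoGC region, this does not stop the GC: it only asks it to avoid the most disruptive collections while the mode is active.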

The problem with TryStartNoGCRegion

There is a GC.TryStartNoGCRegion method in .NET. You can use it to stop garbage collection by passing a totalSize parameter that represents the maximum amount of memory that you plan to allocate from the managed heap. Matt describes it here.

The problem is that when/if you allocate more than that, garbage collection resumes silently. Your application continues to work, but with different performance characteristics from what you expected.
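A minimal sketch of both points, using only GC.TryStartNoGCRegion, GC.EndNoGCRegion and GCSettings.LatencyMode (NoGCRegionProbe and RunWithoutGC are hypothetical names). Nothing is thrown at the moment the budget is exceeded; the only way to notice is to check whether the latency mode is still NoGCRegion.

```csharp
using System;
using System.Runtime;

public static class NoGCRegionProbe
{
    // True while the runtime is still honouring the NoGC region.
    public static bool IsInNoGCRegion() =>
        GCSettings.LatencyMode == GCLatencyMode.NoGCRegion;

    // Enters a NoGC region, runs 'work', and reports whether the region survived it.
    public static bool RunWithoutGC(long totalSize, Action work)
    {
        if (!GC.TryStartNoGCRegion(totalSize))
            throw new InvalidOperationException("Could not enter the NoGC region.");

        bool survived;
        try
        {
            work();
        }
        finally
        {
            survived = IsInNoGCRegion();
            try
            {
                // Always call EndNoGCRegion after a successful TryStartNoGCRegion.
                GC.EndNoGCRegion();
            }
            catch (InvalidOperationException) when (!survived)
            {
                // Expected: the runtime already left the region after exceeding the budget.
            }
        }
        return survived;
    }
}
```

With a 1 MB budget, a 100-byte allocation leaves the region intact (the method returns true), while allocating 32 arrays of 1 MB makes the runtime collect and leave the region (the method returns false).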

The idea

The main idea is to use ETW events to detect when a GC occurs and to call a user-provided delegate at that point. You can then do whatever you want in the delegate (e.g. shut down the process, send an email to support, start another NoGC region, etc.).

Also, I have wrapped the whole TryStartNoGCRegion/EndNoGCRegion pair in an IDisposable wrapper for ease of use.

The tests

Let’s start by looking at how you use it.

using Xunit;
using System.Threading;

namespace LNativeMemory.Tests
{

    // XUnit executes all tests in a class sequentially, so no problem with multi-threading calls to GC
    public class GC2Tests
    {


We need to wait (Thread.Sleep(sleepTime)) to maximize the chances that a GC happens in some of the tests. Also, we allocate an amount that should work in all GC configurations, as per the article above. triggered is a static field so as to stay zero-allocation (otherwise the delegate would have to capture a local triggered variable, creating a heap-allocated closure). Not that being zero-allocation matters in this test, but I like to keep ClrHeapAllocationAnalyzer happy.
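The zero-allocation point can be checked directly. This sketch (ClosureAllocationDemo is a hypothetical name) uses GC.GetAllocatedBytesForCurrentThread() to compare a lambda that only writes a static field, which the C# compiler caches in a static field after first use, with one that captures a local, which needs a new closure object and delegate every time. The caching is a compiler implementation detail rather than a language guarantee, so treat the result as indicative.

```csharp
using System;

public static class ClosureAllocationDemo
{
    private static bool s_triggered;
    public static bool Triggered => s_triggered;

    // Touches only a static field: no capture, so the delegate is cached after the first call.
    private static Action MakeStaticCallback() => () => s_triggered = true;

    // Captures a local: every call allocates a closure object plus a delegate.
    private static Action MakeCapturingCallback()
    {
        bool triggered = false;
        return () => triggered = true;
    }

    // Bytes allocated by one call to 'factory', measured after a warm-up call.
    public static long BytesFor(Func<Action> factory)
    {
        factory(); // warm-up: JIT and the compiler's delegate cache
        long before = GC.GetAllocatedBytesForCurrentThread();
        Action callback = factory();
        long after = GC.GetAllocatedBytesForCurrentThread();
        GC.KeepAlive(callback);
        return after - before;
    }

    public static long StaticCallbackBytes() => BytesFor(MakeStaticCallback);
    public static long CapturingCallbackBytes() => BytesFor(MakeCapturingCallback);
}
```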


        const int sleepTime = 200;
        const int totalBytes = 16 * 1024 * 1024;
        // Set on the thread that dispatches GC events, read on the test thread.
        static volatile bool triggered;


First we test that any allocation that doesn’t exceed the limit doesn’t trigger the call to action.

        [Fact]
        public void NoAllocationBeforeLimit()
        {
            try
            {
                triggered = false;
                var succeeded = GC2.TryStartNoGCRegion(totalBytes, () => triggered = true);
                Assert.True(succeeded);
                Thread.Sleep(sleepTime);
                Assert.False(triggered);

                var bytes = new byte[99];
                Thread.Sleep(sleepTime);
                Assert.False(triggered);
            }
            finally
            {
                GC2.EndNoGCRegion();
                triggered = false;
            }
        }


Then we test that allocating over the limit does trigger the action. To do so, we need to trigger a garbage collection. Our best attempt is the goofy for loop. If you have a better idea, shout.
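One possible alternative to a fixed number of iterations, sketched under the assumption that polling GCSettings.LatencyMode is an acceptable signal: keep allocating chunks until the runtime reports that it left the NoGC region, with an upper bound so a test can’t loop forever. NoGCRegionExit and AllocateUntilRegionExits are hypothetical names, not part of LNativeMemory.

```csharp
using System;
using System.Runtime;

public static class NoGCRegionExit
{
    // Allocates chunks of 'chunkSize' bytes until the runtime leaves the NoGC region,
    // giving up after 'maxChunks' allocations. Returns the number of chunks allocated,
    // or -1 if the region was still active after 'maxChunks'.
    public static int AllocateUntilRegionExits(int chunkSize, int maxChunks)
    {
        for (int i = 1; i <= maxChunks; i++)
        {
            // KeepAlive makes sure the allocation is actually performed.
            GC.KeepAlive(new byte[chunkSize]);
            if (GCSettings.LatencyMode != GCLatencyMode.NoGCRegion)
                return i;
        }
        return -1;
    }
}
```

This only guarantees that a GC happened; the test would still need to wait for the listener to observe it.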

        [Fact]
        public void AllocatingOverLimitTriggersTheAction()
        {
            try
            {
                triggered = false;
                var succeeded = GC2.TryStartNoGCRegion(totalBytes, () => triggered = true);
                Assert.True(succeeded);
                Assert.False(triggered);

                for (var i = 0; i < 3; i++) { var k = new byte[totalBytes]; }

                Thread.Sleep(sleepTime);
                Assert.True(triggered);
            }
            finally
            {
                GC2.EndNoGCRegion();
                triggered = false;
            }
        }

We also test that we can go back and forth between starting and stopping without messing things up.

        [Fact]
        public void CanCallMultipleTimes()
        {

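            // Enter and leave the NoGC region a few times in a row.
            for (int i = 0; i < 3; i++)
            {
                using (new NoGCRegion(totalBytes, () => triggered = true))
                {
                    for (var j = 0; j < 3; j++) { var k = new byte[totalBytes]; }
                    Thread.Sleep(sleepTime);
                    Assert.True(triggered);
                    triggered = false;
                }
            }
        }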
    }
}