Stopping Garbage Collection in .NET Core 3.0 (part II)

Let’s see how it’s implemented. For why it is implemented, see Part I.

using System;
using System.Diagnostics.Tracing;
using System.Runtime;

The FxCop code analyzers get upset if I don’t declare this, which also prevents me from using unsigned numeric types in interfaces.

[assembly: CLSCompliant(true)]

namespace LNativeMemory

The first piece of the puzzle is to implement an event listener. It is not an obvious class (at least to me). I don’t fully understand the lifetime semantics, but the code below seems to do the right thing.

The interesting pieces are _active and the method Start(). The constructor for EventListener allocates plenty of stuff. I don’t want those allocations to happen after calling TryStartNoGCRegion, because they would use part of the GC heap that I want for my program.

Instead, I create the listener before that call, and ‘switch it on’ only when the Start() method is called.

    internal sealed class GcEventListener : EventListener
    {
        Action _action;
        EventSource _eventSource;
        bool _active = false;

        internal void Start() { _active = true; }
        internal void Stop() { _active = false; }

As described in part one, you pass a delegate at creation time, which is called when garbage collection is restarted.

        internal GcEventListener(Action action) => _action = action ?? throw new ArgumentNullException(nameof(action));

We register for all the events coming from the .NET runtime. We want to call the delegate at the exact point when garbage collection is turned on again. We don’t have a clean way to do that (i.e. there is no runtime event we can hook into; see here), so listening to every single GC event gives us the best chance of getting it right. It also ties us the least to any particular pattern of events, which might change in the future.

        // from
        private const int GC_KEYWORD = 0x0000001;
        private const int TYPE_KEYWORD = 0x0080000;
        private const int GCHEAPANDTYPENAMES_KEYWORD = 0x1000000;

        protected override void OnEventSourceCreated(EventSource eventSource)
        {
            if (eventSource.Name.Equals("Microsoft-Windows-DotNETRuntime", StringComparison.Ordinal))
            {
                _eventSource = eventSource;
                EnableEvents(eventSource, EventLevel.Verbose, (EventKeywords)(GC_KEYWORD | GCHEAPANDTYPENAMES_KEYWORD | TYPE_KEYWORD));
            }
        }

For each event, I check if the garbage collector has exited the NoGC region. If it has, then let’s invoke the delegate.

        protected override void OnEventWritten(EventWrittenEventArgs eventData)
        {
            if (_active && GCSettings.LatencyMode != GCLatencyMode.NoGCRegion)
            {
                // The runtime has (silently) exited the NoGC region:
                // notify the caller and stop reacting to further events.
                _action?.Invoke();
                Stop();
            }
        }
    }

Now that we have our event listener, we need to hook it up. The code below implements what I described earlier.
1. Do your allocations for the event listener
2. Start the NoGc region
3. Start monitoring the runtime for the end of the NoGC region

    public static class GC2
    {
        static private GcEventListener _evListener;

        public static bool TryStartNoGCRegion(long totalSize, Action actionWhenAllocatedMore)
        {
            // Create (and allocate) the listener before entering the NoGC region.
            _evListener = new GcEventListener(actionWhenAllocatedMore);
            var succeeded = GC.TryStartNoGCRegion(totalSize, disallowFullBlockingGC: false);
            // Only now start reacting to GC events.
            _evListener.Start();

            return succeeded;
        }

As puzzling as this might be, I provisionally believe it to be correct. Apparently, even if the GC is not in a NoGC region, you still need to call EndNoGCRegion if you have called TryStartNoGCRegion earlier; otherwise your next call to TryStartNoGCRegion will fail. EndNoGCRegion will throw an exception, but that’s OK: your next call to TryStartNoGCRegion will now succeed.

Now read the above a few times until you get it. Or just trust that it works somehow.

        public static void EndNoGCRegion()
        {
            _evListener.Stop();
            try
            {
                GC.EndNoGCRegion();
            } catch (Exception)
            {
                // Expected when the GC has already exited the NoGC region on its own:
                // the call still resets the state for the next TryStartNoGCRegion.
            }
        }
    }

This is used as the default behavior for the delegate in the wrapper class below. The code analyzer made me aware that I shouldn’t be throwing an OOM exception here. At first I dismissed it, but then it hit me: it is right.

We are not running out of memory here. We have simply allocated more memory than we declared we would; there is likely plenty of memory left on the machine. Thinking more about it, I grew ashamed of my initial reaction. Think about a support engineer getting an OOM exception at that point and trying to figure out why. So, always listen to the linter …

    public class OutOfGCHeapMemoryException : OutOfMemoryException {
        public OutOfGCHeapMemoryException(string message) : base(message) { }
        public OutOfGCHeapMemoryException(string message, Exception innerException) : base(message, innerException) { }
        public OutOfGCHeapMemoryException() : base() { }
    }

This is a utility class that implements the IDisposable pattern for this scenario. The size of the default ephemeral segment comes from here.

    public sealed class NoGCRegion : IDisposable
    {
        static readonly Action defaultErrorF = () => throw new OutOfGCHeapMemoryException();
        const int safeEphemeralSegment = 16 * 1024 * 1024;

        public NoGCRegion(int totalSize, Action actionWhenAllocatedMore)
        {
            var succeeded = GC2.TryStartNoGCRegion(totalSize, actionWhenAllocatedMore);
            if (!succeeded)
                throw new InvalidOperationException("Cannot enter NoGCRegion");
        }

        public NoGCRegion(int totalSize) : this(totalSize, defaultErrorF) { }
        public NoGCRegion() : this(safeEphemeralSegment, defaultErrorF) { }

        public void Dispose() => GC2.EndNoGCRegion();
    }
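To see how the pieces fit together, here is a minimal usage sketch. It assumes the GC2 and NoGCRegion classes above are in scope; the budget size and the logging action are made up for illustration.

```csharp
using System;
using LNativeMemory;

static class Demo
{
    static void Main()
    {
        // Reserve a 32 MB budget; if the program allocates past it,
        // the GC resumes and our delegate runs.
        using (new NoGCRegion(32 * 1024 * 1024,
                   () => Console.Error.WriteLine("NoGC budget exceeded; GC resumed")))
        {
            // Latency-sensitive work goes here: allocations within the budget
            // never trigger a collection.
            var buffer = new byte[1024];
        }
        // Dispose calls GC2.EndNoGCRegion, restoring normal GC behavior.
    }
}
```

If entering the region fails (e.g. the budget is larger than what the GC can reserve), the constructor throws InvalidOperationException, so failures are loud rather than silent.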

Stopping Garbage Collection in .NET Core 3.0 (part I)

For how all of the below is implemented, see Part II.

Code at


You have an application, or a particular code path in your application, that cannot take the pauses that the GC creates. Typical examples are real-time systems, tick-by-tick financial apps, embedded systems, etc.


For any normal kind of application, YOU DON’T NEED TO DO THIS. You are likely to make your application run slower or blow up memory. If you have a hot path in your application (e.g. you are creating an editor with IntelliSense), use the GC latency modes.
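For such a hot path, a latency mode is the supported knob. A minimal sketch of the kind of thing the framework already allows, with no custom code needed:

```csharp
using System;
using System.Runtime;

static class LatencyDemo
{
    static void Main()
    {
        var previous = GCSettings.LatencyMode;
        try
        {
            // Ask the GC to avoid blocking collections for a sustained period.
            // Allocations still happen and are still collected eventually.
            GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;
            // ... hot path ...
        }
        finally
        {
            GCSettings.LatencyMode = previous;
        }
    }
}
```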

Use the code below only under extreme circumstances, as it is untested, error-prone and wacky. You are probably better off waiting for an official way of doing it (i.e. when this is implemented).

The problem with TryStartNoGCRegion

There is a GC.TryStartNoGCRegion API in .NET. You can use it to stop garbage collection by passing a totalSize parameter that represents the maximum amount of memory that you plan to allocate from the managed heap. Matt describes it here.

The problem is that, when/if you allocate more than that, garbage collection resumes silently. Your application continues to work, but with different performance characteristics from what you expected.
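You can observe the silent resume directly: while the region is active, GCSettings.LatencyMode reports NoGCRegion; once you blow the budget, the mode quietly reverts and no exception is raised. A sketch (the sizes are arbitrary, and TryStartNoGCRegion can fail depending on your GC configuration):

```csharp
using System;
using System.Runtime;

static class SilentResumeDemo
{
    static void Main()
    {
        if (GC.TryStartNoGCRegion(16 * 1024 * 1024))
        {
            Console.WriteLine(GCSettings.LatencyMode); // NoGCRegion

            // Allocate well past the declared budget ...
            for (var i = 0; i < 10; i++) { var block = new byte[16 * 1024 * 1024]; }

            // ... and the mode may have silently changed back, with no exception.
            Console.WriteLine(GCSettings.LatencyMode);

            // As discussed in Part II, EndNoGCRegion must still be called,
            // and throws if the region has already ended on its own.
            try { GC.EndNoGCRegion(); } catch (InvalidOperationException) { }
        }
    }
}
```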

The idea

The main idea is to use ETW events to detect when a GC occurs and to call a user-provided delegate at that point. You can then do whatever you want in the delegate (e.g. shut down the process, send an email to support, start another NoGC region, etc.).

Also, I have wrapped the whole StartNoGCRegion/EndNoGCRegion pair in an IDisposable wrapper for ease of use.

The tests

Let’s start by looking at how you use it.

using Xunit;
using System.Threading;

namespace LNativeMemory.Tests

    // XUnit executes all tests in a class sequentially, so no problem with multi-threaded calls to the GC
    public class GC2Tests
    {

We need to use a timer to maximize the chances that a GC happens in some of the tests. Also, we allocate an amount that should work in all GC configurations, as per the article above. triggered is a static field so as to stay zero-allocation (otherwise the delegate would have to capture a local variable, creating a heap-allocated closure). Not that being zero-allocation matters any in this test, but I like to keep ClrHeapAllocationAnalyzer happy.


        const int sleepTime = 200;
        const int totalBytes = 16 * 1024 * 1024;
        static bool triggered = false;

First we test that any allocation that doesn’t exceed the limit doesn’t trigger the call to action.

        [Fact]
        public void NoAllocationBeforeLimit()
        {
            try
            {
                triggered = false;
                var succeeded = GC2.TryStartNoGCRegion(totalBytes, () => triggered = true);
                Assert.True(succeeded);

                var bytes = new byte[99];
                Thread.Sleep(sleepTime);
                Assert.False(triggered);
            }
            finally
            {
                triggered = false;
                GC2.EndNoGCRegion();
            }
        }

Then we test that allocating over the limit does trigger the action. To do so we need to trigger a garbage collection. Our best attempt is the goofy for loop. If you have a better idea, shout.

        [Fact]
        public void AllocatingOverLimitTriggersTheAction()
        {
            try
            {
                triggered = false;
                var succeeded = GC2.TryStartNoGCRegion(totalBytes, () => triggered = true);
                Assert.True(succeeded);

                for (var i = 0; i < 3; i++) { var k = new byte[totalBytes]; }
                Thread.Sleep(sleepTime);
                Assert.True(triggered);
            }
            finally
            {
                triggered = false;
                GC2.EndNoGCRegion();
            }
        }

We also test that we can go back and forth between starting and stopping without messing things up.

        [Fact]
        public void CanCallMultipleTimes()
        {
            for (int i = 0; i < 3; i++)
            {
                triggered = false;
                var succeeded = GC2.TryStartNoGCRegion(totalBytes, () => triggered = true);
                Assert.True(succeeded);

                for (var j = 0; j < 3; j++) { var k = new byte[totalBytes]; }
                Thread.Sleep(sleepTime);
                Assert.True(triggered);

                triggered = false;
                GC2.EndNoGCRegion();
            }
        }
    }

A Stack data structure implementation using Span

I am back at Microsoft, and today we talk about the code below, which is on GitHub here:

    public ref struct SpanStack<T>
    {
        private Span<T> memory;
        private int index;
        private int size;

        public SpanStack(Span<T> mem) { memory = mem; index = 0; size = mem.Length; }
        public bool IsEmpty() => index == 0;
        public bool IsFull() => index > size - 1;
        public void Push(T item) => memory[index++] = item;
        public T Pop() => memory[--index];
    }

    public static class SpanExtensions
    {
        public static SpanStack<T> AsStack<T>(this Span<T> span) => new SpanStack<T>(span);
    }

This Stack data structure can be used over memory that resides on the stack, heap or unmanaged heap. If you know about Span this should immediately make sense to you.

This has to be a ref struct because it contains a Span<T>. Hence it can’t be stored on the heap (i.e. used in lambdas, async methods, class fields, …). You would have to build it on top of Memory<T> if you needed that. Also, you can happily blow the stack with this guy …
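To make the memory-location independence concrete, here is a sketch that drives the same stack over all three kinds of memory. It assumes the SpanStack<T> and SpanExtensions types above are in scope; the unmanaged case needs the project compiled with unsafe code allowed.

```csharp
using System;
using System.Runtime.InteropServices;

static class Demo
{
    static unsafe void Main()
    {
        // 1. Stack memory.
        Span<int> onStack = stackalloc int[4];
        var s1 = onStack.AsStack();
        s1.Push(42);
        Console.WriteLine(s1.Pop()); // 42

        // 2. Heap memory (an ordinary array).
        Span<int> onHeap = new int[4];
        var s2 = onHeap.AsStack();
        s2.Push(7);
        Console.WriteLine(s2.Pop()); // 7

        // 3. Unmanaged memory.
        IntPtr p = Marshal.AllocHGlobal(4 * sizeof(int));
        try
        {
            var unmanaged = new Span<int>((void*)p, 4);
            var s3 = unmanaged.AsStack();
            s3.Push(99);
            Console.WriteLine(s3.Pop()); // 99
        }
        finally { Marshal.FreeHGlobal(p); }
    }
}
```

The same Push/Pop code runs unchanged in all three cases; only the construction of the Span<int> differs.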

Let’s micro-benchmark it with BenchmarkDotNet using, as an example, a postfix calculator. Let’s first do it naively, using inheritance and the generic Stack<T> class in the framework.

This is the naive object model:

    abstract class Token { }

    sealed class Operand : Token
    {
        public int Value { get; }
        public Operand(int v) { Value = v; }
    }

    abstract class Operator : Token
    {
        abstract public int Calc(int a, int b);
    }

    sealed class Add : Operator
    {
        public override int Calc(int a, int b) => a + b;
    }

    sealed class Mult : Operator
    {
        public override int Calc(int a, int b) => a * b;
    }

    sealed class Minus : Operator
    {
        public override int Calc(int a, int b) => a - b;
    }

Let’s then try to be a bit more performance-aware, using a stack-friendly representation:

    public enum TokenType { Operand, Sum, Mult, Minus }

    readonly struct SToken
    {
        public TokenType Type { get; }
        public int Value { get; }
        public SToken(TokenType t, int v) { Type = t; Value = v; }
        public SToken(TokenType t) { Type = t; Value = 0; }
        public int Calc(int a, int b) =>
            Type == TokenType.Sum   ? a + b :
            Type == TokenType.Minus ? a - b :
            Type == TokenType.Mult  ? a * b :
            throw new Exception("I don't know that one");
    }

Perhaps not overly elegant, but not that terrible either. You’ve got to love those expression-bodied methods and throw-expressions.

We then setup things (I know I could/should parse a string here):

    static Token[] tokens;
    static SToken[] stokens;

    [GlobalSetup]
    public void Setup()
    {
        tokens = new Token[] { new Operand(2), new Operand(3), new Operand(4), new Add(),
                               new Mult(), new Operand(5), new Minus() };
        stokens = new SToken[] { new SToken(TokenType.Operand, 2),
                                 new SToken(TokenType.Operand, 3), new SToken(TokenType.Operand, 4),
                                 new SToken(TokenType.Sum),  new SToken(TokenType.Mult),
                                 new SToken(TokenType.Operand, 5), new SToken(TokenType.Minus) };
    }

And first test the naive object model with the standard Stack from System.Collections.Generic.

    [Benchmark]
    public int PostfixEvalStack()
    {
        var stack = new Stack<Token>(100);
        foreach (var token in tokens)
        {
            switch (token)
            {
                case Operand t:
                    stack.Push(t);
                    break;
                case Operator o:
                    var a = stack.Pop() as Operand;
                    var b = stack.Pop() as Operand;
                    var result = o.Calc(a.Value, b.Value);
                    stack.Push(new Operand(result));
                    break;
            }
        }
        return (stack.Pop() as Operand).Value;
    }

Then let’s just swap out our own lean-and-mean stack:

    [Benchmark]
    public int PostfixEvalSpanStack()
    {
        Span<Token> span = new Token[100];
        var stack = span.AsStack();
        foreach (var token in tokens)
        {
            switch (token)
            {
                case Operand t:
                    stack.Push(t);
                    break;
                case Operator o:
                    var a = stack.Pop() as Operand;
                    var b = stack.Pop() as Operand;
                    var result = o.Calc(a.Value, b.Value);
                    stack.Push(new Operand(result));
                    break;
            }
        }
        return (stack.Pop() as Operand).Value;
    }

And finally let’s go the whole way, lean object model and lean data structure, everything on the stack:

    [Benchmark(Baseline = true)]
    public int PostfixEvalSpanStackStructTypes()
    {
        Span<SToken> span = stackalloc SToken[100];
        var stack = span.AsStack();
        foreach (var token in stokens)
        {
            if (token.Type == TokenType.Operand)
            {
                stack.Push(token);
            }
            else
            {
                var a = stack.Pop();
                var b = stack.Pop();
                var result = token.Calc(a.Value, b.Value);
                stack.Push(new SToken(TokenType.Operand, result));
            }
        }
        return stack.Pop().Value;
    }

We also want to check that we didn’t code anything stupid and finally run the benchmark.

    static void Test()
    {
        var p = new Program();
        p.Setup();
        Trace.Assert(p.PostfixEvalStack() == p.PostfixEvalSpanStack() &&
                     p.PostfixEvalSpanStack() == p.PostfixEvalSpanStackStructTypes());
    }

    static void Main(string[] args)
    {
        Test();
        var summary = BenchmarkRunner.Run<Program>();
    }

On my machine I get these results:

BenchmarkDotNet=v0.10.14, OS=Windows 10.0.16299.431 (1709/FallCreatorsUpdate/Redstone3)
Intel Core i7-6600U CPU 2.60GHz (Skylake), 1 CPU, 4 logical and 2 physical cores
Frequency=2742185 Hz, Resolution=364.6727 ns, Timer=TSC
.NET Core SDK=2.1.300-rc1-008673
  [Host]     : .NET Core 2.1.0-rc1 (CoreCLR 4.6.26426.02, CoreFX 4.6.26426.04), 64bit RyuJIT
  DefaultJob : .NET Core 2.1.0-rc1 (CoreCLR 4.6.26426.02, CoreFX 4.6.26426.04), 64bit RyuJIT
                         Method |      Mean |    Error |    StdDev | Scaled | ScaledSD |
------------------------------- |----------:|---------:|----------:|-------:|---------:|
PostfixEvalSpanStackStructTypes |  76.24 ns | 1.563 ns |  2.857 ns |   1.00 |     0.00 |
           PostfixEvalSpanStack | 168.65 ns | 5.280 ns | 15.319 ns |   2.22 |     0.22 |
               PostfixEvalStack | 334.56 ns | 7.387 ns | 20.593 ns |   4.39 |     0.31 |

Your mileage might vary. I want to emphasize that I am just playing with things. I haven’t done any deep analysis of this benchmark. There can be flaws, etc… etc…

Still, I find the idea of data structures which are memory-location-independent rather fascinating.