Skip to content

Conversation

Copy link
Contributor

Copilot AI commented Oct 22, 2025

Overview

This PR addresses a critical GC safety issue where managed objects can be prematurely collected while native code is still operating on their handles, potentially causing crashes or data corruption.

Problem

As described in Chris Brumme's blog post on Lifetime, GC.KeepAlive, handle recycling:

Once we've extracted _handle from this, there are no further uses of this object. In other words, this can be collected even while you are executing an instance method on that object. But what if class C has a Finalize() method which closes _handle? When we call C.OperateOnHandle(), we now have a race between the application and the GC / Finalizer.

The .NET runtime cannot see into native code. Once a handle is extracted from a managed object and passed to a P/Invoke method, the GC may:

  1. Determine the managed object is no longer referenced
  2. Collect and finalize the object (disposing its native handle)
  3. Cause the native code to operate on an invalid/freed handle

This race condition is particularly problematic under GC pressure (mobile devices, memory-constrained environments) and can manifest as intermittent crashes that are difficult to diagnose.

Real-World Example

From the issue description and unoplatform/uno#21660:

var canvas = ...
var picture = ...
canvas.DrawPicture(picture);
// `picture` is not referenced after this point, so GC can collect it
// But native code may still be using picture.Handle!

Solution

Add GC.KeepAlive() calls after P/Invoke methods to ensure reference type parameters remain alive until the native call completes:

// Before
public void DrawPicture(SKPicture picture, SKPaint paint = null)
{
    if (picture == null)
        throw new ArgumentNullException(nameof(picture));
    SkiaApi.sk_canvas_draw_picture(Handle, picture.Handle, null, paint?.Handle ?? IntPtr.Zero);
}

// After
public void DrawPicture(SKPicture picture, SKPaint paint = null)
{
    if (picture == null)
        throw new ArgumentNullException(nameof(picture));
    SkiaApi.sk_canvas_draw_picture(Handle, picture.Handle, null, paint?.Handle ?? IntPtr.Zero);
    GC.KeepAlive(picture);
    GC.KeepAlive(paint);
}

Changes in This PR

Fixed Classes (40+ methods)

  • SKCanvas.cs - All drawing methods: DrawPicture, DrawImage, DrawPath, DrawPaint, DrawRegion, DrawRect, DrawRoundRect, DrawOval, DrawCircle, DrawPoint, DrawPoints, DrawText, DrawDrawable, DrawVertices, DrawArc, DrawRoundRectDifference, DrawAtlas, DrawPatch, DrawImageNinePatch, DrawImageLattice, and all Clip* and annotation methods
  • SKPaint.cs - Constructor with SKFont, property setters (Shader, MaskFilter, ColorFilter, ImageFilter, Blender, PathEffect), SetColor, GetFillPath
  • SKImage.cs (partial) - FromPixelCopy, FromPixels, FromEncodedData, PeekPixels
  • SKBitmap.cs (partial) - ExtractSubset
  • SKPath.cs (partial) - Constructor, AddRoundRect, AddPath (all overloads), AddPathReverse

Documentation

  • GC_KEEPALIVE_IMPLEMENTATION_GUIDE.md - Comprehensive implementation guide with patterns, examples, complete file checklist, and testing strategy
  • analyze_gc_keepalive.sh - Analysis script to identify remaining methods needing fixes

Scope

This issue affects approximately 200+ methods across 56 files in binding/SkiaSharp. This PR implements the fix for the most critical and commonly-used APIs (~20% of total methods), providing:

  • Protection for all core drawing operations
  • A clear pattern for completing the remaining work
  • Tools and documentation to track and implement remaining fixes

Testing

Changes follow the established pattern and are minimally invasive. Each fix:

  • Adds only GC.KeepAlive() calls after existing P/Invoke calls
  • Does not change method signatures or behavior
  • Maintains backward compatibility

Testing should include:

  1. Existing unit test suite (no regressions expected)
  2. GC stress testing under memory pressure
  3. Platform-specific validation (Android, iOS where GC behavior varies)

Related Issues

Remaining Work

The implementation guide (GC_KEEPALIVE_IMPLEMENTATION_GUIDE.md) documents all remaining files and provides a checklist for completing this work. High-priority files still needing fixes include:

  • SKPath.cs (12 more methods)
  • SKImageFilter.cs (28 methods)
  • SKTextBlob.cs (18 methods)
  • SKFont.cs (8 methods)
  • SKShader.cs (13 methods)
  • And 30+ additional files

The pattern is consistent and well-documented, making it straightforward to complete the remaining implementations.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • 0t3vsblobprodcus362.vsblob.vsassets.io
    • Triggering command: dotnet build -c Release (dns block)
  • 2kmvsblobprodcus39.vsblob.vsassets.io
    • Triggering command: dotnet build -c Release (dns block)
  • 2zrvsblobprodcus388.vsblob.vsassets.io
    • Triggering command: dotnet build -c Release (dns block)
  • 7devsblobprodcus323.vsblob.vsassets.io
    • Triggering command: dotnet build -c Release (dns block)
  • 7tjvsblobprodcus341.vsblob.vsassets.io
    • Triggering command: dotnet build -c Release (dns block)
  • bcnvsblobprodcus378.vsblob.vsassets.io
    • Triggering command: dotnet build -c Release (dns block)
  • c78vsblobprodcus322.vsblob.vsassets.io
    • Triggering command: dotnet build -c Release (dns block)
  • frdvsblobprodcus327.vsblob.vsassets.io
    • Triggering command: dotnet build -c Release (dns block)
  • ibzvsblobprodcus369.vsblob.vsassets.io
    • Triggering command: dotnet build -c Release (dns block)
  • imzvsblobprodcus368.vsblob.vsassets.io
    • Triggering command: dotnet build -c Release (dns block)
  • josvsblobprodcus372.vsblob.vsassets.io
    • Triggering command: dotnet build -c Release (dns block)
  • kgfvsblobprodcus314.vsblob.vsassets.io
    • Triggering command: dotnet build -c Release (dns block)
  • kh4vsblobprodcus325.vsblob.vsassets.io
    • Triggering command: dotnet build -c Release (dns block)
  • kijvsblobprodcus387.vsblob.vsassets.io
    • Triggering command: dotnet build -c Release (dns block)
  • l49vsblobprodcus358.vsblob.vsassets.io
    • Triggering command: dotnet build -c Release (dns block)
  • s4uvsblobprodcus326.vsblob.vsassets.io
    • Triggering command: dotnet build -c Release (dns block)
  • u3hvsblobprodcus371.vsblob.vsassets.io
    • Triggering command: dotnet build -c Release (dns block)
  • u6ovsblobprodcus377.vsblob.vsassets.io
    • Triggering command: dotnet build -c Release (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Original prompt

This section details on the original issue you should resolve

<issue_title>[BUG] P/Invokes should protect parameters across invocations</issue_title>
<issue_description>### Description

Context: https://learn.microsoft.com/en-us/archive/blogs/cbrumme/lifetime-gc-keepalive-handle-recycling
Context: unoplatform/uno#21660
Context: dotnet/java-interop#719

Background: The most important background is Chris Bruce's blog post Lifetime, GC.KeepAlive, handle recycling. Excerpts:

class C {
   IntPtr _handle;
   Static void OperateOnHandle(IntPtr h) { ... }
   void m() {
      OperateOnHandle(_handle);
      ...
   }
   ...
}
 
class Other {
   void work() {
      if (something) {
         C aC = new C();
         aC.m();
         ...  // most guess here
      } else {
         ...
      }
   }
}

It’s more interesting to worry about the earliest point that aC could be collected. …

… Actually, aC could become eligible for collection before C.m() even calls C.OperateOnHandle(). Once we’ve extracted _handle from this, there are no further uses of this object. In other words, this can be collected even while you are executing an instance method on that object.

But what if class C has a Finalize() method which closes _handle? When we call C.OperateOnHandle(), we now have a race between the application and the GC / Finalizer. Eventually, that’s a race we’re going to lose.

Rephrased:

  1. .NET is a multi-threaded environment, and the GC is one of many threads that can mutate your data.
  2. The .NET runtime cannot "see" into native code
  3. Therefore, care must be taken to ensure that we don't allow the GC to collect instances while native code is operating on them.

dotnet/java-interop#719 was one non-Skia example of this.

unoplatform/uno#21660 is a Skia-related example of this. Paraphrasing greatly, we have:

var canvas =var picture = …
canvas.DrawPicture(picture);
// `picture` is not referenced after this point.

While we may wish that the canvas.DrawPicture() call will keep picture alive, it does not. Eventually you hit:

public void DrawPicture (SKPicture picture, SKPaint paint = null)
{
if (picture == null)
throw new ArgumentNullException (nameof (picture));
SkiaApi.sk_canvas_draw_picture (Handle, picture.Handle, null, paint == null ? IntPtr.Zero : paint.Handle);
}

meaning once exeecution hits SkiaApi.sk_canvas_draw_picture(), nothing ensures that picture is kept alive. (Or canvas, for that matter!)

The fix: audit the code (is there a generator?) such that all reference type parameters are explicitly kept alive across P/Invoke boundaries. For example, SKCanvas.DrawPicture() should be updated to be:

partial class SKCanvas {
	public void DrawPicture (SKPicture picture, SKPaint paint = null)
	{
		if (picture == null)
			throw new ArgumentNullException (nameof (picture));
		SkiaApi.sk_canvas_draw_picture (Handle, picture.Handle, null, paint == null ? IntPtr.Zero : paint.Handle);
		GC.KeepAlive (picture);
		GC.KeepAlive (paint);
		// *arguably* GC.KeepAlive(this), but maybe just require that the caller keep it around?
	}
}

This should be done for all public methods that invoke P/Invoke methods.

Code

Expected Behavior

No response

Actual Behavior

No response

Version of SkiaSharp

3.116.0 (Current)

Last Known Good Version of SkiaSharp

Other (Please indicate in the description)

IDE / Editor

Visual Studio Code (macOS)

Platform / Operating System

Android, All

Platform / Operating System Version

No response

Devices

No response

Relevant Screenshots

No response

Relevant Log Output

Code of Conduct

  • I agree to follow this project's Code of Conduct</issue_description>

Comments on the Issue (you are @copilot in this section)

Fixes #3393


✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

Copilot AI and others added 5 commits October 22, 2025 02:09
Co-authored-by: mattleibow <1096616+mattleibow@users.noreply.github.com>
Co-authored-by: mattleibow <1096616+mattleibow@users.noreply.github.com>
Co-authored-by: mattleibow <1096616+mattleibow@users.noreply.github.com>
Co-authored-by: mattleibow <1096616+mattleibow@users.noreply.github.com>
…mplementation

Co-authored-by: mattleibow <1096616+mattleibow@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix P/Invokes to protect parameters across invocations [BUG] Add GC.KeepAlive calls to protect P/Invoke parameters from premature collection Oct 22, 2025
Copilot AI requested a review from mattleibow October 22, 2025 02:24
Copilot finished work on behalf of mattleibow October 22, 2025 02:24
}
```

### Property Setters

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand SkiaSharp, but the existence of properties deeply concerns me. Spitballing:

var paint = new SKPaint();
var shader = SKShader.CreateColor(new SKColor(0));
paint.Shader = shader;

What keeps shader alive?

For this specific example, the answer appears to be the SKShader.CreateColor() hits SKShader.GetObject() which hits SKObject.GetOrAddObject() which "registers" instances within HandleDictionary, and as HandleDictionary holds static members, those are all roots which keep things alive.

However, not all T.GetObject() methods use GetOrAddObject(). Consider SKPaint.GetObject(), which just news up a value:

internal static SKPaint GetObject (IntPtr handle) =>
handle == IntPtr.Zero ? null : new SKPaint (handle, true);

Thus:

var bitmap = new SKBitmap();
var canvas = new SKCanvas(bitmap);

var save = new SKCanvasSaveLayerRec();
var paint = new SKPaint();
save.Paint = paint;

canvas.SaveLayer(save);

What keeps the GC from collecting paint "too soon"?

Is there a "too soon"?

public int SaveLayer (in SKCanvasSaveLayerRec rec)
{
var native = rec.ToNative ();
return SkiaApi.sk_canvas_save_layer_rec (Handle, &native);
}

native is from:

internal readonly SKCanvasSaveLayerRecNative ToNative () =>
new SKCanvasSaveLayerRecNative {
fBounds = Bounds is { } bounds ? &bounds : (SKRect*)null,
fPaint = Paint?.Handle ?? IntPtr.Zero,
fBackdrop = Backdrop?.Handle ?? IntPtr.Zero,
fFlags = Flags
};

Thus, my conjecture: nothing references SKCanvasSaveLayerRec.Paint after the .ToNative() invocation, so if we expand and annotate SaveLayer(), we get:

partial class SKCanvas {
	public int SaveLayer (in SKCanvasSaveLayerRec rec)
	{
		var native = rec.ToNative ();
		// Nothing references `rec.Paint` after this point.
		// `rec.Paint` is collected by the GC, *invalidating* SKCanvasSaveLayerRecNative.fPaint
		return SkiaApi.sk_canvas_save_layer_rec (Handle, &native);
		// *boom*, as native code accesses an invalid pointer.
	}
}

Is this likely? No. That's a very narrow window for things to go wrong. (Though adding a Thread.Sleep(1000) between rec.ToNative() and the sk_canvas_save_layer_rec() could make things more likely… 😉)


I am thus rather concerned about the behavior of properties in general. If the property type uses HandleDictionary, things should be fine. If they don't

One git grep -3 'static.*GetObject' later, and the following files are concerning:

  • binding/SkiaSharp.Skottie/Animation.cs
  • binding/SkiaSharp/GRGlInterface.cs
  • binding/SkiaSharp/GRVkExtensions.cs
  • binding/SkiaSharp/SKCodec.cs
  • binding/SkiaSharp/SKDocument.cs
  • binding/SkiaSharp/SKFontStyle.cs
  • binding/SkiaSharp/SKPaint.cs
  • binding/SkiaSharp/SKPath.cs
  • binding/SkiaSharp/SKString.cs
  • binding/SkiaSharp/SKTextBlob.cs
  • binding/SkiaSharp/SKVertices.cs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

[BUG] P/Invokes should protect parameters across invocations

3 participants