-
Notifications
You must be signed in to change notification settings - Fork 4
Robust GPU error recovery [Windows] #99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: stable
Are you sure you want to change the base?
Conversation
… thread and not a window. Defined error codes so that matoya clients can action differently based on the error received while still masking the internal error details.
3e74bc7 to
c014061
Compare
|
it occurs to me that maybe there is something much simpler that solves the problem at hand: instead of a queue of MTY_Error, you may really want a bitmask, with one bit for each possible MTY_Error. if there are 64 or fewer error types, this fits into a uint64_t. it can be tested by ANDing the uint64_t with (1ULL << error_code), etc. this is because you probably don't care exactly what order the errors occurred in, but you probably do care about which combination of 2+ errors occurred. |
d2450da to
b4f1412
Compare
… matoya client is informed that textures need to be resubmitted.
…g texture for the id. It only does so if the new texture successfully was created.
| if (!cmn->gfx_ui) | ||
| cmn->gfx_ui = GFX_UI_API[cmn->api].create(device); | ||
|
|
||
| if (!cmn->ui_textures) | ||
| if (!cmn->gfx_ui) { | ||
| MTY_HashDestroy(&cmn->ui_textures, NULL); | ||
|
|
||
| } else if (!cmn->ui_textures) { | ||
| cmn->ui_textures = MTY_HashCreate(0); | ||
| } | ||
|
|
||
| return cmn->gfx_ui != NULL; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the gfx_ui state could not be created from the given device, then we want to clean out the hash of textures.
| MTY_Device *device = gfx_ctx_get_device(cmn); | ||
| if (!device) | ||
| return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid a bunch of error scenarios by not allowing the function to proceed if the graphics context doesn't have a good device handle. This scenario occurs when the GPU resets.
|
|
||
| if (r) { | ||
| void *texture = MTY_HashPopInt(cmn->ui_textures, id); | ||
| GFX_UI_API[cmn->api].destroy_texture(cmn->gfx_ui, &texture, device); | ||
| void *texture = MTY_HashGetInt(cmn->ui_textures, id); | ||
|
|
||
| if (rgba) { | ||
| void *texture_new = GFX_UI_API[cmn->api].create_texture(cmn->gfx_ui, device, rgba, width, height); | ||
|
|
||
| if (rgba) | ||
| texture = GFX_UI_API[cmn->api].create_texture(cmn->gfx_ui, device, rgba, width, height); | ||
| if (texture_new) { | ||
| GFX_UI_API[cmn->api].destroy_texture(cmn->gfx_ui, &texture, device); | ||
|
|
||
| if (texture) | ||
| MTY_HashSetInt(cmn->ui_textures, id, texture); | ||
| texture = texture_new; | ||
| MTY_HashSetInt(cmn->ui_textures, id, texture_new); | ||
| } | ||
|
|
||
| } else { | ||
| GFX_UI_API[cmn->api].destroy_texture(cmn->gfx_ui, &texture, device); | ||
| MTY_HashPopInt(cmn->ui_textures, id); | ||
| } | ||
|
|
||
| r = texture != NULL; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The old logic is to first destroy the existing texture for id and get rid of it from the textures hash, and then carry on with creating the new texture with the given pixel data rgba if it was provide, finally updating the hash node for this id with the new texture handle.
The problem with this is that when create_texture fails and/or returns a null texture handle, the existing perfectly-fine texture is removed. This failure can occur when the GPU resets, and because the entry is removed from the hash indiscriminately, MTY_WindowHasUITexture thereafter always returns false for the given id, which is incorrect behavior.
My fix is to replace the old/existing texture with the new one IFF the new texture was successfully created. After this change, even when the GPU resets, textures are rendered properly because the hash still maintains a reference to the original valid handle.
| MTY_ContextState MTY_WindowGetContextState(MTY_App *app, MTY_Window window) | ||
| { | ||
| return MTY_CONTEXT_STATE_NORMAL; | ||
| MTY_ContextState state = MTY_CONTEXT_STATE_NORMAL; | ||
|
|
||
| struct window_common *cmn = mty_window_get_common(app, window); | ||
| if (cmn && cmn->api != MTY_GFX_NONE && cmn->device_changed) { | ||
| state = MTY_CONTEXT_STATE_NEW; | ||
| cmn->device_changed = false; | ||
| } | ||
|
|
||
| return state; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the device has changed or been reset, it is a mistake not to inform clients of matoya that textures need to be resubmitted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All changes in this file implement a more robust GPU error handling. This portion concerns the main graphics context state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All changes in this file implement a more robust GPU error handling. This portion concerns state pertaining to rendering UI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A high-level-ish review given that I don't actually have experience dealing with DirectX
| if (texture_new) { | ||
| GFX_UI_API[cmn->api].destroy_texture(cmn->gfx_ui, &texture, device); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check if texture is NULL before destroying it. An up-front check is ideal, even if the individual APIs may check too
| if (r) | ||
| MTY_Log("d3d11_ctx reinit ok."); | ||
| else | ||
| MTY_Log("d3d11_ctx reinit FAILED!"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No braces, and I'm not certain we need to log two times every time
|
|
||
| int32_t mty_d3d11_error_handler_default(int32_t e1, int32_t e2, void *opaque) | ||
| { | ||
| e1; e2; opaque; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We usually put each of these on a different line
This PR introduces the ability for users of libmatoya to query for errors and handle them accordingly. Currently, only GPU errors are being reported.The proof-of-concept is done. What is pending is to clean up the implementation.Improves on the way GPU errors are handled while drawing quads or UI. Matoya now handles these cases gracefully and resets the pipeline seamlessly in these scenarios.
Additionally, a few fixes were made as they were discovered while implementing error recovery.
WINDOWS-ONLY for now d3d11 and d3d12
Work In-Scope or Completed
gfxmodules that add better recovery in the event of fatal GPU errors.MTY_WindowSetUITexture. Now the existing texture for the givenidis only deleted if the new texture was successfully created. (Discovered this bug that was causing log-spam when GPU was reset while the app window was minimized to tray).MTY_WindowGetContextStateto consider device resets as deeming the window's graphics context state asMTY_CONTEXT_STATE_NEWso that matoya clients will know to resubmit textures when GPU errors/resets take place.