Sample ideas
This article focuses on samples that should be added to XGU.
Dual paraboloid environment maps
Dual paraboloid mapping could be a trade-off between quality and performance, in comparison to sphere maps and cube maps for static environment maps. We might be able to save a lot of memory and computation by using dual paraboloid maps (rather than cube maps) for realtime reflections.
Rendering the maps
Rendering the maps has the drawback of requiring vertex shaders, but should otherwise be straightforward. We might also require an additional clip plane / texture unit. However, we might be able to resort to hardware clipping instead, leaving the pixel-processing stages untouched and available to the user.
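As a minimal sketch of what the vertex program would compute per vertex (plain C for illustration; the type and function names are ours, not XGU API):

typedef struct { float x, y, z; } Vec3;

// Project a normalized view-space direction onto the front paraboloid map
// (the back map uses the same math with d.z negated).
void paraboloid_project(Vec3 d, float* s, float* t) {
    float denom = 1.0f + d.z;         // d.z >= 0 on the front hemisphere
    *s = (d.x / denom) * 0.5f + 0.5f; // Remap [-1,1] into [0,1] texture space
    *t = (d.y / denom) * 0.5f + 0.5f;
}

The sign of d.z is what would feed the additional clip plane, so that each map only receives geometry from its own hemisphere.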
Using the maps
If we are willing to use more than 1 texture unit, this becomes trivial: use a separate map for the front and the back.
If we only want to use 1 texture unit, it should be possible to use a 3D texture with a depth of 2, if we sacrifice mipmapping.
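A minimal sketch of that single-texture-unit lookup, assuming slice 0 holds the front map and slice 1 the back map (plain C, reusing the Vec3 struct from above; names are illustrative):

#include <math.h>

// Compute (s, t, r) coordinates into a 3D texture with a depth of 2.
void dual_paraboloid_lookup(Vec3 d, float* s, float* t, float* r) {
    float denom = 1.0f + fabsf(d.z);    // fabsf() handles both hemispheres
    *s = (d.x / denom) * 0.5f + 0.5f;
    *t = (d.y / denom) * 0.5f + 0.5f;
    *r = (d.z >= 0.0f) ? 0.25f : 0.75f; // Center of slice 0 or slice 1
}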
For mipmapping support, we can potentially use what's proposed in this ImgTec document in the section "2.2.2. Render to Two Faces of a CubeMap Texture". The rest of this section focuses on that technique.
A potential issue with the ImgTec solution is that they do the step function in the fragment shader, which likely isn't possible on the original Xbox. So there might be many issues when a polygon spans both paraboloids.
As a potential workaround, we could attempt to reduce the Z coordinate to a single bit in the vertex program, to be scaled back up in the texture matrix. The idea is to severely hurt the interpolation, to get a more binary output. It is not known whether this works.
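A sketch of that idea, assuming vertex outputs are clamped to [0,1] (this is our own unverified illustration):

// Squash z so interpolated values sit close to 0 or 1; the texture matrix
// would then rescale the result into the slice-select range.
float squash_z(float z) {
    float v = z * 4096.0f; // Large scale so the clamping dominates
    if (v < 0.0f) v = 0.0f;
    if (v > 1.0f) v = 1.0f;
    return v;
}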
On the original Xbox, we also have full control over the memory map, so we could potentially try the following to avoid wasting memory on unused cubemap faces (MAX_RAM and Protect stand in for a suitable physical-address limit and page-protection flags; the address range arguments take physical addresses):
#include <stdint.h>
#include <xboxkrnl/xboxkrnl.h> // Mm* kernel APIs (as exposed by nxdk)

uint8_t* sparse = MmAllocateContiguousMemoryEx(0x8000, 0, MAX_RAM, 0, Protect); // Find us 8 pages
MmFreeContiguousMemory(sparse); // Free those 8 pages
MmAllocateContiguousMemoryEx(0x3000, MmGetPhysicalAddress(&sparse[0x0]), MmGetPhysicalAddress(&sparse[0x0])+FIXME, 0, Protect); // Reclaim 3 pages
MmAllocateContiguousMemoryEx(0x2000, MmGetPhysicalAddress(&sparse[0x6000]), MmGetPhysicalAddress(&sparse[0x6000])+FIXME, 0, Protect); // Reclaim 2 pages near the end
Alternatively, interleaving 3 different reflection probes would also avoid this waste of memory (but would still require one texture stage per lookup).
@JayFoxRox has already started implementation of this technique.
Code for this technique can be found in Appendix A10 of this document.
Spherical harmonics
Spherical harmonics can be implemented on the original Xbox using vertex programs. This avoids consuming a texture unit for low-frequency environment maps, and therefore also allows sampling more than 4 environments at once.
This is particularly interesting for image-based lighting.
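As an illustration of the per-vertex workload, here is a standard order-2 SH evaluation for one color channel (the basis constants are the usual ones from the SH literature; Vec3 as in the earlier sketches, and this is not XGU code):

// Evaluate an order-2 spherical harmonics expansion for normal n,
// given 9 precomputed coefficients c[] for one color channel.
float sh_eval(const float c[9], Vec3 n) {
    return c[0] * 0.282095f
         + c[1] * 0.488603f * n.y
         + c[2] * 0.488603f * n.z
         + c[3] * 0.488603f * n.x
         + c[4] * 1.092548f * n.x * n.y
         + c[5] * 1.092548f * n.y * n.z
         + c[6] * 0.315392f * (3.0f * n.z * n.z - 1.0f)
         + c[7] * 1.092548f * n.x * n.z
         + c[8] * 0.546274f * (n.x * n.x - n.y * n.y);
}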
GPU particle system
Vertex programs can be used to implement a particle system like the one explained in this video (there is also another video showing some of the controls).
This would free the CPU for other tasks.
Ideally, our implementation of the particle system for the sample would also support precomputed collisions: calculate the time of collision, set the particle lifetime accordingly, and enable a particle in another particle system which reflects the particle with collision response (bouncing / sliding / ...). This pre-computation would probably have to be done on the CPU, but it is hopefully cheap enough to compute.
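A sketch of that precomputation for the simplest case, a ballistic particle falling towards the ground plane y = 0 (our own illustration, not existing sample code):

#include <math.h>

// Solve y0 + vy*t - 0.5*g*t^2 = 0 for the future root: the time at which
// the particle hits the ground. Use it as the particle's lifetime.
float time_of_ground_hit(float y0, float vy, float g) {
    float disc = vy * vy + 2.0f * g * y0;
    if (disc < 0.0f) return INFINITY; // Never reaches the plane
    return (vy + sqrtf(disc)) / g;    // Larger root = future impact
}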
We can do limited-precision per-particle computations in a register combiner.
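For reference, a general combiner stage computes a multiply-add per channel in limited fixed-point precision; conceptually (plain C, with the clamp emulating the combiner's output range):

float combiner_mad(float a, float b, float c, float d) {
    float v = a * b + c * d; // The combiner's per-channel operation
    if (v < 0.0f) v = 0.0f;  // Outputs are clamped to a limited range
    if (v > 1.0f) v = 1.0f;
    return v;                // e.g. pos' = combiner_mad(pos, 1.0f, vel, dt)
}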
The per-pixel processing capabilities of GPUs have successfully been used to simulate water by drawing ripples into a heightmap and processing it through the fragment shader over time. Even for old GPUs like the GeForce 3, there has been a sample doing Conway's "Game of Life" with per-pixel processing.
On the original Xbox, we have the benefit of knowing the GPU formats. So we are able to draw pixels which contain vertex data (example: RGB = XYZ) without any additional copies for format conversion. Neighboring pixels could be used to add more vertex attributes, and neighboring channels could be used to encode higher-precision attributes. The resulting framebuffer could then be used as a vertex buffer in another frame for rendering point sprites.
For higher-precision buffers representing particles, this technique is explained in this article (in German).
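To make the pixel-as-vertex idea concrete, assuming an A8R8G8B8 framebuffer (struct names are ours):

#include <stdint.h>

// One framebuffer pixel as the GPU stores it in memory (A8R8G8B8).
typedef struct { uint8_t b, g, r, a; } Pixel;

// The same 4 bytes reinterpreted as a point-sprite vertex: an unsigned
// byte position (channel order follows the surface format) plus one
// spare attribute byte, with no conversion copy needed:
typedef struct { uint8_t x, y, z, extra; } PackedVertex;

The draw call would then simply point an unsigned-byte vertex attribute at the framebuffer memory.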
Gamma correction
While GPU computations should be done in linear space (and are limited to LDR), most textures are stored in sRGB or linear HDR formats. We should provide a sample which shows how to re-encode those input textures to gamma 2.0 (which is GPU compatible), and how to decode them on the GPU. The assumption is that we gain a bit by doing the decoding on the GPU (but this still has to be confirmed).
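A sketch of the proposed re-encoding for one 8-bit channel (standard sRGB decode; the point of gamma 2.0 is that the GPU-side decode is just a squaring, which a register combiner can do in a single multiply):

#include <math.h>
#include <stdint.h>

// Re-encode an sRGB texel channel to gamma 2.0 storage: v = sqrt(linear).
uint8_t srgb_to_gamma20(uint8_t srgb) {
    float c = srgb / 255.0f;
    float linear = (c <= 0.04045f) ? (c / 12.92f)
                                   : powf((c + 0.055f) / 1.055f, 2.4f);
    return (uint8_t)(sqrtf(linear) * 255.0f + 0.5f);
}
// On the GPU, decoding is then: linear = v * v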
For encoding textures on the GPU we don't have many options, so we'd likely have to resort to returning linear results. We can try different approximations, but we'd lose more precision, because we'd also have to encode a mapping from our custom format to gamma 2.2 (or whatever the display device expects) in the GPU gamma table.
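A sketch of filling such a gamma table on the CPU, assuming 256 8-bit entries per channel and a gamma-2.2 display (how the ramp is uploaded depends on the video API used):

#include <math.h>
#include <stdint.h>

// Build a ramp mapping linear framebuffer values to a gamma-2.2 display.
void build_gamma_ramp(uint8_t ramp[256]) {
    for (int i = 0; i < 256; i++) {
        ramp[i] = (uint8_t)(powf(i / 255.0f, 1.0f / 2.2f) * 255.0f + 0.5f);
    }
}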
@JayFoxRox has already started implementation of this technique in https://github.com/JayFoxRox/nxdk/pull/80.
Color grading using lookup tables
There might be situations where the user wants to invert or otherwise post-process their image.
We should have a sample which shows how to do arbitrary mappings using a lookup table, with the different options that are possible on the GPU. This includes 1D LUTs, 2D LUTs, and potentially 3D LUTs.
This is the technique that has been explained in this document (Slide 15-17).
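As the simplest example, a 1D LUT that inverts the image; any other per-channel mapping is built the same way (our own illustration; on the GPU the table could be applied via a dependent texture read):

#include <stdint.h>

// Fill a 256-entry 1D lookup table with an invert mapping.
void build_invert_lut(uint8_t lut[256]) {
    for (int i = 0; i < 256; i++) {
        lut[i] = (uint8_t)(255 - i);
    }
}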
Blur filter
This blur filter has been explained in this article. It was originally developed for the original Xbox and is explained in this document (Slide 15-17).
This technique is explained in this article. Some code can be found here.