[Soft-Float] - Initial Interpreter Implementation of Ps2's floating point unit specification #12001
Thank you for submitting a contribution to PCSX2
As this is your first pull request, please be aware of the contributing guidelines.
Additionally, as per recent changes in GitHub Actions, your pull request will need to be approved by a maintainer before GitHub Actions can run against it. You can find more information about this change here.
Please be patient until this happens. In the meantime if you'd like to confirm the builds are passing, you have the option of opening a PR on your own fork, just make sure your fork's master branch is up to date!
Does this work on the recompilers or just the interpreter?
You should try reading the title.
96414bd
to
de047ea
Compare
I don’t know why my brain just skimmed over that.
I ran a bunch of tests on the "Test Drive Unlimited" AI during the demo scene after sitting idle on the menu. No combination of settings/interpreters seems to have any effect on its behavior. There must be something else going on.
Shouldn't this help with #2990 as well?
I remember seeing on the public dev channel that Stuntman no longer had AI issues with this, and that the car AI no longer failed? It's been a while.
I tested the demo replays of Tokyo Xtreme Racer Zero (see issue #5597) and noticed that the car movement in interpreter mode is now closer to the movement in recompiler mode. comp.mp4
The Tony Hawk case is fixed; the game relies on undocumented behaviour in its 3D engine. The PS2 has no denormal support... except, apparently, in the Mul unit. That behaviour is now emulated properly.
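The "no denormals" rule mentioned in this comment can be sketched as follows. This is a minimal illustration, assuming the commonly documented behaviour that an operand whose exponent field is zero is read as a signed zero. The helper names are hypothetical and not taken from the PR, and the undocumented Mul-unit exception described above is deliberately not modelled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only.
constexpr std::uint32_t kSignMask = 0x80000000u;
constexpr std::uint32_t kExponentMask = 0x7F800000u;

// True when the exponent field is zero (a zero or a denormal bit pattern).
constexpr bool hasZeroExponent(std::uint32_t raw)
{
	return (raw & kExponentMask) == 0;
}

// Assumed operand handling: a denormal input keeps only its sign bit.
constexpr std::uint32_t flushDenormal(std::uint32_t raw)
{
	return hasZeroExponent(raw) ? (raw & kSignMask) : raw;
}

// Compile-time checks of the two cases.
static_assert(flushDenormal(0x807FFFFFu) == 0x80000000u, "negative denormal reads as -0");
static_assert(flushDenormal(0x00800000u) == 0x00800000u, "smallest normal is untouched");
```

A multiplier that really does honour denormals would need a separate operand path that skips this flush.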
@AmyRoxwell Can you provide a reference to this? There's no indication of it being fixed in #2990, and I assume the devs would have closed it if it were. In any event it involves pathing in Driver 3 as well.
2024-11-10.14-28-32.mp4 This also affects the Fatal Frame 1 issue, meaning this plus the current GameDB patch will end up being the ultimate fix.
This PR's EE interpreter fixes the GameDB issue in #11636.
I meant while using this PR, not that it has been fixed. Sorry if that was misunderstood. But if it's not mentioned in the PR, maybe whatever it needs isn't covered by this initial implementation.
Driv3r seemed fine when it was tested; Stuntman NTSC is a lot better but can still deviate "slightly". I suspect it is, once again, the interpreter rounding/clamping values somewhere.
@AmyRoxwell Ah, I understand you now. That's great news. @GitHubProUser67 Thanks, nice to see Driv3r is looking better. Would it make sense to list these games in your OP?
Nice to hear of Stuntman finally faring better nowadays.
Constantine, tested with the US version. On level 2 there is a dumpster that we need to climb to progress in the game. The recompiler handles it fine with positive rounding for the EE. This currently does not work with soft floats, no matter what. Reproduction steps.
I managed to fix it, but the game requires accurate soft floats on the VU0 and the EE FPU; so far I have tried with both Add/Sub and Mul/Div on the EE FPU/COP2 and VU0. Edit: Positive rounding hack-fixes it because it makes the float VU0-friendly. In fact the game does some weird EE FPU->VU0 communication with high floats.
Note for future tests: the Constantine case is a typical example of a hack fixing a problem related to what I call "float broadcasting". A game might want to transfer an out-of-IEEE-range float from the FPU to the VUs (and vice versa). When a GameDB entry has a rounding mode, it usually covers a can of worms where the game requires accurate soft floats on more than one processor.
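To make the "out of IEEE range" point concrete, here is a small sketch. It assumes the commonly described PS2 behaviour that bit patterns with an exponent field of 0xFF are ordinary large numbers rather than Inf/NaN. All function names are hypothetical, and the clamp is only an illustration of how a host-float path can lose the bits a game tries to pass between units.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret a raw 32-bit pattern as a host float without undefined behaviour.
inline float hostFloatFromBits(std::uint32_t raw)
{
	float value;
	std::memcpy(&value, &raw, sizeof(value));
	return value;
}

// True when the exponent field is 0xFF: Inf/NaN on an IEEE 754 host,
// but (by assumption here) an ordinary large magnitude on the PS2.
inline bool isOutOfIeeeRange(std::uint32_t raw)
{
	return (raw & 0x7F800000u) == 0x7F800000u;
}

// Hypothetical clamp of the kind a host-float path might apply: anything
// out of range collapses to +/-FLT_MAX (0x7F7FFFFF), discarding mantissa bits.
inline std::uint32_t clampToHostRange(std::uint32_t raw)
{
	return isOutOfIeeeRange(raw) ? ((raw & 0x80000000u) | 0x7F7FFFFFu) : raw;
}
```

With these helpers, `0x7FFFFFFF` is a NaN when viewed through `hostFloatFromBits`, and `clampToHostRange` maps it to `0x7F7FFFFF`, so a value written by one unit would not arrive unchanged at another.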
Pretty much every replay for any racing game (including Stuntman and Driv3r) will require tight FPU accuracy. Edit: 23/03/2025
Fixes flickering on opponents' cars in Namco's R: Racing Evolution in bumper cam view. EE addition/subtraction and multiplication/division have to be enabled.
This PR probably renders #1110 obsolete, so it could be closed too.
Out of curiosity, to any testers: what is the current state of Stuntman with this PR? Has anyone done thorough testing on both the NTSC and PAL versions? If so, can you point out which missions break?
What should I enable to fix the Need for Speed Undercover bug (#9831)?
This pull request will be closed in favor of a newer one since the rebasing process cannot be done.
Just so we can link it: the new PR is #12550.
Creating a new PR causes information to now be split.
2536b24
to
e8e8d94
Compare
I forgot to mention at the time, but this PR was updated with the changes from #12550 and rebased. Meaning this PR is the updated one.
46f4624
to
5c5b448
Compare
…terpreters.nit specification.
This work is a combination of several efforts and prior research.
Credits:
- https://www.gregorygaines.com/blog/emulating-ps2-floating-point-nums-ieee-754-diffs-part-1/
- https://github.com/GitHubProUser67/MultiServer3/tree/main/BackendServices/PS2FloatLibrary
- https://github.com/Goatman13/pcsx2/tree/accurate_int_add_sub
- PCSX2 Team for their help and support in this massive journey.
Fixes Codacy warnings.
Hey, I rebased your PR/removed your merge commit to make it apply cleanly against master :)
What would I do without you? :)
In Discord, you claimed this was ready for review, but you haven't addressed any of the following things I pointed out in the last review or written anything detailing why you didn't address them.
- BitScanReverse8 is still there
- I mentioned switching option checks to a bitfield last time, and you haven't done that or given a reason why you don't think that's a good idea.
- I asked you to keep PS2Float simple, and leave out things that the emulator already implements accurately. Last time, I requested that you keep it to just add, sub, mul, madd, msub, div, and sqrt, but I'll add the transcendentals into that. But we don't need things like ftoi/itof or comparisons. We also don't need a ToString.
- I questioned having all the functions take a PS2Float, which now includes a bunch of output-only flags. Either make PS2Float not contain output flags, or implement the functions as static methods that take u32s, with madd/msub taking an extra boolean indicating incoming overflow. This not only improves performance, but helps make clear what is and isn't needed to people looking at the method signatures. e.g. someone could easily see that the output of add depends only on the two incoming numbers, but madd actually cares about whether the previous number overflowed or not.
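As an illustration of the bitfield suggestion in the second bullet, the three separate option checks could be folded into one mask test. This is a minimal sketch under assumptions: the enum, struct, and bit assignments are hypothetical, and the PR's real option names and storage may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical option bits, one per soft-float feature.
enum SoftFloatOption : std::uint8_t
{
	SOFT_ADDSUB = 1u << 0,
	SOFT_MULDIV = 1u << 1,
	SOFT_SQRT = 1u << 2,
	SOFT_ANY = SOFT_ADDSUB | SOFT_MULDIV | SOFT_SQRT,
};

struct SoftFloatOptions
{
	std::uint8_t bits = 0;

	// Turn one option (or a combination of options) on.
	void enable(SoftFloatOption option)
	{
		bits = static_cast<std::uint8_t>(bits | option);
	}

	// Test whether any bit of the given option is set.
	bool has(SoftFloatOption option) const { return (bits & option) != 0; }

	// Single mask test replacing "ADDSUB || MULDIV || SQRT".
	bool any() const { return (bits & SOFT_ANY) != 0; }
};
```

A caller would then write `if (options.any())` once instead of repeating three checks in every instruction handler.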
In addition, I posted this test ELF on Discord before the rewrite. You replied that your C# code correctly handles the broken add result, yet it's still broken after you rewrote the C++ code (along with a lot of the flag tests):
Test Results
[ 1.6582] ADD.S 00000000 + 807FFFFF => 80000000 != 00000000
[ 1.6588] VADD 00000000 + 807FFFFF => 80000000 != 00000000
[ 1.6591] VADD 00000000 + 807FFFFF MAC FLAGS ZS-- != Z---
[ 1.6594] VADD 00800001 + 80800000 MAC FLAGS Z--- != Z-U-
[ 1.6597] VADD {3F800000 7FFFFFFF 00800000 00800001} + {BF800000 FFFFFFFF 80800000 80800000} STATUS FLAGS ZS----------Z----- != ZS--US------Z-U---
[ 1.6599] VADD 80800001 + 00800000 MAC FLAGS ZS-- != ZSU-
[ 1.6603] VADD {BF800000 FFFFFFFF 80800000 80800001} + {3F800000 7FFFFFFF 00800000 00800000} STATUS FLAGS ZSSS--------ZS---- != ZSSSUS------ZSU---
[ 1.6605] VADD 00900001 + 80900000 MAC FLAGS Z--- != Z-U-
[ 1.6607] VADD 00900000 + 808FFFFF MAC FLAGS Z--- != Z-U-
[ 1.6609] VADD 80900001 + 00900000 MAC FLAGS ZS-- != ZSU-
[ 1.6611] VADD 80900000 + 008FFFFF MAC FLAGS ZS-- != ZSU-
[ 1.6615] VADD {00900001 00900000 80900001 80900000} + {80900000 808FFFFF 00900000 008FFFFF} STATUS FLAGS ZSSS--------ZS---- != ZSSSUS------ZSU---
[ 1.6617] SUB.S 00000000 - 007FFFFF => 80000000 != 00000000
[ 1.6620] VSUB 00000000 - 007FFFFF => 80000000 != 00000000
[ 1.6623] VSUB 00000000 - 007FFFFF MAC FLAGS ZS-- != Z---
[ 1.6625] VSUB 00800001 - 00800000 MAC FLAGS Z--- != Z-U-
[ 1.6628] VSUB {3F800000 7FFFFFFF 00800000 00800001} - {3F800000 7FFFFFFF 00800000 00800000} STATUS FLAGS ZS----------Z----- != ZS--US------Z-U---
[ 1.6630] VSUB 80800001 - 80800000 MAC FLAGS ZS-- != ZSU-
[ 1.6633] VSUB {BF800000 FFFFFFFF 80800000 80800001} - {BF800000 FFFFFFFF 80800000 80800000} STATUS FLAGS ZSSS--------ZS---- != ZSSSUS------ZSU---
[ 1.6635] VSUB 00900001 - 00900000 MAC FLAGS Z--- != Z-U-
[ 1.6638] VSUB 00900000 - 008FFFFF MAC FLAGS Z--- != Z-U-
[ 1.6640] VSUB 80900001 - 80900000 MAC FLAGS ZS-- != ZSU-
[ 1.6642] VSUB 80900000 - 808FFFFF MAC FLAGS ZS-- != ZSU-
[ 1.6645] VSUB {00900001 00900000 80900001 80900000} - {00900000 008FFFFF 80900000 808FFFFF} STATUS FLAGS ZSSS--------ZS---- != ZSSSUS------ZSU---
[ 1.6651] VMUL 00800000 * 3F000000 MAC FLAGS Z--- != Z-U-
[ 1.6654] VMUL 80800000 * 3F000000 MAC FLAGS ZS-- != ZSU-
[ 1.6657] VMUL 00800000 * BF000000 MAC FLAGS ZS-- != ZSU-
[ 1.6659] VMUL 80800000 * BF000000 MAC FLAGS Z--- != Z-U-
[ 1.6663] VMUL {00800000 80800000 00800000 80800000} * {3F000000 3F000000 BF000000 BF000000} STATUS FLAGS ZSSS--------ZS---- != ZSSSUS------ZSU---
[ 1.6665] VMUL 20000000 * 1F800000 MAC FLAGS Z--- != Z-U-
[ 1.6667] VMUL A0000000 * 1F800000 MAC FLAGS ZS-- != ZSU-
[ 1.6669] VMUL 20000000 * 9F800000 MAC FLAGS ZS-- != ZSU-
[ 1.6672] VMUL {20000000 A0000000 20000000 A0400000} * {1F800000 1F800000 9F800000 9FC00000} STATUS FLAGS ZSSS--------ZS---- != ZSSSUS------ZSU---
[ 1.6674] VMUL 3F080000 * 00C80000 MAC FLAGS Z--- != Z-U-
[ 1.6677] VMUL {408F0000 40CF0000 3F480000 3F080000} * {7E8FFFFF 7ECFFFFF 00C80000 00C80000} STATUS FLAGS ZS----OS----Z--O-- != ZS--USOS----Z-UO--
[ 1.6680] MADD.S 007FFFFF + 80800000 * 80800000 FLAGS ------------ != SU----------
[ 1.6682] MADD.S BF800000 + 80800000 * 80800000 FLAGS ------------ != SU----------
[ 1.6686] VMADD {00000000 00000000 00000000 00000000} + {00000000 00000000 80000000 80000000} * {00000000 80000000 00000000 80000000} STATUS FLAGS ZS----------Z----- != ZSSS--------Z-----
[ 1.6689] VMADD {00800000 80000000 80000000 007FFFFF} + {80000000 80800000 80000000 80800000} * {80000000 80000000 80800000 80800000} STATUS FLAGS ZS----------Z----- != ZS--US------Z-----
[ 1.6693] VMADD {00800000 3F800000 BF800000 BF800000} + {80000000 80800000 80000000 80800000} * {80000000 80000000 80800000 80800000} STATUS FLAGS --SS---------S---- != ZSSSUS-------S----
[ 1.6695] VMADD 00900001 + A0400000 * 1FC00000 MAC FLAGS Z--- != Z-U-
[ 1.6697] VMADD 808FFFFF + A0400000 * 9FC00000 MAC FLAGS Z--- != Z-U-
[ 1.6700] VMADD 80900000 + A0400000 * 9FC00002 MAC FLAGS Z--- != Z-U-
[ 1.6703] VMADD {00900000 00900001 808FFFFF 80900000} + {A0400000 A0400000 A0400000 A0400000} * {1FC00000 1FC00000 9FC00000 9FC00002} STATUS FLAGS ZS----------Z----- != ZSSSUS------Z-U---
[ 1.6705] VMADD 80900001 + A0400000 * 9FC00000 MAC FLAGS ZS-- != ZSU-
[ 1.6707] VMADD 008FFFFF + A0400000 * 1FC00000 MAC FLAGS ZS-- != ZSU-
[ 1.6710] VMADD 00900000 + A0400000 * 1FC00002 MAC FLAGS ZS-- != ZSU-
[ 1.6713] VMADD {80900000 80900001 008FFFFF 00900000} + {A0400000 A0400000 A0400000 A0400000} * {9FC00000 9FC00000 1FC00000 1FC00002} STATUS FLAGS ZSSS--------ZS---- != ZSSSUS------ZSU---
[ 1.6715] MSUB.S 007FFFFF - 80800000 * 00800000 FLAGS ------------ != SU----------
[ 1.6717] MSUB.S BF800000 - 80800000 * 00800000 FLAGS ------------ != SU----------
[ 1.6725] VMSUB {00000000 00000000 00000000 00000000} - {00000000 00000000 80000000 80000000} * {80000000 00000000 80000000 00000000} STATUS FLAGS ZS----------Z----- != ZSSS--------Z-----
[ 1.6729] VMSUB {807FFFFF 80000000 80000000 007FFFFF} - {80000000 807FFFFF 80000000 807FFFFF} * {00000000 00000000 007FFFFF 007FFFFF} STATUS FLAGS ZS----------Z----- != ZSSS--------Z-----
[ 1.6732] VMSUB {00800000 80000000 80000000 007FFFFF} - {80000000 80800000 80000000 80800000} * {00000000 00000000 00800000 00800000} STATUS FLAGS ZS----------Z----- != ZSSSUS------Z-----
[ 1.6736] VMSUB {00800000 3F800000 BF800000 BF800000} - {80000000 80800000 80000000 80800000} * {00000000 00000000 00800000 00800000} STATUS FLAGS --SS---------S---- != ZSSSUS-------S----
[ 1.6738] VMSUB 00900001 - A0400000 * 9FC00000 MAC FLAGS Z--- != Z-U-
[ 1.6740] VMSUB 808FFFFF - A0400000 * 1FC00000 MAC FLAGS Z--- != Z-U-
[ 1.6742] VMSUB 80900000 - A0400000 * 1FC00002 MAC FLAGS Z--- != Z-U-
[ 1.6745] VMSUB {00900000 00900001 808FFFFF 80900000} - {A0400000 A0400000 A0400000 A0400000} * {9FC00000 9FC00000 1FC00000 1FC00002} STATUS FLAGS ZS----------Z----- != ZSSSUS------Z-U---
[ 1.6747] VMSUB 80900001 - A0400000 * 1FC00000 MAC FLAGS ZS-- != ZSU-
[ 1.6749] VMSUB 008FFFFF - A0400000 * 9FC00000 MAC FLAGS ZS-- != ZSU-
[ 1.6751] VMSUB 00900000 - A0400000 * 9FC00002 MAC FLAGS ZS-- != ZSU-
[ 1.6754] VMSUB {80900000 80900001 008FFFFF 00900000} - {A0400000 A0400000 A0400000 A0400000} * {1FC00000 1FC00000 9FC00000 9FC00002} STATUS FLAGS ZSSS--------ZS---- != ZSSSUS------ZSU---
[ 1.6757] VMSUB {FFFFFFFF 00000000 FFFFFFFF FFFFFFFF} - {FFFFFFFF 7F800000 7FFFFFFF 7FFFFFFF} * {7FFFFFFF FF800000 BF800001 BF800000} STATUS FLAGS ZS----OS----Z--O-- != ZSSS--OS----Z--O--
[ 1.6759] VMSUB {3F800000 00000000 73800000 74000000} - {7FFFFFFF 7FFFFFFF 7FFFFFFF 7FFFFFFF} * {BF800000 BF800000 BF800000 BF800000} STATUS FLAGS ------OS-------O-- != --SS--OS-------O--
if (CHECK_FPU_SOFT_ADDSUB || CHECK_FPU_SOFT_MULDIV || CHECK_FPU_SOFT_SQRT) { _FdValUl_ = PS2Float::Itof(0, _FsValSl_).raw; }
else
{
	_FdValf_ = (float)_FsValSl_;
	_FdValf_ = fpuDouble(_FdValUl_);
}
Suggested change:
- if (CHECK_FPU_SOFT_ADDSUB || CHECK_FPU_SOFT_MULDIV || CHECK_FPU_SOFT_SQRT) { _FdValUl_ = PS2Float::Itof(0, _FsValSl_).raw; }
- else
- {
- 	_FdValf_ = (float)_FsValSl_;
- 	_FdValf_ = fpuDouble(_FdValUl_);
- }
+ _FdValf_ = (float)_FsValSl_;
}

void CVT_W() {
	if ( ( _FsValUl_ & 0x7F800000 ) <= 0x4E800000 ) { _FdValSl_ = (s32)_FsValf_; }
	if (CHECK_FPU_SOFT_ADDSUB || CHECK_FPU_SOFT_MULDIV || CHECK_FPU_SOFT_SQRT) { _FdValSl_ = PS2Float::Ftoi(0, _FsValUl_); }
It's already accurate
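For context on this comment, the conversion in the hunk guards the host cast with an exponent test. The sketch below shows that idea in isolation. It assumes that the branches not visible in the hunk saturate to the largest and smallest 32-bit integers; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical standalone version of the float-to-int conversion in the hunk.
inline std::int32_t convertToWord(std::uint32_t raw)
{
	// An exponent field <= 0x4E800000 means |value| < 2^31, so the host cast
	// is well defined and simply truncates toward zero.
	if ((raw & 0x7F800000u) <= 0x4E800000u)
	{
		float value;
		std::memcpy(&value, &raw, sizeof(value));
		return static_cast<std::int32_t>(value);
	}
	// Assumed saturation for larger magnitudes: the sign bit selects the bound.
	return (raw & 0x80000000u) ? INT32_MIN : INT32_MAX;
}
```

Because the guard works on the raw bits, patterns that would be Inf or NaN on the host never reach the cast.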
{
	PS2Float divres = fpuAccurateDiv(_FsValUl_, _FtValUl_);
	_FdValUl_ = divres.raw;
	if (checkDivideByZeroInvalidSoft(divres, FPUflagD | FPUflagSD, FPUflagI | FPUflagSI)) return;
Why reimplement the div0 check when the existing one is already accurate?
PS2Float::PS2Float(s32 value)
	: raw((u32)value)
{}

PS2Float::PS2Float(u32 value)
	: raw(value)
{}

PS2Float::PS2Float(float value)
	: raw(std::bit_cast<u32>(value))
{}

PS2Float::PS2Float(bool sign, u8 exponent, u32 mantissa)
	: raw((sign ? 1u : 0u) << 31 |
		(u32)(exponent << MANTISSA_BITS) |
		(mantissa & 0x7FFFFF))
{}

PS2Float PS2Float::Max()
{
	return PS2Float(MAX_FLOATING_POINT_VALUE);
}

PS2Float PS2Float::Min()
{
	return PS2Float(MIN_FLOATING_POINT_VALUE);
}

PS2Float PS2Float::One()
{
	return PS2Float(ONE);
}

PS2Float PS2Float::MinOne()
{
	return PS2Float(MIN_ONE);
}
Simple methods like these can go in the header so they're inlinable even if you don't have LTO enabled. Maybe even add a __fi to them.
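A header-only version of these factory methods might look like the sketch below. It is written under assumptions: `SKETCH_FI` is a stand-in for a force-inline macro such as `__fi`, the struct name is hypothetical, and the constant values are inferred from their names rather than copied from the PR.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a force-inline macro; plain "inline" keeps the sketch portable.
#define SKETCH_FI inline

struct PS2FloatSketch
{
	// Assumed values for the constants named in the PR.
	static constexpr std::uint32_t MAX_FLOATING_POINT_VALUE = 0x7FFFFFFFu;
	static constexpr std::uint32_t MIN_FLOATING_POINT_VALUE = 0xFFFFFFFFu;
	static constexpr std::uint32_t ONE = 0x3F800000u;
	static constexpr std::uint32_t MIN_ONE = 0xBF800000u;

	std::uint32_t raw;

	constexpr explicit PS2FloatSketch(std::uint32_t value) : raw(value) {}

	// Defined in the class body, so every translation unit sees the
	// definition and can inline it without link-time optimisation.
	static SKETCH_FI constexpr PS2FloatSketch Max() { return PS2FloatSketch(MAX_FLOATING_POINT_VALUE); }
	static SKETCH_FI constexpr PS2FloatSketch Min() { return PS2FloatSketch(MIN_FLOATING_POINT_VALUE); }
	static SKETCH_FI constexpr PS2FloatSketch One() { return PS2FloatSketch(ONE); }
	static SKETCH_FI constexpr PS2FloatSketch MinOne() { return PS2FloatSketch(MIN_ONE); }
};
```

Marking the functions `constexpr` is an extra choice made for this sketch; it is not something the review asked for.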
PS2Float PS2Float::Sub(PS2Float subtrahend)
{
	if (IsDenormalized() || subtrahend.IsDenormalized())
	{
		bool sign = DetermineSubtractionOperationSign(*this, subtrahend);

		if (IsDenormalized() && !subtrahend.IsDenormalized())
			return PS2Float(sign, subtrahend.Exponent(), subtrahend.Mantissa());
		else if (!IsDenormalized() && subtrahend.IsDenormalized())
			return PS2Float(sign, Exponent(), Mantissa());
		else if (IsDenormalized() && subtrahend.IsDenormalized())
			return PS2Float(sign, 0, 0);
		else
			Console.Error("Both numbers are not denormalized");

		return PS2Float(0);
	}

	u32 a = raw;
	u32 b = subtrahend.raw;

	//exponent difference
	s32 exp_diff = Exponent() - subtrahend.Exponent();

	//diff = 1 .. 24, expt < expd
	if (exp_diff > 0 && exp_diff < 25)
	{
		exp_diff = exp_diff - 1;
		b = (MIN_FLOATING_POINT_VALUE << exp_diff) & b;
	}

	//diff = -24 .. -1 , expd < expt
	else if (exp_diff < 0 && exp_diff > -25)
	{
		exp_diff = -exp_diff;
		exp_diff = exp_diff - 1;
		a = a & (MIN_FLOATING_POINT_VALUE << exp_diff);
	}

	return PS2Float(a).DoAdd(PS2Float(b).Negate());
}
Move this to the header and implement it as Add(b ^ 0x80000000)
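The shape of that suggestion, subtraction as addition of the sign-flipped operand on raw `u32` values, can be sketched as below. This is only an illustration: `AddPlaceholder` uses the host's IEEE 754 addition so the example runs, and it is not the PR's PS2-accurate `Add`. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

using u32 = std::uint32_t;

// Placeholder adder: host IEEE 754 addition on the reinterpreted bits.
// A real implementation would substitute the PS2-accurate soft-float add.
inline u32 AddPlaceholder(u32 a, u32 b)
{
	float fa, fb;
	std::memcpy(&fa, &a, sizeof(fa));
	std::memcpy(&fb, &b, sizeof(fb));
	const float sum = fa + fb;
	u32 out;
	std::memcpy(&out, &sum, sizeof(out));
	return out;
}

// Subtraction expressed as addition with the second operand's sign bit toggled.
inline u32 Sub(u32 a, u32 b)
{
	return AddPlaceholder(a, b ^ 0x80000000u);
}
```

For example, `Sub(0x40400000, 0x3F800000)` (3.0 minus 1.0) yields `0x40000000` (2.0) with the placeholder adder. With this shape, all operand handling lives in one place, the adder.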
How do you download this and put it into PCSX2?
This isn’t complete, nor is it tested. If you want to test it, go to the “Checks” part of this PR, choose your OS, and download the build from the artifacts.
Ok. I'm just not very familiar with GitHub but thanks anyway.
This pull request implements the first ever take on real soft-float support in PCSX2.
This work is a combination of several efforts and prior research.
Credits:
https://www.gregorygaines.com/blog/emulating-ps2-floating-point-nums-ieee-754-diffs-part-1/
https://github.com/assumenothing/Tommunism.SoftFloat/tree/main
https://github.com/GitHubProUser67/MultiServer3/blob/main/BackendServices/CastleLibrary/EmotionEngine.Emulator/Ps2Float.cs
https://github.com/Goatman13/pcsx2/tree/accurate_int_add_sub
PCSX2 Team for their help and support in this massive journey.
This pull request should be tested with every game that requires a clamping mode, rounding mode, or float patch (cf. GameDatabase).
Currently, on the interpreters, this PR fixes:
[BUG]: DJbox (Japan) - SCPS-15082 - 56275BE9 #5169
Ratchet & Clank 2 - Megaturret Doesn't Work Correctly #354
[BUG]: Tourist Trophy: License Tests are broken and not completeable, player is placed out of bounds #11507
[BUG]: Monster Hunter games: bounce calculation inaccuracy #10519
[BUG]: Opening demo desyncs. The Taxi 2 (SLPS-20478) #8068
[BUG]: Pride FC - Fighting Championship | Missing text and textures in Hardware and Software render modes. #7642
[BUG]: Jak & Daxter/Jak 3 -- Character slides on his own | Jak 2 -- Inclined Camera Drift #5257
[BUG]: Final Fantasy X Tidus Falls Through Lift #8228
[REGRESSION]: Final Fantasy X Bosses Turn Invisible During Death Animation #8595
[Feature Request]: Final Fantasy X GameDB Config #10885
[BUG]: Driv3r - Bizarre AI (?) problem: impossible to pass "Lead On Baccus" mission (the 2nd mission) #11714
[BUG]: Camera is static in Prince of Persia: Warrior Within menu #5070
[BUG] World Series of Poker 2008 - Battle for the Bracelets: Broken character geometry #4546
Meta: Games that don't like the Full fpu mode. #2245
[BUG]: FPU rounding issues in Shadow of the Colossus (SCUS-97472) #11528
[BUG]: Disney's UP TLB Miss #4760
[BUG]: Need for Speed Undercover not loading save games properly unless they’re created within PCSX2 #9831
Gran Turismo 4 license tests
Mortal Kombat: Shaolin Monks
Tokyo Xtreme Racer Zero (Accurate Mul/Div for the EE FPU)
Any other games using the NegDiv hack or any other FPU rounding mode.
This lays the groundwork for soft-float in PCSX2, a long-awaited contribution.