
[GPU]: equation of state (EOS) data transfer error in Nvidia 25.5 #31

@marshallward


The equation of state type contains an allocatable polymorphic component:

class(EOS_base), allocatable :: type

This component is allocated to the selected EOS type at runtime:

select case (EOS%form_of_EOS)
  case (EOS_LINEAR)
    allocate(linear_EOS :: EOS%type)
  case (EOS_UNESCO)
    allocate(UNESCO_EOS :: EOS%type)
  case (EOS_WRIGHT)
    allocate(buggy_Wright_EOS :: EOS%type)
  case (EOS_WRIGHT_FULL)
    allocate(Wright_full_EOS :: EOS%type)
  case (EOS_WRIGHT_REDUCED)
    allocate(Wright_red_EOS :: EOS%type)
  case (EOS_JACKETT06)
    allocate(Jackett06_EOS :: EOS%type)
  case (EOS_TEOS10)
    allocate(TEOS10_EOS :: EOS%type)
  case (EOS_ROQUET_RHO)
    allocate(Roquet_rho_EOS :: EOS%type)
  case (EOS_ROQUET_SPV)
    allocate(Roquet_SpV_EOS :: EOS%type)
end select
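For readers unfamiliar with this pattern, here is a minimal, self-contained sketch of an abstract base type with a deferred elemental binding, held in an allocatable polymorphic component and allocated to a concrete type at runtime. It is not the actual EOS code: the module name eos_sketch, the routine EOS_init, the value of EOS_LINEAR, and the linear coefficients are all hypothetical, and only one concrete type (a simplified linear EOS) is included.

```fortran
! Minimal sketch (hypothetical names and values): an abstract EOS class held
! in an allocatable polymorphic component and allocated to a concrete type
! at runtime, mirroring the "select case" pattern above.
module eos_sketch
  implicit none
  private
  public :: EOS_base, linear_EOS, EOS_type, EOS_init, EOS_LINEAR

  integer, parameter :: EOS_LINEAR = 1   ! hypothetical form code

  ! Abstract parent; each concrete EOS must provide density_elem.
  type, abstract :: EOS_base
  contains
    procedure(density_elem_iface), deferred :: density_elem
  end type EOS_base

  abstract interface
    elemental real function density_elem_iface(this, T, S) result(rho)
      import :: EOS_base
      class(EOS_base), intent(in) :: this
      real, intent(in) :: T, S
    end function density_elem_iface
  end interface

  ! Simplified linear EOS; its coefficients live in the object itself,
  ! so density_elem genuinely needs "this".
  type, extends(EOS_base) :: linear_EOS
    real :: rho_ref = 1000.0   ! hypothetical reference density
    real :: dRho_dT = -0.2     ! hypothetical thermal coefficient
    real :: dRho_dS = 0.8      ! hypothetical haline coefficient
  contains
    procedure :: density_elem => density_elem_linear
  end type linear_EOS

  ! Container holding the runtime-selected EOS, like EOS%type above.
  type :: EOS_type
    integer :: form_of_EOS = EOS_LINEAR
    class(EOS_base), allocatable :: type
  end type EOS_type

contains

  elemental real function density_elem_linear(this, T, S) result(rho)
    class(linear_EOS), intent(in) :: this
    real, intent(in) :: T, S
    rho = this%rho_ref + this%dRho_dT*T + this%dRho_dS*S
  end function density_elem_linear

  ! Allocate EOS%type to the concrete type selected by form.
  subroutine EOS_init(EOS, form)
    type(EOS_type), intent(inout) :: EOS
    integer, intent(in) :: form
    EOS%form_of_EOS = form
    if (allocated(EOS%type)) deallocate(EOS%type)
    select case (EOS%form_of_EOS)
    case (EOS_LINEAR)
      allocate(linear_EOS :: EOS%type)
    case default
      error stop "EOS_init: unknown form_of_EOS"
    end select
  end subroutine EOS_init

end module eos_sketch
```

A driver would call EOS_init(EOS, EOS_LINEAR) and then EOS%type%density_elem(T, S), which dispatches on the dynamic type of EOS%type. Note that the declared type of EOS%type is the abstract EOS_base, so the size of its data is only known after the allocate; this is the property suspected below of confusing the compiler's transfer bookkeeping.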

In older Nvidia compilers (~24.x), EOS%type is never transferred to the device and does not contribute to the total size of EOS. As far as the GPU is concerned, it does not exist.

The EOS object is passed (as this) to the elemental functions called inside some do-concurrent loops, e.g.

  if (present(rho_ref)) then
    do concurrent (i=is:ie, j=js:je)
      rho(i,j) = density_anomaly_elem_buggy_Wright(this, T(i,j), S(i,j), &
                                                   pressure(i,j), rho_ref)
    enddo
  else
    do concurrent (i=is:ie, j=js:je)
      rho(i,j) = density_elem_buggy_Wright(this, T(i,j), S(i,j), pressure(i,j))
    enddo
  endif
end subroutine calculate_density_array_2d_buggy_Wright

but the EOS transfers (this in these functions) caused no problems, since EOS%type was never used. (A configuration using the linear EOS, which does use EOS%type, would probably have failed, but that is not our problem at the moment.)


In 25.5, the compiler makes some attempt to include EOS%type in its bookkeeping. However, the unusual nature of EOS%type (declared as an abstract class, then allocated as a concrete type) causes it to be transferred as a zero-size data object.

Somehow, this causes a data size mismatch: the size of EOS includes the data in EOS%type in some places but excludes it in others.


The issue seems to be that !$omp target enter data map(to: EOS) does not include EOS%type (understandable, since it is an allocatable component). However:

  1. !$omp target enter data map(to: EOS%type) triggers zero-byte operations.
  2. The reported size of EOS includes EOS%type.

If there were a meaningful way to transfer EOS%type, I believe it would resolve the size mismatch, but I haven't yet found one.


In the meantime, this could be worked around by removing the this argument from the elemental functions called inside the do-concurrent loops and binding those functions with nopass. I haven't explored this option yet, and it would require an API change that could be disruptive.
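A minimal sketch of the nopass option follows, assuming the EOS coefficients can be held as module parameters rather than as components of the object, so the elemental function needs no this argument. The module name nopass_sketch, the type toy_EOS, and all coefficient values are hypothetical, and whether this layout actually avoids the 25.5 transfer error has not been tested.

```fortran
! Sketch of the "nopass" workaround (hypothetical names and coefficients):
! the elemental function takes no passed-object argument, so the
! do-concurrent loop body never references the EOS object.
module nopass_sketch
  implicit none
  private
  public :: toy_EOS

  ! Hypothetical coefficients held as module parameters, not type components.
  real, parameter :: rho0 = 1000.0, dRho_dT = -0.2, dRho_dS = 0.8
  real, parameter :: dRho_dp = 4.0e-7

  type :: toy_EOS
  contains
    ! nopass: calling eos%density_elem(T, S, p) does not pass eos itself.
    procedure, nopass :: density_elem => density_elem_toy
    procedure :: calculate_density_array_2d => calculate_density_array_2d_toy
  end type toy_EOS

contains

  elemental real function density_elem_toy(T, S, pressure) result(rho)
    real, intent(in) :: T, S, pressure
    rho = rho0 + dRho_dT*T + dRho_dS*S + dRho_dp*pressure
  end function density_elem_toy

  subroutine calculate_density_array_2d_toy(this, T, S, pressure, rho)
    class(toy_EOS), intent(in) :: this   ! used only for dispatch
    real, intent(in) :: T(:,:), S(:,:), pressure(:,:)
    real, intent(out) :: rho(:,:)
    integer :: i, j
    ! The loop body references only arrays and module parameters, not this.
    do concurrent (j=1:size(T,2), i=1:size(T,1))
      rho(i,j) = density_elem_toy(T(i,j), S(i,j), pressure(i,j))
    end do
  end subroutine calculate_density_array_2d_toy

end module nopass_sketch
```

With nopass, existing calls of the form density_elem(this, T, S, p) would have to drop their first argument, which is the API change mentioned above.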
