@skc7 skc7 commented Nov 9, 2025

This PR introduces a new pass, "lower-workdistribute".
Fortran array statements are lowered to FIR as fir.do_loop unordered.
The "lower-workdistribute" pass mainly identifies a "fir.do_loop
unordered" nested in target{teams{workdistribute{fir.do_loop
unordered}}} and lowers it to
target{teams{parallel{distribute{wsloop{loop_nest}}}}}. It hoists all
other ops out of the target region. It replaces heap allocation on the
target with omp.target_allocmem and deallocation with
omp.target_freemem from the host. It also replaces the runtime function
"Assign" with omp.target_memcpy from the host.

This pass implements the following rewrites and optimisations:

- **FissionWorkdistribute**: finds the parallelizable ops within a
teams{workdistribute} region and moves each of them into its own
teams{workdistribute} region.
- **WorkdistributeRuntimeCallLower**: finds the FortranAAssign calls
nested in teams{workdistribute{}} and lowers each one to an unordered do
loop if the source is a scalar and the destination is an array. Other
runtime calls are not handled currently.
- **WorkdistributeDoLower**: finds a fir.do_loop unordered nested in
teams{workdistribute{fir.do_loop unordered}} and lowers it to
teams{parallel{distribute{wsloop{loop_nest}}}}.
- **TeamsWorkdistributeToSingle**: hoists all the ops inside
teams{workdistribute{}} to before the teams op.
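
To make the interplay of the rewrites above concrete, here is a toy Python model. It is only a sketch, not the pass itself: the real implementation operates on MLIR operations and has to respect data dependencies, whereas the `Op` class, the helper names, and the "only an unordered fir.do_loop is parallelizable" criterion below are simplifying assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Op:
    """Hypothetical stand-in for an MLIR operation with one region."""
    name: str
    body: List["Op"] = field(default_factory=list)
    unordered: bool = False  # only meaningful for fir.do_loop


def is_parallelizable(op: Op) -> bool:
    # Assumption for this sketch: only unordered do loops qualify.
    return op.name == "fir.do_loop" and op.unordered


def fission(ops: List[Op]) -> List[List[Op]]:
    """Split a workdistribute body so each parallelizable op is alone."""
    groups, current = [], []
    for op in ops:
        if is_parallelizable(op):
            if current:
                groups.append(current)
                current = []
            groups.append([op])
        else:
            current.append(op)
    if current:
        groups.append(current)
    return groups


def lower_group(group: List[Op]) -> List[Op]:
    """Lower one fissioned group."""
    if len(group) == 1 and is_parallelizable(group[0]):
        # Build teams{parallel{distribute{wsloop{loop_nest}}}} inside out.
        nest = Op("omp.loop_nest", group[0].body)
        for wrapper in ("omp.wsloop", "omp.distribute",
                        "omp.parallel", "omp.teams"):
            nest = Op(wrapper, [nest])
        return [nest]
    # Remaining ops are hoisted: emitted directly, with no teams wrapper.
    return list(group)


def lower_workdistribute(ops: List[Op]) -> List[Op]:
    out = []
    for group in fission(ops):
        out.extend(lower_group(group))
    return out


def show(op: Op) -> str:
    """Render an op in the brace notation used in this description."""
    if not op.body:
        return op.name
    inner = ", ".join(show(child) for child in op.body)
    return f"{op.name}{{{inner}}}"
```

For a hypothetical body `[Op("fir.alloca"), Op("fir.do_loop", [Op("fir.store")], unordered=True), Op("fir.call")]`, `lower_workdistribute` returns three top-level ops: the two sequential ops unchanged, and the loop wrapped as `omp.teams{omp.parallel{omp.distribute{omp.wsloop{omp.loop_nest{fir.store}}}}}`.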

The work in this PR is C-P and updated from @ivanradanov's commits from
the coexecute implementation:

[flang_workdistribute_iwomp_2024](https://github.com/ivanradanov/llvm-project/commits/flang_workdistribute_iwomp_2024)

Paper related to this work by @ivanradanov: ["Automatic Parallelization
and OpenMP Offloading of Fortran Array
Notation"](https://www.osti.gov/servlets/purl/2449728)
@skc7 skc7 requested a review from SyamaAmd November 9, 2025 12:37
@skc7 skc7 changed the title Implement workdistribute construct lowering (#140523) [SWDEV-531975] Implement workdistribute construct lowering (#140523) Nov 9, 2025
@skc7 skc7 requested a review from dpalermo November 9, 2025 12:38
@skc7 skc7 marked this pull request as ready for review November 11, 2025 09:28
skc7 commented Nov 12, 2025

PSDB has passed.
All the commits related to this feature are already in amd-mainline. This PR is waiting to be merged.

@dpalermo Could you please approve?

@skc7 skc7 merged commit 8e85e31 into amd-mainline Nov 14, 2025
11 checks passed
@skc7 skc7 deleted the amd/dev/skc7/amd-mainline/workdistribute_PR12345_new branch November 14, 2025 15:47