merge amd-staging into amd-feature/wave-transform #606

cdevadas · 2025-11-17T11:03:08Z

No description provided.

…rand (llvm#161501) InlineAsmLowering rejected inline assembly with memory reference inputs if the values passed to the inline asm weren't pointers. The DAG lowering however handled them just fine. This patch updates InlineAsmLowering to store such values on the stack, and then use the stack pointer as the "indirect" version of the operand.

This commit ensures that gather and scatter operations with int64 index tensors can be created. This aligns with the EXT_INT64 extension.

…S1POE2) (llvm#164912) Add assembly/disassembly support for AArch64 `FEAT_S1POE2` (Stage 1 Permission Overlay Extension 2), as blogged about here:

* https://developer.arm.com/community/arm-community-blogs/b/architectures-and-processors-blog/posts/future-architecture-technologies-poe2-and-vmte

and as documented here:

* https://developer.arm.com/documentation/109697/2025_09/Future-Architecture-Technologies

Co-authored-by: Rodolfo Wottrich <rodolfo.wottrich@arm.com>

) This patch ensures we switch out of streaming mode before TLS-descriptor calls. ZA state will also be preserved when using the new SME ABI lowering (`-aarch64-new-sme-abi`). Fixes llvm#152165

…lvm#163528) The IR verifier will crash if any instruction is located before a phi node. In some corner cases, the `infer-address-spaces` pass would insert an `addrspacecast` before a phi node. Since the pass has already determined that the operand pointer (the phi node's incoming value) belongs to `NewAS`, it is safe to insert the `addrspacecast` immediately after the point where that pointer is defined. Co-authored-by: Kerang Mao <krmao@birentech.com>

This patch reenables tests that had been xfailed in a previous merge (#575) by Ron.

1. test/Lower/OpenMP/DelayedPrivatization/target-private-allocatable.f90
   Fixed test to accommodate map_clauses related to descriptors that we do not have upstream.
2. test/Lower/OpenMP/optional-argument-map-2.f90
   Same problem as above in addition to a bad merge that enabled some testing that had been deliberately disabled by Andrew on amd-staging in the past. See commit 706196c

…7748) Precommit test fixups for llvm#167113

_mm_sqrt_sh / _mm512_sqrt_ph - these were missed from llvm#167692

) llvm#167489 did not work properly when MLIR inlining is enabled (experimental in flang, enabled with `-mllvm -inline-all`). The reason is that inlining causes several `fir.dummy_scope` operations to coexist inside the same `func.func` (the `fir.dummy_scope` of each inlined `func.func` is kept in order to preserve the relationship between arguments of the inlined call for better aliasing deductions). After llvm#167489, the debug info pass creates argument debug info for every `fir.declare` with a `fir.dummy_scope`. This causes arguments from inlined calls to appear as arguments of the procedure into which the call was inlined. To avoid this, a `fir.declare` is only considered an argument of the current function if its `fir.dummy_scope` is the first one created in the function (`fir.dummy_scope` operations cannot be reordered because they have write effects on the debug memory resource, and the `fir.dummy_scope` of the current function is always emitted before any calls are lowered, and therefore before any `fir.dummy_scope` is inlined).

Just delegate to the subexpr instead for now.

This option has long been replaced by `-fc1 -fdebug-dump-pft`. Removed the old option and updated one test that still used it.

This reverts commit 1a86f0a.

While at it, record VPIRFlags in VPWidenInductionRecipe.

The Transactional Memory Extension (TME) was introduced as part of Armv9-A but has not been adopted by the ecosystem. This mirrors what Arm has observed with similar extensions in other architectures. Therefore, remove FEAT_TME assembly and ACLE code from llvm, because support for TME has now been officially withdrawn, as noted here:

```
FEAT_TME is withdrawn from all future versions of Arm® Architecture Reference Manual for A-profile architecture.
```

referenced in Known Issue D24093, documented here: https://developer.arm.com/documentation/102105/lb-05/

Addresses the issues found on the review of https://github.com/llvm/llvm-project/pull/150267/files#r2356936355 Currently when collecting the users of an IFunc symbol to determine the callers, we incorrectly mix versions of different functions together, alongside non-FMV callers all in the same bag. That is problematic because we incorrectly deduce which features are unavailable as we iterate the callers. I have updated the unit tests to require a resolver function for the callers and regenerated the resolvers since some FMV features have been removed making the detection bitmasks different. I've replaced the deleted FMV feature ls64 with cssc. I've added a new test to cover unrelated callers.

Without this gcc warns like

../../clang/lib/AST/ExprConstant.cpp:4091:63: warning: suggest parentheses around '&&' within '||' [-Wparentheses]
 4091 | (SrcVal.isVector() && SrcVal.getVectorLength() == 1) &&
      | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
 4092 | "Not a valid HLSLAggregateSplatCast.");
      | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
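The warning is typical of the `assert(cond && "message")` idiom when `cond` is itself a disjunction. The following is a minimal sketch of the parenthesized form, assuming a hypothetical `Val` struct and `isValidSplatSource` helper in place of the real clang types, which are not shown here:

```cpp
#include <cassert>

// Hypothetical stand-in for the value being checked; not the clang APValue API.
struct Val {
  bool Scalar;
  bool Vector;
  unsigned Length;
};

bool isValidSplatSource(const Val &V) {
  // Without the outer parentheses, `A || B && "msg"` parses as
  // `A || (B && "msg")`, which is the shape -Wparentheses complains about.
  assert((V.Scalar || (V.Vector && V.Length == 1)) &&
         "Not a valid splat source.");
  return V.Scalar || (V.Vector && V.Length == 1);
}
```

Wrapping the whole disjunction keeps the message string attached to the full condition rather than to its last operand.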

…#128164) If possible, put the definition next to the definition of an adjacent declaration. For example:

struct S {
  void f^oo1() {}
  void foo2();
  void f^oo3() {}
};

// S::foo1() goes here
void S::foo2() {}
// S::foo3() goes here

This directive tells the compiler to ignore vector dependencies in the following loop, and it must be placed before a `do` loop. Sometimes the compiler does not have sufficient information to decide whether a particular loop is vectorizable, because of potential dependencies between iterations; the directive tells the compiler that vectorization is safe, which is expressed with `parallelAccesses` metadata. This directive is also equivalent to `#pragma clang loop vectorize(assume_safety)` in C++.
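For comparison, the C++ counterpart in clang is spelled `#pragma clang loop vectorize(assume_safety)`. The sketch below shows where it goes; the `scaledAdd` function and its parameters are hypothetical, and the pragma is guarded so that other compilers simply skip it:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: the caller promises that loop iterations are independent.
void scaledAdd(float *Out, const float *A, const float *B, std::size_t N) {
  assert(Out && A && B && "null buffer");
#if defined(__clang__)
// Asks clang to assume there are no loop-carried memory dependencies.
#pragma clang loop vectorize(assume_safety)
#endif
  for (std::size_t I = 0; I < N; ++I)
    Out[I] = A[I] + 2.0f * B[I];
}
```

The pragma only shifts responsibility: if the buffers do overlap in a way that creates a dependency, the result is the caller's problem.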

…167941) In PrivatizeAssociatedLoopIndexAndCheckLoopLevel we now descend all the way down the chain of nested OpenMPLoopDirectives ahead of time, so innerMostNest can only be a nullptr, or a DoConstruct.

This reverts commit 469702c. llvm#168048

Ref: https://learn.microsoft.com/en-us/cpp/build/reference/arch-x64?view=msvc-170

When calling Block::movePointersTo(), the two blocks might have different metadata sizes, which causes the final pointer to be incorrect and point to garbage. Adjust the pointer base and offset accordingly. Fixes llvm#168018

…m#166603) In true16 mode, D16 instructions are first lowered to a pseudo t16 instruction, and then lowered to hi/lo instructions during MC lowering using the D16T16 table. However, the D16T16 table maps both `flat_load_d16_t16` and `flat_load_d16_t16_saddr` to `flat_load_d16_(hi)_b16`, which is wrong: the saddr pseudo instruction `flat_load_d16_t16_saddr` should be selected to the saddr hi/lo instruction. The global and scratch variants are correct; flat seems to be the only one with this issue.

overriding for hyderabad ocl fails

) And return true. Also make those two functions const.

…ictable-rand` (llvm#167689) Closes llvm#157292

The compiler should not consider split vectorize nodes when checking for non-schedulable PHI-based parent nodes. Only pure PHI nodes must be considered: only they can be treated as explicit users, and split nodes cannot. Fixes llvm#168268

ThisPartition is already of type int. Identified with readability-redundant-casting.

EnableFSDiscriminator is declared in DebugInfoMetadata.h. Identified with readability-redundant-declaration.

Update VPInstruction constructor to delegate to constructor with more comprehensive checking and validation. This required updating some unit tests, to make sure the constructed VPInstructions are valid.

This patch moves initWithExactBucketCount and ExactBucketCount to DenseMapBase to share more code. Since SmallDenseMap::allocateBuckets always returns true, initWithExactBucketCount is equivalent to:

void initWithExactBucketCount(unsigned NewNumBuckets) {
  allocateBuckets(NewNumBuckets);
  initEmpty();
}

for SmallDenseMap. Note that ExactBucketCount is not used within DenseMapBase yet. This moves us closer to the storage policy idea outlined in llvm#168255.
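The equivalence claimed above can be illustrated with a toy class. This is only a sketch: `ToyBuckets`, `initGuarded`, and the `EmptyKey` sentinel are hypothetical and do not reflect the real `DenseMapBase` internals; the guarded form is an assumed shape for a generic caller that checks the allocation result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for bucket storage; NOT the real llvm::DenseMap internals.
class ToyBuckets {
  static constexpr int EmptyKey = -1; // hypothetical sentinel
  std::vector<int> Buckets;

  // Allocation that always succeeds, as the text says of SmallDenseMap.
  bool allocateBuckets(unsigned Num) {
    Buckets.assign(Num, 0);
    return true;
  }

  void initEmpty() {
    for (int &B : Buckets)
      B = EmptyKey;
  }

public:
  // Guarded form (assumed): only initialize if allocation reports success.
  void initGuarded(unsigned NewNumBuckets) {
    if (allocateBuckets(NewNumBuckets))
      initEmpty();
  }

  // Unguarded form quoted in the text.
  void initWithExactBucketCount(unsigned NewNumBuckets) {
    bool Allocated = allocateBuckets(NewNumBuckets);
    assert(Allocated && "toy allocation never fails");
    (void)Allocated; // silence unused-variable warnings when NDEBUG is set
    initEmpty();
  }

  bool operator==(const ToyBuckets &O) const { return Buckets == O.Buckets; }
  std::size_t numBuckets() const { return Buckets.size(); }
};
```

Because the toy `allocateBuckets` never returns false, both entry points leave the object in the same state, which is the property the commit message relies on.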

…#168292) Reverts llvm#167909

This allows SDNodes to be validated against their expected type profiles and reduces the number of changes required to add a new node. Autogenerated node names start with "AMDGPUISD::", hence the changes in the tests. The few nodes defined in R600.td are *not* imported because TableGen processes AMDGPU.td, which doesn't include R600.td. Ideally, we would have two sets of nodes, but that would require careful reorganization of td files since some nodes are shared between AMDGPU/R600. Not sure if it is something worth looking into. Some nodes fail validation; those are listed in `AMDGPUSelectionDAGInfo::verifyTargetNode()`. Part of llvm#119709. Pull Request: llvm#168248

…specific relocations to strings. (llvm#168293) This will be used in places like LLD to render them for error messages.

Symbol is already of type MCSymbolELF *. Identified with readability-redundant-casting.

…onversions.cpp (llvm#167261) Calling convention is irrelevant to address space verification and adds complexity for other target triples.

z1-cciauto · 2025-11-17T11:03:35Z

PSDB Build Link: http://mlse-bdc-20dd129:8065/#/builders/10/builds/27

Pierre-vh and others added 30 commits November 14, 2025 10:34

[mlir][tosa] Allow int64 index tensors in gather/scatter (llvm#167894)

8723fe5

This commit ensures that gather and scatter operations with int64 index tensors can be created. This aligns with the EXT_INT64 extension.

[AArch64][SME] Handle SME state around TLS-descriptor calls (llvm#155608

c021e16

) This patch ensures we switch out of streaming mode before TLS-descriptor calls. ZA state will also be preserved when using the new SME ABI lowering (`-aarch64-new-sme-abi`). Fixes llvm#152165

[AMDGPU] Make use of getFunction and getMF. NFC. (llvm#167872)

72c69ae

[LoopInterchange] Fix tests with loops that have BTC=0. NFC. (llvm#16…

00000dc

…7748) Precommit test fixups for llvm#167113

merge main into amd-staging

a5d2b37

[clang][x86] AVX512FP16 constrained sqrt test coverage (llvm#168046)

5829bb9

_mm_sqrt_sh / _mm512_sqrt_ph - these were missed from llvm#167692

[clang][bytecode] Don't outright reject dynamic casts (llvm#167517)

746e5d0

Just delegate to the subexpr instead for now.

[flang] Removed old option -fdebug-dump-pre-fir (llvm#168008)

056f744

This option has long been replaced by `-fc1 -fdebug-dump-pft`. Removed the old option and updated one test that still used it.

[Offload] Add device info for shared memory (llvm#167817)

bc8423d

Revert "[Offload] Add device info for shared memory (llvm#167817)"

89671db

This reverts commit 1a86f0a.

[VPlan] Expand WidenInt inductions with nuw/nsw (llvm#163538)

355e0f9

While at it, record VPIRFlags in VPWidenInductionRecipe.

merge main into amd-staging

d0d0205

[flang][OpenMP] Remove dead code left over after PR143715, NFC (llvm#…

928393b

…167941) In PrivatizeAssociatedLoopIndexAndCheckLoopLevel we now descend all the way down the chain of nested OpenMPLoopDirectives ahead of time, so innerMostNest can only be a nullptr, or a DoConstruct.

Revert "[LICM] Sink unused l-invariant loads in preheader. llvm#157559"

7ee0e0f

This reverts commit 469702c. llvm#168048

[X86][clang-cl] Add AVX10.1/2 to CL option /arch (llvm#167111)

85ba53b

Ref: https://learn.microsoft.com/en-us/cpp/build/reference/arch-x64?view=msvc-170

merge main into amd-staging (#590)

5caa3f6

overriding for hyderabad ocl fails

merge main into amd-staging (#591)

6b9911f

overriding for hyderabad ocl fails

[clang][bytecode][NFC] Check pointer types in canClassify() (llvm#168069

94c751d

) And return true. Also make those two functions const.

zeyi2 and others added 20 commits November 16, 2025 22:55

[clang-tidy] Rename cert-msc30-c and cert-msc50-cpp to `misc-pred…

8b55d77

…ictable-rand` (llvm#167689) Closes llvm#157292

[gn build] Port 8b55d77

7761a89

merge main into amd-staging

f612d34

[Scalar] Remove a redundant cast (NFC) (llvm#168284)

977d6cf

ThisPartition is already of type int. Identified with readability-redundant-casting.

[CodeGen] Remove a redundant declaration (NFC) (llvm#168285)

98d49d5

EnableFSDiscriminator is declared in DebugInfoMetadata.h. Identified with readability-redundant-declaration.

[VPlan] Delegate to other VPInstruction constructors. (NFCI)

67c8e38

Update VPInstruction constructor to delegate to constructor with more comprehensive checking and validation. This required updating some unit tests, to make sure the constructed VPInstructions are valid.

merge main into amd-staging (#603)

d56ec74

Revert "DAG: Allow select ptr combine for non-0 address spaces" (llvm…

6d5f87f

…#168292) Reverts llvm#167909

merge main into amd-staging

e54b9c0

Cleanup revert_patches.txt : select

0c89c0a

[Object] Add getRISCVVendorRelocationTypeName to render RISCV vendor-…

cce852f

…specific relocations to strings. (llvm#168293) This will be used in places like LLD to render them for error messages.

[MC] Remove a redundant cast (NFC) (llvm#168298)

11278cf

Symbol is already of type MCSymbolELF *. Identified with readability-redundant-casting.

[NFC][Clang][Test] Drop calling convention check from address-space-c…

99630eb

…onversions.cpp (llvm#167261) Calling convention is irrelevant to address space verification and adds complexity for other target triples.

merge main into amd-staging (#604)

967d5ce

merge main into amd-staging

b849e7c

merge main into amd-staging (#605)

524efbe

merge amd-staging into amd-feature/wave-transform

22739f7

cdevadas requested review from jmmartinez and vg0204 November 17, 2025 11:03

cdevadas requested review from b-sumner, david-salinas and lamb-j as code owners November 17, 2025 11:03

jmmartinez approved these changes Nov 17, 2025

vg0204 approved these changes Nov 18, 2025

cdevadas merged commit 159473b into amd-feature/wave-transform Nov 19, 2025
32 checks passed

cdevadas deleted the amd/dev/cdevadas/wave-transform/merge-from-stg-nov-17 branch November 19, 2025 14:37

114 participants