@cdevadas

No description provided.

Pierre-vh and others added 30 commits November 14, 2025 10:34
…rand (llvm#161501)

InlineAsmLowering rejected inline assembly with memory reference inputs
if the values passed to the inline asm weren't pointers. The DAG
lowering however handled them just fine.

This patch updates InlineAsmLowering to store such values on the stack,
and then use the stack pointer as the "indirect" version of the operand.
This commit ensures that gather and scatter operations with int64 index
tensors can be created. This aligns with the EXT_INT64 extension.
…S1POE2) (llvm#164912)

Add assembly/disassembly support for AArch64 `FEAT_S1POE2` (Stage 1
Permission Overlay Extension 2), as blogged about here:

* https://developer.arm.com/community/arm-community-blogs/b/architectures-and-processors-blog/posts/future-architecture-technologies-poe2-and-vmte

and as documented here:

* https://developer.arm.com/documentation/109697/2025_09/Future-Architecture-Technologies

Co-authored-by: Rodolfo Wottrich <rodolfo.wottrich@arm.com>
)

This patch ensures we switch out of streaming mode before TLS-descriptor
calls. ZA state will also be preserved when using the new SME ABI
lowering (`-aarch64-new-sme-abi`).

Fixes llvm#152165
…lvm#163528)

The IR verifier will crash if any instruction is located before a
phi node. In some corner cases, the `infer-address-spaces` pass would
insert an `addrspacecast` before a phi node. Since the operand pointer
(the phi node's incoming value) has already been determined to be in
`NewAS` by the pass, it is safe to insert the `addrspacecast` immediately
after the point where that value is defined.

Co-authored-by: Kerang Mao <krmao@birentech.com>
This patch reenables tests that had been xfailed in a previous merge
(#575) by Ron.

1. test/Lower/OpenMP/DelayedPrivatization/target-private-allocatable.f90
Fixed test to accommodate map_clauses related to descriptors that we do
not have upstream.
2. test/Lower/OpenMP/optional-argument-map-2.f90 Same problem as above
in addition to a bad merge that enabled some testing that had been
deliberately disabled by Andrew on amd-staging in the past. See commit
706196c
_mm_sqrt_sh / _mm512_sqrt_ph - these were missed from llvm#167692
)

llvm#167489 did not work properly when MLIR inlining is enabled
(experimental in flang, enabled with `-mllvm -inline-all`).

The reason is that inlining causes several `fir.dummy_scope` to
coexist inside the same `func.func` (the `fir.dummy_scope` of inlined
`func.func` are preserved in order to keep the relationship between
arguments of the inlined call for better aliasing deductions).

After llvm#167489, the debug info pass creates argument debug info for all
fir.declare with a fir.dummy_scope. This causes arguments from inlined
calls to appear as arguments of the procedure where the call was inlined.

To avoid this, only consider that fir.declare are arguments of the
current function if their fir.dummy_scope is the first one created in
the function (fir.dummy_scope cannot be reordered because they have write
effects on the debug memory resource, and the fir.dummy_scope of the
current function is always emitted before any calls are lowered, so
before any fir.dummy_scope are inlined).
Just delegate to the subexpr instead for now.
This option has long been replaced by `-fc1 -fdebug-dump-pft`. Removed
the old option and updated one test that still used it.
While at it, record VPIRFlags in VPWidenInductionRecipe.
The Transactional Memory Extension (TME) was introduced as part of
Armv9-A but has not been adopted by the ecosystem. This mirrors what
Arm has observed with similar extensions in other architectures.

Therefore, remove FEAT_TME assembly and ACLE code from llvm, because
support for TME has now been officially withdrawn, as noted here:

```
   FEAT_TME is withdrawn from all future versions of Arm®
   Architecture Reference Manual for A-profile architecture.
```

referenced in Known Issue D24093, documented here:
   https://developer.arm.com/documentation/102105/lb-05/
Addresses the issues found on the review of
https://github.com/llvm/llvm-project/pull/150267/files#r2356936355

Currently when collecting the users of an IFunc symbol to determine the
callers, we incorrectly mix versions of different functions together,
alongside non-FMV callers all in the same bag. That is problematic
because we incorrectly deduce which features are unavailable as we
iterate the callers.

I have updated the unit tests to require a resolver function for the
callers and regenerated the resolvers since some FMV features have been
removed making the detection bitmasks different. I've replaced the
deleted FMV feature ls64 with cssc. I've added a new test to cover
unrelated callers.
Without this gcc warns like
 ../../clang/lib/AST/ExprConstant.cpp:4091:63: warning: suggest parentheses around '&&' within '||' [-Wparentheses]
  4091 |          (SrcVal.isVector() && SrcVal.getVectorLength() == 1) &&
       |          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
  4092 |              "Not a valid HLSLAggregateSplatCast.");
       |              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
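The warning comes from operator precedence: `&&` binds tighter than `||`, so an assertion message appended with `&&` attaches only to the last operand unless the whole condition is parenthesized. A minimal sketch of the difference follows. The function names are hypothetical, and the shape of the original assertion (a scalar check joined with `||` on the line before the one quoted) is an assumption, not taken from the patch.

```cpp
#include <cassert>

// `A || B && C` is parsed as `A || (B && C)`; written here with explicit
// parentheses so the sketch itself compiles without the warning.
constexpr bool ungrouped(bool A, bool B, bool C) { return A || (B && C); }

// The grouping usually intended when C is an assertion message.
constexpr bool grouped(bool A, bool B, bool C) { return (A || B) && C; }

// The two groupings disagree when A is true and C is false.
static_assert(ungrouped(true, false, false), "A alone satisfies A || (B && C)");
static_assert(!grouped(true, false, false), "(A || B) && C still needs C");

// Hypothetical stand-in for the assertion in the warning: the condition is
// parenthesized as a whole before the message string is attached with &&.
inline bool checkSplatSource(bool IsScalar, bool IsVector, unsigned Length) {
  assert((IsScalar || (IsVector && Length == 1)) &&
         "Not a valid splat source.");
  return IsScalar || (IsVector && Length == 1);
}
```

Because a string literal always converts to `true`, both groupings give the same result in the assert-with-message idiom; the added parentheses only make the intent explicit and silence `-Wparentheses`.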
…#128164)

If possible, put the definition next to the definition of an adjacent
declaration. For example:

struct S {
  void f^oo1() {}
  void foo2();
  void f^oo3() {}
};

// S::foo1() goes here
void S::foo2() {}
// S::foo3() goes here
This directive tells the compiler to ignore vector dependencies in the
following loop and it must be placed before a `do loop`.

Sometimes the compiler may not have sufficient information to decide
whether a particular loop is vectorizable due to potential dependencies
between iterations. The directive tells the compiler that vectorization
is safe, which is conveyed with `parallelAccesses` metadata.

This directive is also equivalent to
`#pragma clang loop vectorize(assume_safety)` in C++.
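For comparison, the C++ side can be sketched as below, using the spelling from the Clang documentation, `#pragma clang loop vectorize(assume_safety)`. The function name and the macro are hypothetical. The indexed store is the kind of access the compiler cannot prove independent on its own, so the caller has to guarantee that `Idx` contains no duplicates.

```cpp
#include <cstddef>
#include <vector>

// Only Clang understands this loop pragma; other compilers get an empty macro.
#if defined(__clang__)
#define ASSUME_NO_VECTOR_DEPS _Pragma("clang loop vectorize(assume_safety)")
#else
#define ASSUME_NO_VECTOR_DEPS
#endif

// Adds In[i] to Out[Idx[i]]. The caller promises that Idx holds no repeated
// index, which is the precondition the pragma asserts to the vectorizer.
void scatterAdd(std::vector<float> &Out, const std::vector<std::size_t> &Idx,
                const std::vector<float> &In) {
  ASSUME_NO_VECTOR_DEPS
  for (std::size_t I = 0; I < Idx.size(); ++I)
    Out[Idx[I]] += In[I];
}
```

If the promise is broken (a repeated index), the vectorized loop may produce a different result from the scalar one, which is why the annotation is left to the programmer.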
…167941)

In PrivatizeAssociatedLoopIndexAndCheckLoopLevel we now descend all the
way down the chain of nested OpenMPLoopDirectives ahead of time, so
innerMostNest can only be a nullptr, or a DoConstruct.
When calling Block::movePointersTo(), the two blocks might have
different metadata sizes, which causes the final pointer to be incorrect
and point to garbage. Adjust the pointer base and offset accordingly.

Fixes llvm#168018
…m#166603)

In true16 mode, D16 insts are lowered to a pseudo t16 first, and then
lowered to hi/lo inst in MC lowering using D16T16 table.

However, the D16T16 table maps both `flat_load_d16_t16` and
`flat_load_d16_t16_saddr` to `flat_load_d16_(hi)_b16`, which is wrong.
The saddr pseudo inst `flat_load_d16_t16_saddr` should be selected to the
saddr hi/lo inst.

The global/scratch variants are correct; flat seems to be the only one
with this issue.
overriding for hyderabad ocl fails
)

And return true. Also make those two functions const.
zeyi2 and others added 20 commits November 16, 2025 22:55
The compiler should not consider split vectorize nodes when checking
for non-schedulable PHI-based parent nodes. Only pure PHI nodes must be
considered: only they can be treated as explicit users, whereas split
nodes cannot.

Fixes llvm#168268
ThisPartition is already of type int.

Identified with readability-redundant-casting.
EnableFSDiscriminator is declared in DebugInfoMetadata.h.

Identified with readability-redundant-declaration.
Update VPInstruction constructor to delegate to constructor with more
comprehensive checking and validation.

This required updating some unit tests, to make sure the constructed
VPInstructions are valid.
This patch moves initWithExactBucketCount and ExactBucketCount to
DenseMapBase to share more code.

Since SmallDenseMap::allocateBuckets always returns true,
initWithExactBucketCount is equivalent to:

  void initWithExactBucketCount(unsigned NewNumBuckets) {
    allocateBuckets(NewNumBuckets);
    initEmpty();
  }

for SmallDenseMap.

Note that ExactBucketCount is not used within DenseMapBase yet.

This moves us closer to the storage policy idea outlined in llvm#168255.
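The sharing described above can be sketched with a small CRTP example: the base class owns `initWithExactBucketCount` and calls into whatever `allocateBuckets` the derived class provides. All class names here are hypothetical, the bucket storage is simplified to a `std::vector<int>`, and the behaviour of the heap-backed variant is an assumption for illustration rather than a copy of LLVM's `DenseMap`.

```cpp
#include <vector>

// Hypothetical base: the shared initialization logic lives here once.
template <typename Derived> class MapBaseSketch {
public:
  void initWithExactBucketCount(unsigned NewNumBuckets) {
    // Only initialize the buckets when the derived class reports storage.
    if (self().allocateBuckets(NewNumBuckets))
      self().initEmpty();
  }

private:
  Derived &self() { return static_cast<Derived &>(*this); }
};

// Heap-backed variant (assumed): reports false when nothing was allocated.
class HeapMapSketch : public MapBaseSketch<HeapMapSketch> {
public:
  std::vector<int> Buckets;
  bool allocateBuckets(unsigned N) {
    Buckets.assign(N, -1); // -1 marks an uninitialized bucket
    return N != 0;
  }
  void initEmpty() {
    for (int &B : Buckets)
      B = 0; // 0 marks an empty bucket
  }
};

// Inline-storage variant: always has at least four buckets, so allocation
// always succeeds and the base-class branch reduces to allocate + initEmpty.
class SmallMapSketch : public MapBaseSketch<SmallMapSketch> {
public:
  std::vector<int> Buckets;
  bool allocateBuckets(unsigned N) {
    Buckets.assign(N < 4 ? 4 : N, -1);
    return true;
  }
  void initEmpty() {
    for (int &B : Buckets)
      B = 0;
  }
};
```

With `allocateBuckets` returning `true` unconditionally, the small-map case behaves exactly like the two-line version quoted in the commit message.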
This allows SDNodes to be validated against their expected type profiles
and reduces the number of changes required to add a new node.

Autogenerated node names start with "AMDGPUISD::", hence the changes in
the tests.

The few nodes defined in R600.td are *not* imported because TableGen
processes AMDGPU.td that doesn't include R600.td. Ideally, we would have
two sets of nodes, but that would require careful reorganization of td
files since some nodes are shared between AMDGPU/R600. Not sure if it is
something worth looking into.

Some nodes fail validation, those are listed in
`AMDGPUSelectionDAGInfo::verifyTargetNode()`.

Part of llvm#119709.

Pull Request: llvm#168248
…specific relocations to strings. (llvm#168293)

This will be used in places like LLD to render them for error messages.
Symbol is already of type MCSymbolELF *.

Identified with readability-redundant-casting.
…onversions.cpp (llvm#167261)

Calling convention is irrelevant to address space verification and adds
complexity for other target triples.
@cdevadas cdevadas merged commit 159473b into amd-feature/wave-transform Nov 19, 2025
32 checks passed
@cdevadas cdevadas deleted the amd/dev/cdevadas/wave-transform/merge-from-stg-nov-17 branch November 19, 2025 14:37