forked from llvm/llvm-project
merge amd-staging into amd-feature/wave-transform #606
Merged: cdevadas merged 953 commits into amd-feature/wave-transform from amd/dev/cdevadas/wave-transform/merge-from-stg-nov-17 on Nov 19, 2025
…rand (llvm#161501) InlineAsmLowering rejected inline assembly with memory reference inputs if the values passed to the inline asm weren't pointers. The DAG lowering however handled them just fine. This patch updates InlineAsmLowering to store such values on the stack, and then use the stack pointer as the "indirect" version of the operand.
This commit ensures that gather and scatter operations with int64 index tensors can be created. This aligns with the EXT_INT64 extension.
…S1POE2) (llvm#164912) Add assembly/disassembly support for AArch64 `FEAT_S1POE2` (Stage 1 Permission Overlay Extension 2), as blogged about here: * https://developer.arm.com/community/arm-community-blogs/b/architectures-and-processors-blog/posts/future-architecture-technologies-poe2-and-vmte and as documented here: * https://developer.arm.com/documentation/109697/2025_09/Future-Architecture-Technologies Co-authored-by: Rodolfo Wottrich <rodolfo.wottrich@arm.com>
) This patch ensures we switch out of streaming mode before TLS-descriptor calls. ZA state will also be preserved when using the new SME ABI lowering (`-aarch64-new-sme-abi`). Fixes llvm#152165
…lvm#163528) The IR verifier will crash if any non-phi instruction is located before a phi node. In some corner cases, the `infer-address-spaces` pass would insert an `addrspacecast` before a phi node. Since the pass has already determined that the operand pointer (the phi node's incoming value) is in `NewAS`, it is safe to insert the `addrspacecast` immediately after the point where that value is defined. Co-authored-by: Kerang Mao <krmao@birentech.com>
This patch reenables tests that had been xfailed in a previous merge (#575) by Ron. 1. test/Lower/OpenMP/DelayedPrivatization/target-private-allocatable.f90 Fixed test to accommodate map_clauses related to descriptors that we do not have upstream. 2. test/Lower/OpenMP/optional-argument-map-2.f90 Same problem as above in addition to a bad merge that enabled some testing that had been deliberately disabled by Andrew on amd-staging in the past. See commit 706196c
…7748) Precommit test fixups for llvm#167113
_mm_sqrt_sh / _mm512_sqrt_ph - these were missed from llvm#167692
) llvm#167489 did not work properly when MLIR inlining is enabled (experimental in flang, enabled with `-mllvm -inline-all`). The reason is that inlining causes several `fir.dummy_scope` operations to coexist inside the same `func.func` (the `fir.dummy_scope` of each inlined `func.func` is preserved in order to keep the relationship between the arguments of the inlined call for better aliasing deductions). After llvm#167489, the debug info pass creates argument debug info for every `fir.declare` with a `fir.dummy_scope`. This causes arguments from inlined calls to appear as arguments of the procedure into which the call was inlined. To avoid this, a `fir.declare` is only considered an argument of the current function if its `fir.dummy_scope` is the first one created in the function (`fir.dummy_scope` operations cannot be reordered because they have write effects on the debug memory resource, and the `fir.dummy_scope` of the current function is always emitted before any calls are lowered, and therefore before any `fir.dummy_scope` is inlined).
Just delegate to the subexpr instead for now.
This option has long been replaced by `-fc1 -fdebug-dump-pft`. Removed the old option and updated one test that still used it.
This reverts commit 1a86f0a.
While at it, record VPIRFlags in VPWidenInductionRecipe.
The Transactional Memory Extension (TME) was introduced as part of Armv9-A but has not been adopted by the ecosystem. This mirrors what Arm has observed with similar extensions in other architectures. Therefore, remove FEAT_TME assembly and ACLE code from llvm, because support for TME has now been officially withdrawn, as noted here:
```
FEAT_TME is withdrawn from all future versions of Arm® Architecture Reference Manual for A-profile architecture.
```
referenced in Known Issue D24093, documented here: https://developer.arm.com/documentation/102105/lb-05/
Addresses the issues found on the review of https://github.com/llvm/llvm-project/pull/150267/files#r2356936355 Currently when collecting the users of an IFunc symbol to determine the callers, we incorrectly mix versions of different functions together, alongside non-FMV callers all in the same bag. That is problematic because we incorrectly deduce which features are unavailable as we iterate the callers. I have updated the unit tests to require a resolver function for the callers and regenerated the resolvers since some FMV features have been removed making the detection bitmasks different. I've replaced the deleted FMV feature ls64 with cssc. I've added a new test to cover unrelated callers.
Without this gcc warns like
../../clang/lib/AST/ExprConstant.cpp:4091:63: warning: suggest parentheses around '&&' within '||' [-Wparentheses]
4091 | (SrcVal.isVector() && SrcVal.getVectorLength() == 1) &&
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
4092 | "Not a valid HLSLAggregateSplatCast.");
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
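The warning above comes from mixing `||` and `&&` inside an `assert` whose message is attached with `&&`. A minimal sketch of the usual fix follows; the `Value` struct and the first operand of the condition are assumptions made for illustration (the warning excerpt does not show the full expression), not the real `ExprConstant.cpp` code.

```cpp
#include <cassert>

// Hypothetical stand-in for the value being checked; not the real APValue API.
struct Value {
  bool IsInt;
  bool IsVector;
  unsigned VectorLength;
};

bool isValidSplatSource(const Value &V) {
  // Without the outer parentheses, `A || B && "msg"` parses as
  // `A || (B && "msg")`, which GCC flags with -Wparentheses.
  // Wrapping the whole condition makes the intended grouping explicit.
  assert((V.IsInt || (V.IsVector && V.VectorLength == 1)) &&
         "Not a valid splat source.");
  return V.IsInt || (V.IsVector && V.VectorLength == 1);
}
```

The behaviour is the same either way, because a string literal always converts to `true`; the extra parentheses only silence the warning and make the intent clear to readers.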
…#128164) If possible, put the definition next to the definition of an adjacent declaration. For example:
  struct S {
    void f^oo1() {}
    void foo2();
    void f^oo3() {}
  };
  // S::foo1() goes here
  void S::foo2() {}
  // S::foo3() goes here
This directive tells the compiler to ignore vector dependencies in the following loop, and it must be placed before a `do` loop. Sometimes the compiler does not have sufficient information to decide whether a particular loop is vectorizable because of potential dependencies between iterations; the directive tells the compiler that vectorization is safe by means of `parallelAccesses` metadata. This directive is also equivalent to `#pragma clang loop vectorize(assume_safety)` in C++.
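For comparison, here is a small C++ sketch of the Clang loop hint that plays the same role. Clang documents the hint as `#pragma clang loop vectorize(assume_safety)`; the `scatterAdd` function is a hypothetical example, and the hint is only valid here if the caller guarantees that `idx` holds no repeated entries.

```cpp
#include <cstddef>

// Hypothetical kernel: the indirect store through idx[] hides from the
// compiler whether two iterations can touch the same element of out.
void scatterAdd(float *out, const float *in, const int *idx, std::size_t n) {
  // Tells Clang to assume there are no loop-carried memory dependences
  // that would make vectorization unsafe. Other compilers ignore the
  // unknown pragma (possibly with a warning).
#pragma clang loop vectorize(assume_safety)
  for (std::size_t i = 0; i < n; ++i)
    out[idx[i]] += in[i];
}
```

The hint is a promise from the programmer, not a check: if `idx` did contain duplicates, a vectorized loop could produce different results from the scalar one.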
…167941) In PrivatizeAssociatedLoopIndexAndCheckLoopLevel we now descend all the way down the chain of nested OpenMPLoopDirectives ahead of time, so innerMostNest can only be a nullptr, or a DoConstruct.
This reverts commit 469702c. llvm#168048
When calling Block::movePointersTo(), the two blocks might have different metadata sizes, which causes the final pointer to be incorrect and point to garbage. Adjust the pointer base and offset accordingly. Fixes llvm#168018
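The adjustment can be pictured with a toy model. Everything below is hypothetical (`ToyBlock`, `ToyPointer`, `movePointerTo`, and the assumption that both base and offset are measured from the start of the block's raw storage); it is a sketch of the idea, not the interpreter's actual `Block` or `Pointer` code.

```cpp
#include <cstdint>

// Toy model only: far simpler than the interpreter's real classes.
struct ToyBlock {
  std::uint32_t MetadataSize; // bytes of metadata stored before the data
};

struct ToyPointer {
  const ToyBlock *Pointee;
  std::uint32_t Base;   // measured from the start of the raw storage
  std::uint32_t Offset; // measured from the start of the raw storage
};

// Offset of the pointed-to byte relative to the start of the data area.
std::uint32_t dataOffset(const ToyPointer &P) {
  return P.Offset - P.Pointee->MetadataSize;
}

// Retarget P at NewBlock, shifting Base and Offset by the difference in
// metadata size so that dataOffset(P) stays the same.
void movePointerTo(ToyPointer &P, const ToyBlock &NewBlock) {
  const std::int64_t Delta =
      static_cast<std::int64_t>(NewBlock.MetadataSize) -
      static_cast<std::int64_t>(P.Pointee->MetadataSize);
  P.Base = static_cast<std::uint32_t>(P.Base + Delta);
  P.Offset = static_cast<std::uint32_t>(P.Offset + Delta);
  P.Pointee = &NewBlock;
}
```

In this model, copying `Base` and `Offset` unchanged into a block with a larger metadata area would leave the pointer aimed at metadata bytes, which corresponds to the "points to garbage" symptom described above.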
…m#166603) In true16 mode, D16 instructions are first lowered to a pseudo t16 instruction, and then lowered to hi/lo instructions during MC lowering using the D16T16 table. However, the D16T16 table maps both `flat_load_d16_t16` and `flat_load_d16_t16_saddr` to `flat_load_d16_(hi)_b16`, which is wrong: the saddr pseudo instruction `flat_load_d16_t16_saddr` should be selected to the saddr hi/lo instruction. The global and scratch variants are correct; flat seems to be the only one with this issue.
overriding for hyderabad ocl fails
overriding for hyderabad ocl fails
…ictable-rand` (llvm#167689) Closes llvm#157292
The compiler should not consider split vectorize nodes when checking for non-schedulable PHI-based parent nodes. Only pure PHI nodes must be considered: only they can be treated as explicit users, whereas split nodes cannot. Fixes llvm#168268
ThisPartition is already of type int. Identified with readability-redundant-casting.
EnableFSDiscriminator is declared in DebugInfoMetadata.h. Identified with readability-redundant-declaration.
Update VPInstruction constructor to delegate to constructor with more comprehensive checking and validation. This required updating some unit tests, to make sure the constructed VPInstructions are valid.
This patch moves initWithExactBucketCount and ExactBucketCount to
DenseMapBase to share more code.
Since SmallDenseMap::allocateBuckets always returns true,
initWithExactBucketCount is equivalent to:
  void initWithExactBucketCount(unsigned NewNumBuckets) {
    allocateBuckets(NewNumBuckets);
    initEmpty();
  }
for SmallDenseMap.
Note that ExactBucketCount is not used within DenseMapBase yet.
This moves us closer to the storage policy idea outlined in llvm#168255.
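The sharing pattern can be sketched with a toy CRTP hierarchy. The class names, the `resetCounts` helper, the `int` buckets and the inline bucket count of 4 are all assumptions for illustration; only `initWithExactBucketCount`, `allocateBuckets` and `initEmpty` are names taken from the commit message, and this is not the real `llvm::DenseMap` code.

```cpp
#include <cstddef>
#include <vector>

// Toy base class: the shared logic lives here and calls into the derived
// class for storage-specific steps.
template <typename Derived> class ToyMapBase {
public:
  void initWithExactBucketCount(unsigned NewNumBuckets) {
    if (self().allocateBuckets(NewNumBuckets))
      self().initEmpty();
    else
      self().resetCounts();
  }

private:
  Derived &self() { return static_cast<Derived &>(*this); }
};

// Heap-backed toy map: allocation reports false when no buckets are needed.
class ToyHeapMap : public ToyMapBase<ToyHeapMap> {
  friend class ToyMapBase<ToyHeapMap>;

public:
  std::vector<int> Buckets;
  unsigned NumEntries = 7; // stale value, to show that init resets it

private:
  bool allocateBuckets(unsigned N) {
    Buckets.resize(N);
    return N != 0;
  }
  void initEmpty() {
    NumEntries = 0;
    Buckets.assign(Buckets.size(), -1); // -1 plays the role of the empty key
  }
  void resetCounts() { NumEntries = 0; }
};

// Small-size toy map: always has storage, so allocation always succeeds
// and the shared function reduces to allocateBuckets + initEmpty.
class ToySmallMap : public ToyMapBase<ToySmallMap> {
  friend class ToyMapBase<ToySmallMap>;

public:
  static constexpr unsigned InlineBuckets = 4;
  std::vector<int> Buckets;
  unsigned NumEntries = 7;

private:
  bool allocateBuckets(unsigned N) {
    Buckets.resize(N > InlineBuckets ? N : InlineBuckets);
    return true;
  }
  void initEmpty() {
    NumEntries = 0;
    Buckets.assign(Buckets.size(), -1);
  }
  void resetCounts() { NumEntries = 0; }
};
```

With this layout, each storage variant only supplies the allocation and reset steps, and the control flow is written once in the base.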
This allows SDNodes to be validated against their expected type profiles and reduces the number of changes required to add a new node. Autogenerated node names start with "AMDGPUISD::", hence the changes in the tests. The few nodes defined in R600.td are *not* imported because TableGen processes AMDGPU.td, which doesn't include R600.td. Ideally, we would have two sets of nodes, but that would require careful reorganization of the td files since some nodes are shared between AMDGPU and R600. Not sure if it is something worth looking into. Some nodes fail validation; those are listed in `AMDGPUSelectionDAGInfo::verifyTargetNode()`. Part of llvm#119709. Pull Request: llvm#168248
…specific relocations to strings. (llvm#168293) This will be used in places like LLD to render them for error messages.
Symbol is already of type MCSymbolELF *. Identified with readability-redundant-casting.
…onversions.cpp (llvm#167261) Calling convention is irrelevant to address space verification and adds complexity for other target triples.
PSDB Build Link: http://mlse-bdc-20dd129:8065/#/builders/10/builds/27
jmmartinez
approved these changes
Nov 17, 2025
vg0204
approved these changes
Nov 18, 2025
No description provided.