merge main into amd-staging #621

ronlieb · 2025-11-19T00:46:35Z

No description provided.

This patch fixes most of the ASan tests that were failing on Darwin when running under the internal shell. There are still a couple left that are more interesting cases that I'll do in a follow up patch. The tests that still need to be done:

```
TestCases/Darwin/duplicate_os_log_reports.cpp
TestCases/Darwin/dyld_insert_libraries_reexec.cpp
TestCases/Darwin/interface_symbols_darwin.cpp
```

Reviewers: thetruestblue, fhahn, vitalybuka, DanBlackwell, ndrewh

Reviewed By: DanBlackwell

Pull Request: llvm#168545

Only the fortran source files in flang/test/Lower/PowerPC and some in flang/test/Lower have been modified. The other files in the directory will be cleaned up in subsequent commits

…ific address spaces (llvm#167770) For some backends, e.g., BPF, it is desirable to only sanitize memory belonging to specific address spaces. More specifically, it is sometimes desirable to only apply address sanitization for arena memory belonging to address space 1. However, AddressSanitizer currently does not support selectively sanitizing address spaces. Add a new option to select which address spaces to apply AddressSanitizer to. No functional change for existing targets (namely AMD GPU) that hardcode which address spaces to sanitize

In this PR we are proposing to change the LLDB codebase so that LLDB is able to print values of integer registers that are wider than 64 bits (even if the number of bits is not equal to 128).

---------

Co-authored-by: Matej Košík <matej.kosik@codasip.com>
Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>

…to non-vectors (llvm#168081) Updates the demanded elements before recursing through copies in case the type of the source register changes from a non-vector register to a vector register. Fixes llvm#167842.

)
* original change llvm#162730
* with windows fix llvm#164843
* remove timeout that was pointed out in the comment above
* Remove test that starts and listens on a socket to avoid timeout issues

…vm#168165) and make (llvm#165264) Truly recover Executor::getDefaultExecutor. The previous change missed std::unique_ptr, which is needed for a normal program exit: only with it is the ThreadPoolExecutor destructor called on a normal program exit, where it ensures the executor has been stopped and waits for worker threads to finish. The wait is important as it prevents intermittent crashes on Windows when the process is doing a full exit.

In line with a std proposal to introduce std::clmul, and in preparation to introduce a clmul intrinsic, implement carry-less multiply primitives for APIntOps, clmul[rh]. Ref: https://isocpp.org/files/papers/P3642R3.html
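
To illustrate what a carry-less multiply computes, here is a minimal fixed-width sketch. It is not the APIntOps implementation: the helper names `clmulWide`, `clmulLow`, and `clmulHigh` are hypothetical, and the 32-bit width is an assumption chosen to keep the example short. Each set bit of one operand contributes a shifted copy of the other, combined with XOR instead of addition.

```cpp
#include <cstdint>

// Illustrative only: multiplies two 32-bit polynomials over GF(2) and
// returns the full 64-bit product. Not the APIntOps code.
constexpr std::uint64_t clmulWide(std::uint32_t a, std::uint32_t b) {
  std::uint64_t result = 0;
  for (unsigned i = 0; i < 32; ++i)
    if ((b >> i) & 1u)
      result ^= static_cast<std::uint64_t>(a) << i; // XOR instead of carrying add
  return result;
}

// Low 32 bits of the product (hypothetical helper name).
constexpr std::uint32_t clmulLow(std::uint32_t a, std::uint32_t b) {
  return static_cast<std::uint32_t>(clmulWide(a, b));
}

// High 32 bits of the product (hypothetical helper name).
constexpr std::uint32_t clmulHigh(std::uint32_t a, std::uint32_t b) {
  return static_cast<std::uint32_t>(clmulWide(a, b) >> 32);
}

// (x + 1)^2 = x^2 + 1 over GF(2), so 0b11 * 0b11 = 0b101.
static_assert(clmulLow(3, 3) == 5, "cross terms cancel under XOR");
static_assert(clmulLow(5, 3) == 15, "0b101 * 0b11 = 0b1111");
static_assert(clmulHigh(0x80000000u, 2) == 1, "bit 32 lands in the high half");
```

Compare with ordinary multiplication, where 3 * 3 is 9: the carry out of the middle bit is exactly what the XOR form drops.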

…vm#167575)

Identified with modernize-loop-convert.

https://alive2.llvm.org/ce/z/YGT5SN
https://alive2.llvm.org/ce/z/PVDxCw
https://alive2.llvm.org/ce/z/8buR2N

This is tricky because with positive numbers, we only go up, so we can in fact always hit the signed_max boundary. This is important because the intrinsic we use has the behavior of going the OTHER way, aka clamp to INT_MIN if it goes in that direction. And the range checking we do only works for positive numbers. Because of this issue, we can only do this for constants as well.

When building just the runtimes (eg a patch only touches compiler-rt), we do not actually run any normal check targets. This ends up causing an empty ninja invocation, which builds more targets than necessary. Gate the ninja build for normal check-* targets under an if statement to fix this.

The AArch64 backend converts trees formed by conjunctions/disjunctions of comparisons into sequences of `CCMP` instructions. The implementation before this change checks whether a sub-tree must be processed first. If not, it processes the operations in the order they occur in the DAG. This may not be optimal if there is a corresponding `SUB` node for one of the comparisons. In this case, we should process this comparison first because we can then use the same instruction for the `SUB` node and the comparison.

To achieve this, this commit comprises the following changes:

- Extend `canEmitConjunction` with a new output parameter `PreferFirst`, which reports to the caller whether the sub-tree should preferably be processed first.
- Set `PreferFirst` to `true` if we can find a corresponding `SUB` node in the DAG.
- If we can process a sub-tree with `PreferFirst = true` first (i.e., we do not violate any `MustBeFirst` constraint by doing so), we swap the sub-trees.
- The already existing code for performing the common subexpression elimination takes care to use only a single instruction for the comparison and the `SUB` node if possible.

Closes llvm#149685.

Pull Request: llvm#168209

…37170) In general, "Flat instructions look at the per-workitem address and determine for each work item if the target memory address is in global, private or scratch memory." (RDNA2 ISA) That means that FLAT instructions need to be considered for VMEM hazards even without "specific segment". Also, LDS DMA should be considered for LDS hazard detection. See also llvm#137148

…m#168549) Move `GetInnermostExecPart` and `IsStrictlyStructuredBlock` from Semantics/openmp-utils.* to Parser/openmp-utils.*. These two only depend on the AST contents and properties.

This reverts commit b3d6264. This broke the workflow because the sync-labels flag was set to a zero-length string to work around an issue. The underlying issue has been fixed and the value is now required to be a boolean. We can just drop the value because we want the default behavior anyways. This should be the last remaining breaking change from v5 that we need to migrate.

This reverts commit bd8c941. This still broke things and evidently needs more testing on a fork before relanding. https://github.com/llvm/llvm-project/actions/runs/19475911086

…t could be converted to vector loads plus shuffles (llvm#168571) This is turning up in some legalisation code when shuffling vectors bitcast from illegal loads. Ideally we'd handle more complex shuffles, but reverse is a start.

Reverts llvm#164356 The bots are broken.

They were still running because the conditional was not correct. This patch fixes that so they do not interfere with the results of the job.

These are more idiomatic in bash.

This allows SDNodes to be validated against their expected type profiles and reduces the number of changes required to add a new node. Some nodes fail validation; those are enumerated in `ARMSelectionDAGInfo::verifyTargetNode()`. Some of the bugs are easy to fix, but they should probably be fixed separately, as this patch is already big. Part of llvm#119709. Pull Request: llvm#168212

…vm#168559) Fixes the assertion in llvm#168523 This patch lifts the small, odd-sized integer to 8 bits, ensuring that the following lowering code behaves correctly.

This patch adds `LLDBLog::InstrumentationRuntime` as a log channel to provide an appropriate channel for instrumentation runtime plugins as previously one did not exist. A small use of the channel is added to illustrate its use. The logging added is not intended to be comprehensive. This is primarily motivated by an `-fbounds-safety` instrumentation plugin (swiftlang#11835). rdar://164920875

…y-Name Lookups (llvm#168143) This PR adds some test coverage for `StableDirs` during by-name lookups.

…vm#167904) Always rely on local scopes to enforce the lifetime of these helper objects and by extension where the "closing" of various C++ code constructs happens.

…to interleave3-8. (llvm#168473)

…lvm#167745) If vector-unaligned-mem support is not enabled, we should not generate loads/stores that are not aligned to their element size. We already do this for non-VP vector loads/stores. This code has been in our downstream for about a year and a half after finding the vectorizer generating misaligned loads/stores. I don't think that is unique to our downstream. Doing this for masked vp.load/store requires widening the mask as well which is harder to do. NOTE: Because we have to scale the VL, this will introduce additional vsetvli and the VL optimizer will not be effective at optimizing any arithmetic that is consumed by the store.

This test was failing on chromium builds with error:

```
/Volumes/Work/s/w/ir/x/w/llvm_build/bin/llc -o - /Volumes/Work/s/w/ir/x/w/llvm-llvm-project/llvm/test/DebugInfo/AArch64/instr-ref-target-hooks-sp-clobber.mir -run-pass=livedebugvalues | /Volumes/Work/s/w/ir/x/w/llvm_build/bin/FileCheck /Volumes/Work/s/w/ir/x/w/llvm-llvm-project/llvm/test/DebugInfo/AArch64/instr-ref-target-hooks-sp-clobber.mir
# RUN: at line 8
+ /Volumes/Work/s/w/ir/x/w/llvm_build/bin/llc -o - /Volumes/Work/s/w/ir/x/w/llvm-llvm-project/llvm/test/DebugInfo/AArch64/instr-ref-target-hooks-sp-clobber.mir -run-pass=livedebugvalues
+ /Volumes/Work/s/w/ir/x/w/llvm_build/bin/FileCheck /Volumes/Work/s/w/ir/x/w/llvm-llvm-project/llvm/test/DebugInfo/AArch64/instr-ref-target-hooks-sp-clobber.mir
error: YAML:121:3: unknown key 'stackSizePPR'
stackSizePPR: 0
^~~~~~~~~~~~
FileCheck error: '<stdin>' is empty.
FileCheck command line: /Volumes/Work/s/w/ir/x/w/llvm_build/bin/FileCheck /Volumes/Work/s/w/ir/x/w/llvm-llvm-project/llvm/test/DebugInfo/AArch64/instr-ref-target-hooks-sp-clobber.mir
```

This is an attempt to reland the failing test.

LLVM IR verifier checks for `extraData` in debug info metadata. This is a follow-up PR based on discussions in llvm#165023

Currently the test cfi-multiple-location.mir is marked as XFAIL. This causes failures on some build bots because the test unexpectedly passes. Mark this test as UNSUPPORTED for now. Later I plan to merge an MR which fixes an issue in CFIInstrInserter and this test will be enabled.

Fixes unsigned int underflows in `MFMASmallGemmSingleWaveOpt::applyIGLPStrategy`.
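
As a general illustration of this class of bug (not the code from `MFMASmallGemmSingleWaveOpt::applyIGLPStrategy`), the sketch below shows how subtracting from an unsigned value wraps around instead of going negative, and two common guards. The helper names `countBeforeLast` and `subOrZero` are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: number of elements before the last one.
// Writing `v.size() - 1` unguarded would wrap to a huge value for an empty
// vector, because std::size_t is unsigned.
inline std::size_t countBeforeLast(const std::vector<int> &v) {
  return v.empty() ? 0 : v.size() - 1;
}

// Hypothetical helper: subtraction that clamps at zero instead of wrapping.
constexpr unsigned subOrZero(unsigned a, unsigned b) {
  return a > b ? a - b : 0u;
}

static_assert(2u - 5u > 1000u, "plain unsigned subtraction wraps around");
static_assert(subOrZero(2, 5) == 0, "guarded subtraction clamps at zero");
static_assert(subOrZero(5, 2) == 3, "normal case is unchanged");
```

A loop bound computed as `size - 1` is the typical place where such a wrap turns into an out-of-range iteration, which is why the guard belongs at the point of subtraction.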

…lvm#165375) Identity masks can be treated as free when scalable vectorization is possible, making the check agnostic of the vectorization policy (fixed or scalable). This allows for aggressive vector combines for identity shuffle masks.

This should really check if the libcall is known supported. For now mips doesn't configure its RuntimeLibcallsInfo correctly, and does not have any of the mips16 calls in it. For now there isn't a way to add them without triggering conflicting cases in tablegen, so keep parsing the raw name as it was before.

) Test that -fsanitize=alloc-token is compatible with kcfi and memtag, as these should also be possible to combine. NFC.

We previously added support for marking GlobalOp operations as constant, but the handling to actually do so was left mostly unimplemented. This fills in the missing pieces.

…vm#168584) For vectors, CTLZ, CTTZ, CTPOP all operate on individual elements. The lowering should be based on the element width. I noticed this by inspection. No tests in tree are currently affected, but I thought it would be good to fix so someone doesn't have to debug it in the future.

…#168586) The compiler doesn't emit a diagnostic when the signature of a function defined in a namespace gets out-of-sync with its declaration. Let's use qualified names for function definitions instead of nesting them in a namespace so that mismatches are diagnosed by the compiler rather than by the (less understandable) linker.
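
A minimal sketch of the pattern, using hypothetical names (`demo`, `scale`) that do not come from the patch. A qualified out-of-line definition must match an existing declaration in the namespace, so a drifted signature becomes a compile error.

```cpp
namespace demo {
// Declaration, as it would normally appear in a header.
int scale(int value, int factor);
} // namespace demo

// Qualified definition: the compiler checks it against the declaration above.
int demo::scale(int value, int factor) { return value * factor; }

// For contrast (left as comments so the example still compiles):
//
//   namespace demo { int scale(long value, int factor) { ... } }
//     compiles, but silently declares a new overload; callers of
//     scale(int, int) then fail at link time with an undefined symbol.
//
//   int demo::scale(long value, int factor) { ... }
//     is rejected at compile time, because no such member of `demo`
//     was declared.
```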

…lvm#168187) A function prologue can begin with a pre-index STR instruction for a floating-point register. To construct an unwind plan from assembly correctly, the instruction emulator must support such instructions.

Follow up on a cse OpType-mismatch crash reported due to ef023ca (Reland [VPlan] Expand WidenInt inductions with nuw/nsw), setting the OpType correctly when returning from getFlagsFromIndDesc.

…lvm#167703) In this PR I'm changing the way we provide the missing functions like strnlen() on z/OS from the separate header file to a wrapper around the system headers that declare these functions. This will be less intrusive.

---------

Co-authored-by: Zibi Sarbinowski <zibi@ca.ibm.com>

1. Handle transformed awaitables for `AllowedCallees`, which generate temporaries and weren't being handled by llvm#167778.
2. Fix name mismatches in `storeOptions`.

This change drops the use of the "Layout" type and instead uses explicit padding throughout the compiler to represent types in HLSL buffers. There are a few parts to this, though it's difficult to split them up as they're very interdependent:

1. Refactor HLSLBufferLayoutBuilder to allow us to calculate the padding of arbitrary types.
2. Teach Clang CodeGen to use HLSL specific paths for cbuffers when generating aggregate copies, array accesses, and structure accesses.
3. Simplify DXILCBufferAccesses such that it directly replaces accesses with dx.resource.getpointer rather than recalculating the layout.
4. Basic infrastructure for SPIR-V handling, but the implementation itself will need work in follow ups.

Fixes several issues, including llvm#138996, llvm#144573, and llvm#156084. Resolves llvm#147352.

The pstl top-level directory was removed, but we forgot to remove pstl from the list of valid subdirectories.

FCmp instructions have both a predicate and fast-math flags. Introduce a new FCmp kind, that combines both to model this correctly in the current system. This should be NFC modulo VPlan printing which now includes the correct fast-math flags.

…from its lower 32-bit (llvm#168458) On some targets, a packed f32 instruction can only read 32 bits from a scalar operand (SGPR or literal) and replicates the bits to both channels. In this case, we should not fold an immediate value if it can't be replicated from its lower 32 bits. Fixes SWDEV-567139.
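
One way to read the condition above, as a hedged sketch rather than the backend's actual check: a 64-bit immediate holding two f32 lanes can be reproduced by replicating its lower 32 bits only when both halves are identical. The predicate name `isReplicatedLow32` is hypothetical.

```cpp
#include <cstdint>

// Hypothetical predicate, not the AMDGPU backend code: true when the upper
// 32 bits of the immediate equal its lower 32 bits, so replicating the low
// half into both channels reproduces the original value.
constexpr bool isReplicatedLow32(std::uint64_t imm) {
  return (imm >> 32) == (imm & 0xffffffffu);
}

// 0x3f800000 is the bit pattern of 1.0f; both lanes equal, so it replicates.
static_assert(isReplicatedLow32(0x3f8000003f800000ull), "lanes {1.0f, 1.0f}");
// Lanes {1.0f, 2.0f} differ (0x40000000 is 2.0f), so folding would be wrong.
static_assert(!isReplicatedLow32(0x400000003f800000ull), "lanes differ");
```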

…lvm#162362) This patch provides definitions for `pkey_*` functions for linux x86_64. `pkey_alloc`, `pkey_free`, and `pkey_mprotect` are simple syscall wrappers. `pkey_set` and `pkey_get` modify architecture-specific registers. The logic for these lives in architecture-specific directories:

* `libc/src/sys/mman/linux/x86_64/pkey_common.h` has a real implementation
* `libc/src/sys/mman/linux/generic/pkey_common.h` contains stubs that just return `ENOSYS`.

This allows SDNodes to be validated against their expected type profiles and reduces the number of changes required to add a new node. The verification functionality detected a few issues. Two of them were fixed (missing `SDNPMemOperand` property on `TCGEN05_MMA` nodes and extra glue operand/result on `CallPrototype`); the one remaining is with the `ProxyReg` node, see `NVPTXSelectionDAGInfo::verifyTargetNode()`. Part of llvm#119709. Pull Request: llvm#168367

…lvm#168514) Introduces the Task and TaskDispatcher interfaces (TaskDispatcher.h), ThreadPoolTaskDispatcher implementation (ThreadPoolTaskDispatch.h), and updates Session to include a TaskDispatcher instance that can be used to run tasks. TaskDispatcher's introduction is motivated by the need to handle calls to JIT'd code initiated from the controller process: Incoming calls will be wrapped in Tasks and dispatched. Session shutdown will wait on TaskDispatcher shutdown, ensuring that all Tasks are run or destroyed prior to the Session being destroyed.

Add support for marking global variables with common linkage.

…m#168072) In this case, the value is a constant, not an addend to a relocation. So the "Relocation Not In Range" error must not be triggered. Regression from PR llvm#112877 Fixes llvm#132322

z1-cciauto · 2025-11-19T00:48:24Z

PSDB Link: https://compiler-ci.amd.com/job/compiler-psdb-amd-staging/2860

boomanaiden154 and others added 30 commits November 18, 2025 08:22

[flang][NFC] Strip trailing whitespace from tests (6 of N)

38c1a58

Only the fortran source files in flang/test/Lower/PowerPC and some in flang/test/Lower have been modified. The other files in the directory will be cleaned up in subsequent commits

[AArch64][GISel] Don't crash in known-bits when copying from vectors …

93a8ca8

…to non-vectors (llvm#168081) Updates the demanded elements before recursing through copies in case the type of the source register changes from a non-vector register to a vector register. Fixes llvm#167842.

[lldb] update lldb-server platform help parsing (attempt 3) (llvm#164904

2675dcd

)
* original change llvm#162730
* with windows fix llvm#164843
* remove timeout that was pointed out in the comment above
* Remove test that starts and listens on a socket to avoid timeout issues

[APInt] Introduce carry-less multiply primitives (llvm#168527)

727ee7e

In line with a std proposal to introduce std::clmul, and in preparation to introduce a clmul intrinsic, implement carry-less multiply primitives for APIntOps, clmul[rh]. Ref: https://isocpp.org/files/papers/P3642R3.html

[AMDGPU][GlobalISel] Add RegBankLegalize support for G_IS_FPCLASS (ll…

cb58129

…vm#167575)

[AsmParser] Use a range-based for loop (NFC) (llvm#168488)

6d3971d

Identified with modernize-loop-convert.

[ARM] Pattern match Low Overhead Loops pseudos (NFC) (llvm#168209)

3cf1f0c

Pull Request: llvm#168209

[flang][OpenMP] Move two utilities from Semantics to Parser, NFC (llv…

c88ae6e

…m#168549) Move `GetInnermostExecPart` and `IsStrictlyStructuredBlock` from Semantics/openmp-utils.* to Parser/openmp-utils.*. These two only depend on the AST contents and properties.

Revert "[Github] Update PR labeller to v6.0.1 (llvm#167246)"

d772663

This reverts commit bd8c941. This still broke things and evidently needs more testing on a fork before relanding. https://github.com/llvm/llvm-project/actions/runs/19475911086

Revert "[MLIR][NVVM] Add tcgen05.mma MLIR Ops" (llvm#168583)

5407e62

Reverts llvm#164356 The bots are broken.

[CI] Skip Running Premerge Advisor on AArch64 (llvm#168404)

8bdd82c

They were still running because the conditional was not correct. This patch fixes that so they do not interfere with the results of the job.

[CI] Prefer Bash Tests over Empty String Comparisons (llvm#168575)

40ed57c

These are more idiomatic in bash.

[GISel][RISCV] Compute CTPOP of small odd-sized integer correctly (ll…

523bd2d

…vm#168559) Fixes the assertion in llvm#168523 This patch lifts the small, odd-sized integer to 8 bits, ensuring that the following lowering code behaves correctly.

[clang][DependencyScanning] Add Test Coverage of StabeDirs during B…

3f61402

…y-Name Lookups (llvm#168143) This PR adds some test coverage for `StableDirs` during by-name lookups.

[NFC][TableGen] Remove close member from various CodeGenHelpers (ll…

8f67759

…vm#167904) Always rely on local scopes to enforce the lifetime of these helper objects and by extension where the "closing" of various C++ code constructs happens.

[ConstantFolding] Generalize constant folding for vector_interleave2 …

4ab2423

…to interleave3-8. (llvm#168473)

laxmansole and others added 24 commits November 18, 2025 14:33

[DebugInfo][IR] Verifier checks for the extraData (llvm#167971)

58b8e6e

LLVM IR verifier checks for `extraData` in debug info metadata. This is a follow-up PR based on discussions in llvm#165023

[NFC][AMDGPU] IGLP: Fixes for unsigned int handling (llvm#135090)

576e1af

Fixes unsigned int underflows in `MFMASmallGemmSingleWaveOpt::applyIGLPStrategy`.

[AllocToken] Test compatibility with -fsanitize=kcfi,memtag (llvm#168600

8aca6c3

) Test that -fsanitize=alloc-token is compatible with kcfi and memtag, as these should also be possible to combine. NFC.

[bazel] fix llvm#168212 (llvm#168598)

e1bb50b

[CIR] Mark globals as constants (llvm#168463)

56b1d42

We previously added support for marking GlobalOp operations as constant, but the handling to actually do so was left mostly unimplemented. This fills in the missing pieces.

[VPlan] Fix OpType-mismatch in getFlagsFromIndDesc (llvm#168560)

507f236

Follow up on a cse OpType-mismatch crash reported due to ef023ca (Reland [VPlan] Expand WidenInt inductions with nuw/nsw), setting the OpType correctly when returning from getFlagsFromIndDesc.

[clang-tidy] Fix bugs in misc-coroutine-hostile-raii check (llvm#167947)

31ec633

1. Handle transformed awaitables for `AllowedCallees`, which generate temporaries and weren't being handled by llvm#167778.
2. Fix name mismatches in `storeOptions`.

[runtimes] Remove pstl from the list of supported runtimes (llvm#168414)

5cde345

The pstl top-level directory was removed, but we forgot to remove pstl from the list of valid subdirectories.

[CIR] Add support for common linkage (llvm#168613)

3e499e9

Add support for marking global variables with common linkage.

[llvm][ARM] Allow MOVT and MOVW on the offset between two labels (llv…

5e80358

…m#168072) In this case, the value is a constant, not an addend to a relocation. So the "Relocation Not In Range" error must not be triggered. Regression from PR llvm#112877 Fixes llvm#132322

merge main into amd-staging

ba0964a

ronlieb requested review from a team and dpalermo November 19, 2025 00:46

dpalermo approved these changes Nov 19, 2025

z1-cciauto merged commit 9cc0238 into amd-staging Nov 19, 2025
16 checks passed

z1-cciauto deleted the amd/merge/upstream_merge_20251118183849 branch November 19, 2025 03:32

47 participants
