
Conversation

@z1-cciauto

No description provided.

owenca and others added 30 commits November 12, 2025 20:55
…67776)

Supporting this in GISel requires multiple changes to IRTranslator to
support aggregate returns containing scalable vectors and non-scalable
types. Falling back is the quickest way to fix the crash.

Fixes llvm#167618
This was preventing check-compiler-rt from actually running when we
touched a project that was supposed to cause compiler-rt to be tested.
Fix documentation in `abseil`, `android`, `altera`, `boost` and
`bugprone`.

This is part of the codebase cleanup described in
[llvm#167098](llvm#167098)
…tk.S"

This reverts commit 1f9eff1.

This is done in preparation for reverting parts of 885d7b7.
…nctions"

This reverts parts of commit 885d7b7,
and adds verbose comments explaining all the variants of this
function, for clarity for future readers.

It turns out that those functions actually weren't misnamed or
unused after all: Apparently Clang doesn't match GCC when it comes
to what stack probe function is referenced on i386 mingw. GCC < 4.6
references a symbol named "___chkstk", with three leading underscores,
and GCC >= 4.6 references "___chkstk_ms".

Restore these functions so that object files built with GCC can be linked
against compiler-rt.
…vm#165467)

The C locale is defined by the C standard, so we know exactly which
characters classify as (x)digits. Instead of going through the locale base
API, we can simply implement the classification functions ourselves, and
probably improve codegen significantly that way as well.
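A minimal sketch of the idea, using a hypothetical helper rather than the actual libc++ code: the C locale's hex digits are fixed by the standard, so the check reduces to plain range comparisons.

    // Hypothetical helper, not the libc++ implementation: the C locale's xdigits
    // are fixed by the C standard, so a range comparison is enough.
    constexpr bool is_c_locale_xdigit(char c) {
      return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }
    static_assert(is_c_locale_xdigit('7') && is_c_locale_xdigit('B') && !is_c_locale_xdigit('g'));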
…converting constructor (llvm#165619)

This also backports LWG2415 as a drive-by.
)

On Windows 8 and above, the WaitOnAddress, WakeByAddressSingle and
WakeByAddressAll functions allow an efficient implementation of the C++20
wait and notify features of std::atomic_flag. These Windows functions
have never been used in libc++, leading to very poor performance of these
features on Windows, where they are implemented using a spin loop with
backoff rather than any OS thread signalling whatsoever. This change uses
these OS functions where available, falling back to the original
implementation on Windows versions prior to 8.
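A minimal sketch of the primitives this builds on, written as hypothetical standalone helpers rather than the libc++ source (Windows 8+, link against Synchronization.lib):

    #include <windows.h>

    // Block until *flag no longer holds old_value; WaitOnAddress may also wake
    // spuriously, so the value is rechecked in a loop.
    void wait_until_changed(volatile LONG* flag, LONG old_value) {
      while (*flag == old_value)
        WaitOnAddress((volatile VOID*)flag, &old_value, sizeof(LONG), INFINITE);
    }

    // Publish a new value and wake one thread blocked on the address.
    void signal_one(volatile LONG* flag) {
      InterlockedIncrement(flag);
      WakeByAddressSingle((PVOID)flag);
    }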

Relevant API docs from Microsoft:

https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-waitonaddress

https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-wakebyaddresssingle

https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-wakebyaddressall

Fixes llvm#127221
…162800)

This should improve the time it takes to run the test suite a bit. Right
now there are only a handful of headers in the modulemap because we're
missing a lot of includes in the tests. New headers should be added
there from the start, and we should fill up the modulemap over time
until it contains all the test support headers.
These headers are incredibly simple and closely related, so this merges
them into a single one.
…vm#167674)

This is an NFC for now, as the SME checks for macOS platforms are not
implemented, so zaDisable() is a no-op, but both paths for resuming from
an exception should disable ZA.

This is a fixup for a recent change in llvm#165066.
…vm#167839)

We need to get the element type size at bytecode generation time to
check. We also need to diagnose this in the LHS == RHS case.
…lvm#167712)

This fixes an assert when compiling llvm-test-suite with -march=rva23u64
-O3 that started appearing sometime this week.

We get "Cannot overlap two segments with differing ValID's" because we
try to coalesce these two vsetvlis:

    %x:gprnox0 = COPY $x8
    dead $x0 = PseudoVSETIVLI 1, 208, implicit-def $vl, implicit-def $vtype
    %y:gprnox0 = COPY %x
    %v:vr = COPY $v8, implicit $vtype
    %x = PseudoVSETVLI %x, 208, implicit-def $vl, implicit-def $vtype

    -->

    %x:gprnox0 = COPY $x8
    %x = PseudoVSETVLI %x, 208, implicit-def $vl, implicit-def $vtype
    %y:gprnox0 = COPY %x
    %v:vr = COPY $v8, implicit $vtype

However, doing so would cause us to extend the segment of the new value
of %x up past the first segment, which would overlap.

This fixes it by checking that it's safe to extend the segment, simply by
making sure the interval isn't live at the first vsetvli.
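A rough sketch of that check, as an assumption about the shape of the fix rather than the exact patch, using the LiveIntervals API:

    #include "llvm/CodeGen/LiveIntervals.h"
    #include "llvm/CodeGen/MachineInstr.h"
    using namespace llvm;

    // Only coalesce if the AVL register's interval is not live at the first
    // vsetvli, so extending its segment cannot overlap an older value.
    static bool isSafeToCoalesceAVL(const MachineInstr &FirstVSETVLI,
                                    Register AVLReg, const LiveIntervals &LIS) {
      SlotIndex Idx = LIS.getInstructionIndex(FirstVSETVLI);
      return !LIS.getInterval(AVLReg).liveAt(Idx);
    }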

This unfortunately causes a regression in the existing
coalesce_vl_avl_same_reg test because even though we could coalesce the
vsetvlis there, we now bail. I couldn't think of an easy way to handle
this safely, but I don't think this is an important case to handle:
After testing this patch on SPEC CPU 2017 there are no codegen changes.
…ize (llvm#165924)

yaml2obj would crash when processing Mach-O load commands with a cmdsize
smaller than the actual structure size, e.g. LC_SEGMENT_64 with
cmdsize=56 instead of 72. The crash occurred due to integer underflow
when calculating padding: cmdsize - BytesWritten wraps to a large value
when the difference is negative, causing a massive allocation attempt.
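A minimal sketch of the underflow and a guarded fix, using variable names taken from the description rather than the actual yaml2obj code:

    #include <cstddef>
    #include <cstdint>

    size_t padding_bytes(uint32_t cmdsize, uint32_t BytesWritten) {
      // Buggy version: when cmdsize < BytesWritten the unsigned subtraction
      // wraps to a huge value and triggers a massive allocation.
      //   return cmdsize - BytesWritten;
      // Guarded version: treat an undersized cmdsize as "no padding"
      // (or report an error to the user instead).
      return cmdsize >= BytesWritten ? cmdsize - BytesWritten : 0;
    }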
…165598)

MSVC supports an extension that allows deleting an array of objects
through a pointer whose static type doesn't match its dynamic type. This
is done by generating special destructors, called vector deleting
destructors. MSVC's virtual tables always contain a pointer to the vector
deleting destructor for classes with virtual destructors, so not having
this extension implemented causes clang to generate code that is not
compatible with the code generated by MSVC, because clang always puts a
pointer to a scalar deleting destructor into the vtable. As a bonus,
deleting an array of polymorphic objects will work just like it does with
MSVC: no memory leaks, and the correct destructors are called.
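A short illustration of the pattern the extension covers; this is undefined behavior in standard C++, but well-defined under the MSVC ABI thanks to the vector deleting destructor stored in the vtable:

    struct Base { virtual ~Base() {} };
    struct Derived : Base { int payload = 0; };

    int main() {
      Base *p = new Derived[4]; // static type Base*, dynamic type Derived[4]
      delete[] p;               // the vector deleting destructor runs ~Derived
                                // on every element and frees the whole array
    }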

This patch will cause clang to emit code that is compatible with code
produced by MSVC but not with code produced by older versions of clang,
so the new behavior can be disabled by passing -fclang-abi-compat=21 (or
lower).

This is yet another attempt to land vector deleting destructors support
originally implemented by
llvm#133451.

This PR contains fixes for issues reported in the original PR as well as
fixes for issues related to operator delete[] search reported in several
issues like

llvm#133950 (comment)
llvm#134265

Fixes llvm#19772
…t` (llvm#167848)

Reland pass and fix linker errors.

---------

Co-authored-by: Maksim Levental <maksim.levental@gmail.com>
This patch adds a TMA intrinsic for global to shared::cta copy, which
was introduced with ptx86. It also removes the NoCapture<> annotation
from the pointer arguments to these intrinsics, since the copy operations
are asynchronous in nature.

The lit tests are verified with ptxas from cuda-12.8.

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
Use it in `printVRegOrUnit()`, `getPressureSets()`/`PSetIterator`,
and in functions/classes dealing with register pressure.

Static type checking revealed several bugs, mainly in MachinePipeliner.
I'm not very familiar with this pass, so I left a bunch of FIXMEs.

There is one bug in `findUseBetween()` in RegisterPressure.cpp, also
annotated with a FIXME.
This patch fixes the latency of the SVE FADDP instruction for the
Neoverse-N3 SWOG. The latency of the floating-point arithmetic, min/max
pairwise SVE FADDP should be 3, as per the N3 SWOG.
… a usubo.0)" (llvm#167854)

Reverts llvm#161651 due to downstream reports of bad codegen
…167856)

We only seem to use the SVE fdot for fixed-length vector types when they
are larger than 128 bits, whereas we can also use it for 128-bit vectors
if SVE2p1/SME2 is available.
As in the title. See here for more context:
https://discourse.llvm.org/t/allowfpopfusion-vs-sdnodeflags-hasallowcontract/80909

Also add a warning in llc when the global contract flag is encountered on x86.
Remove the global contract flag from the last x86 test.
…ementwise_fma and support constexpr (llvm#154731)

Now that llvm#152455 is done, we can make all the scalar FMA intrinsics
wrap __builtin_elementwise_fma, which also allows constexpr evaluation.

The main difference is that the FMA4 intrinsics guarantee that the upper
elements are zero, while FMA3 passes through the destination register
elements like older scalar instructions.

Fixes llvm#154555
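A hedged sketch of the effect, assuming the builtin is constexpr-evaluable as described; this mirrors the idea rather than reproducing the actual intrinsic headers:

    // Hypothetical wrapper, not the real intrinsic definition.
    constexpr float fma_scalar(float a, float b, float c) {
      return __builtin_elementwise_fma(a, b, c);
    }
    static_assert(fma_scalar(2.0f, 3.0f, 1.0f) == 7.0f, "folds at compile time");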
…ars are mergeable (llvm#167667)

See if each pair of scalar operands of a build vector can be freely
merged together, typically when they've been split for some reason by
legalization.

If so, we can create a new build vector node with double the scalar width
but half the element count, reducing codegen complexity and potentially
allowing further optimization.
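A conceptual illustration only, not the DAG code: two 32-bit scalars that legalization split from one 64-bit value can be packed back into a single wide element, halving the element count of the build vector.

    #include <cstdint>

    // Rebuild one wide element from the two narrow halves produced by splitting.
    uint64_t merge_halves(uint32_t lo, uint32_t hi) {
      return (static_cast<uint64_t>(hi) << 32) | lo;
    }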

I did look at performing this generically in DAGCombine, but we don't
have as much control over when a legal build vector can be folded -
another generic fold would be to handle this on insert_vector_elt pairs,
but again legality checks could be limiting.

Fixes llvm#167498
lukel97 and others added 3 commits November 13, 2025 19:53
…lvm#167826)

This patch replaces the generic `LLVM_Type` with the specific `I32` type in
NVVM operations:

`NVVM_SyncWarpOp`: change the mask parameter from `LLVM_Type` to `I32`.
`NVVM_CpAsyncOp`: change the cpSize parameter from `Optional<LLVM_Type>` to
`Optional<I32>`.

Signed-off-by: Dharuni R Acharya <dharunira@nvidia.com>
@z1-cciauto z1-cciauto requested a review from a team November 13, 2025 12:07
@z1-cciauto z1-cciauto merged commit da9a9ea into amd-staging Nov 13, 2025
14 checks passed
@z1-cciauto z1-cciauto deleted the upstream_merge_202511130707 branch November 13, 2025 14:49