merge main into amd-staging #588

ronlieb · 2025-11-14T11:16:32Z

No description provided.

Tracing requires liboffload to be initialized, so calling `isTracingEnabled()` before `olInit` always returns false. This caused the first trace log to look like:

```
-> OL_SUCCESS
```

instead of:

```
---> olInit() -> OL_SUCCESS
```

This patch moves the pre-call trace print for `olInit` so it is emitted only after initialization. It would be possible to add extra logic to detect whether liboffload is already initialized and only postpone the first pre-call print, but this would add unnecessary complexity, especially since this is tablegen code. The difference would matter only in the unlikely case of a crash during a second `olInit` call.

---------

Co-authored-by: Joseph Huber <huberjn@outlook.com>
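A minimal C++ model of the ordering problem, assuming (hypothetically) that the tracing flag only becomes readable once `olInit` has run. The namespace `trace_sketch` and everything in it are illustrative stand-ins, not liboffload's tablegen-generated entry points; the sketch only contrasts printing before versus after initialization.

```cpp
#include <sstream>
#include <string>

// Hypothetical model: tracing state only exists after init, so the
// tracing check reports false until initImpl() has run.
namespace trace_sketch {

bool Initialized = false;
std::ostringstream Log;

bool isTracingEnabled() { return Initialized; }

const char *initImpl() {
  Initialized = true;
  return "OL_SUCCESS";
}

// Old order: the pre-call print runs before init and is silently dropped.
std::string olInitOldOrder() {
  Initialized = false;
  Log.str("");
  if (isTracingEnabled()) // always false at this point
    Log << "---> olInit() ";
  const char *Result = initImpl();
  if (isTracingEnabled())
    Log << "-> " << Result;
  return Log.str();
}

// New order: the pre-call print is deferred until after init.
std::string olInitNewOrder() {
  Initialized = false;
  Log.str("");
  const char *Result = initImpl();
  if (isTracingEnabled())
    Log << "---> olInit() ";
  if (isTracingEnabled())
    Log << "-> " << Result;
  return Log.str();
}

} // namespace trace_sketch
```

The old order yields only the post-call half of the line; the new order yields the full `---> olInit() -> OL_SUCCESS` line.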

Only the Fortran source files in flang/test/Intrinsics have been modified. The other files in flang/test will be cleaned up in subsequent commits.

- Adopt ifdef and namespace emitters in SubtargetEmitter.
- To aid that, factor out emission of different sections of the code into individual helper functions.

…lvm#167896) Reverts llvm#163860

Prepare a 'this' for CXXDefaultInitExprs

…e reduction plans (llvm#165913) The TypeSwitch for extracting the Opcode now handles the `VPReductionRecipe` case. Fixes llvm#165359.

This commit adds optimized assembly versions of single-precision float multiplication and division. Both functions are implemented in a style that can be assembled as either Arm or Thumb2; for multiplication, a separate implementation is provided for Thumb1. Extensive new tests are also added for multiplication and division. These implementations can be removed from the build by setting the CMake variable COMPILER_RT_ARM_OPTIMIZED_FP=OFF. Outlying parts of the functionality which are not on the fast path, such as NaN handling and underflow, are handled in helper functions written in C. These can be shared between the Arm/Thumb2 and Thumb1 implementations, and also reused by other optimized assembly functions we hope to add in the future.

…ve (llvm#167875) This prevents the backend from crashing for basic uses of __SVCount_t type (e.g., as function arguments), without +sve2p1 or +sme2. Fixes llvm#167462

Without this patch, `SmallDenseMap::grow` has two separate code paths to grow the bucket array. The code path to handle the small mode has its own traversal over the bucket array. This patch simplifies this logic as follows:

1. Allocate a temporary instance of `SmallDenseMap`.
2. Move valid key/value pairs to the temporary instance.
3. Move `LargeRep` to `*this`.

Remarks:

- This patch adds `moveFromImpl` to move key/value pairs. `moveFromOldBuckets` is updated to use the new helper function.
- This patch adds a private constructor to `SmallDenseMap` that takes an exact number of buckets, accompanied by tag `ExactBucketCount`.
- This patch adds a fast path to `deallocateBuckets` in case `getLargeRep()->NumBuckets == 0`, just like `destroyAll`. This path is used to destruct zombie instances after moves.
- In somewhat rare cases, we "grow" from the small mode to the small mode when there are many tombstones in the inline storage. This is handled with another call to `moveFrom`.

…okupOrTrackRegister (llvm#167841) The LocID for registers is just the register ID. The getLocID function is supposed to hide this detail, but it wasn't being used consistently. This avoids a bunch of implicit casts from Register or MCRegister to unsigned.

… (NFC) (llvm#155262) CMN also has a function like this; we should do the same for CMP.

This commit adds a new `ValueMatcher` class that can be used in gtest matching contexts to match against `lldb_private::Value` objects. We always match against the value's `value_type` and `context_type`. For HostAddress values we also match against the expected host buffer contents. For Scalar, FileAddress, and LoadAddress values we match against an expected Scalar value.

The matcher is used to improve the quality of the tests in the `DwarfExpressionTest.cpp` file. Previously, the local `Evaluate` function returned an `Expected<Scalar>` value, which made it hard to verify that we actually get a Value of the expected type without adding custom evaluation code. Now we return an `Expected<Value>` so that we can match against the full value contents. The resulting change improves the quality of the existing checks and in some cases eliminates the need for special code to explicitly check value types.

I followed the gtest [guide](https://google.github.io/googletest/gmock_cook_book.html#writing-new-monomorphic-matchers) for writing a new value matcher.

The optimized version of `xsgetn` for `basic_filebuf` added in llvm#165223 has an issue: if a read is served partly from the buffer and partly from the filesystem, it returns the wrong number of characters. This patch should address the issue.
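A toy model of the counting rule involved: when one read is served partly from an in-memory buffer and partly from the underlying file, the return value must include both parts. `ToyFileBuf` is hypothetical and is not libc++'s `basic_filebuf`; it does not reproduce the actual bug, only the accounting a correct implementation needs.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical buffered reader used only to illustrate the accounting.
struct ToyFileBuf {
  std::string Buffer;      // characters already buffered in memory
  std::size_t Pos = 0;     // read position inside Buffer
  std::string File;        // stands in for the underlying file
  std::size_t FilePos = 0; // read position inside File

  // Reads up to N characters into Out and returns how many were read.
  std::size_t xsgetn(char *Out, std::size_t N) {
    // 1. Drain whatever is already buffered.
    std::size_t FromBuffer = std::min(N, Buffer.size() - Pos);
    std::memcpy(Out, Buffer.data() + Pos, FromBuffer);
    Pos += FromBuffer;

    // 2. Read the remainder directly from the "file".
    std::size_t FromFile = std::min(N - FromBuffer, File.size() - FilePos);
    std::memcpy(Out + FromBuffer, File.data() + FilePos, FromFile);
    FilePos += FromFile;

    // The result must count both sources, not just one of them.
    return FromBuffer + FromFile;
  }
};
```

For example, with `"abc"` buffered and `"defgh"` in the file, a 6-character read returns 6 (`"abcdef"`), and a following 10-character read returns the remaining 2.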

…r zvfbfa (llvm#167819)

This proposal adds a `cl::opt` CLI flag, `-bpf-allow-misaligned-mem-access`, to the BPF target that lets users allow misaligned memory accesses. The motivation is that user space eBPF VMs (interpreters or JITs running in user space) typically run on real CPUs where unaligned memory accesses are acceptable (or handled efficiently), so allowing them can simplify lowering and improve performance. In contrast, kernel eBPF must obey verifier constraints and platform-specific alignment restrictions. A new CLI option keeps kernel behavior unchanged while giving user space VMs an explicit opt-in to more permissive codegen. It supports both use cases without diverging codebases.

…7763) As mentioned in comments for llvm#164913, the `if()` statements here can't be externally triggered, since these writeback registers are passed in from the caller. So they should really be `assert()`s: that makes it obvious we don't need test cases for them, and it is more efficient.
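A generic sketch of the pattern, assuming a hypothetical helper `encodeWriteback` (not the code touched by the patch): when a value is always supplied by the caller, a failing check is a broken internal invariant, so an `assert` documents it and compiles away in release builds, whereas an `if` suggests an input case that would need its own test.

```cpp
#include <cassert>

// Hypothetical helper. The caller always passes a valid writeback register.
//
// Before: if (WBReg == 0) return false;  // unreachable, yet looks testable
// After:  assert(...)                    // documents the invariant; removed
//                                        // when NDEBUG is defined
bool encodeWriteback(unsigned WBReg, unsigned &Encoded) {
  assert(WBReg != 0 && "caller must supply a writeback register");
  Encoded = (WBReg << 1) | 1u; // arbitrary encoding, for illustration only
  return true;
}
```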

Reverts llvm#161546

One of the buildbots reported a cmake error I don't understand, and which I didn't get in my own test builds:

```
CMake Error at /var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/compiler-rt/cmake/Modules/CheckAssemblerFlag.cmake:23 (try_compile):
  COMPILE_DEFINITIONS specified on a srcdir type TRY_COMPILE
```

My best guess is that the thing I did in `CheckAssemblerFlag.cmake` only works on some versions of cmake. But I don't understand the problem well enough to fix it quickly, so I'm reverting the whole patch and will reland it later.

Implement support for GNUNullExpr

…m#167540) In adopting `[[clang::nonblocking]]`, there's been some user confusion. Changes to address `-Wfunction-effects` warnings are often pure annotation, with no runtime effect. Changes to avoid `-Wperf-constraint-implies-noexcept` warnings are risky: adding `noexcept` creates a new potential for the program to crash. In retrospect, `-Wperf-constraint-implies-noexcept` shouldn't have been made part of `-Wall`.

---------

Co-authored-by: Doug Wyatt <dwyatt@apple.com>

This changes multiplications by `3 << C` from `(X << C + 2) - (X << C)` to `(X << C + 1) + (X << C)`. If Zba is available, the output is not affected, as we emit `(shl (sh1add X, X), C)` instead. There are two advantages:

- ADD is more compressible.
- Often a reduced instruction count, by a heuristic that `(X << C + 1)` is more likely to have another use than `(X << C + 2)`.
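The three forms can be checked as plain unsigned integer arithmetic. This sketch (function names hypothetical) mirrors the shifts and adds only; it is not the RISC-V DAG combine itself. Unsigned arithmetic wraps modulo 2^64, so the identities hold for every `X` as long as the shift amounts stay below 64.

```cpp
#include <cstdint>

// Old lowering: 4*X*2^C - X*2^C == 3*X*2^C
std::uint64_t mulOld(std::uint64_t X, unsigned C) {
  return (X << (C + 2)) - (X << C);
}

// New lowering: 2*X*2^C + X*2^C == 3*X*2^C (uses ADD instead of SUB)
std::uint64_t mulNew(std::uint64_t X, unsigned C) {
  return (X << (C + 1)) + (X << C);
}

// With Zba: sh1add X, X computes (X << 1) + X, then the result is shifted by C.
std::uint64_t mulZba(std::uint64_t X, unsigned C) {
  return ((X << 1) + X) << C;
}
```

For `X = 5, C = 3`, all three return 120, which is `5 * (3 << 3)`.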

…7898) So that changing the type of the container (planned in a future patch) is less intrusive.

Upstream the basic support for the ExtVectorType element expr

…actor in the way of a glue. (llvm#167805) In the new test, we're trying to fold a load and an X86ISD::CALL. The call has a CopyToReg glued to it. The load and the call have different input chains, so they need to be merged. This results in a TokenFactor that gets put between the CopyToReg and the final CALLm instruction. The DAG scheduler can't handle that. The load here was created by legalization of the extract_element using a stack temporary store and load. A normal IR load would be chained into the call sequence by SelectionDAGBuilder. This would usually have the load chained in before the CopyToReg. The store/load created by legalization don't get chained into the rest of the DAG. Fixes llvm#63790

…llvm#167901)

AMDGPU: Start to use AV classes for unknown vector class Use AGPR+VGPR superclasses for gfx90a+. The type used for the class should be the broadest possible class, to be contextually restricted later. InstrEmitter clamps these to the common subclass of the context use instructions, so we're best off using the broadest possible class for all types. Note this does very little because we only use VGPR classes for FP types (though this doesn't particularly make any sense), and we legalize normal loads and stores to integer.

…agation.cpp (NFC)

The XeGPU and XeVM dialects have assigned maintainers, but the related folders currently lack code owners. Add charithaintc and Jianhui-Li as code owners for XeGPU-related folders. Add silee2 as code owner for XeVM-related folders. Note: charithaintc is the current maintainer of the XeGPU dialect; silee2 is the current maintainer of the XeVM dialect.

This patch updates various LLVM headers to properly add the `LLVM_ABI` and `LLVM_ABI_FOR_TEST` annotations to build LLVM as a DLL on Windows. This effort is tracked in llvm#109483.

…it (llvm#167760) As new operations are added (for example uinc_wrap, udec_wrap, usub_cond, usub_sat), they will not automatically be supported by outline atomics and so should be expanded by the pre-isel pass. Make the list of supported outline atomics explicit to make sure we only mark the expected intrinsics as outline atomics. Fixes llvm#167728
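A sketch of an explicit allowlist written as a `switch`, under stated assumptions: `RMWOp` and `isOutlineAtomic` are hypothetical (the enumerator names echo the operations mentioned above but this is not LLVM's `AtomicRMWInst::BinOp`), and the set of operations marked as supported is illustrative rather than the patch's actual list. The point is that anything not named falls through to expansion.

```cpp
// Hypothetical operation enum, for illustration only.
enum class RMWOp { Xchg, Add, Sub, And, Or, Xor, UIncWrap, UDecWrap, USubCond, USubSat };

// Only operations named here are treated as outline atomics.
bool isOutlineAtomic(RMWOp Op) {
  switch (Op) {
  case RMWOp::Xchg:
  case RMWOp::Add:
  case RMWOp::Sub:
  case RMWOp::And:
  case RMWOp::Or:
  case RMWOp::Xor:
    return true;
  default:
    // Newly added or unsupported operations (e.g. UIncWrap) are left for
    // the pre-isel expansion pass instead of being marked as outline atomics.
    return false;
  }
}
```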

…ombine (llvm#167908)

This patch is a follow-up to llvm#167532, which refactored these methods' code into the relevant `printOperation()` functions but did not remove them.

… C/C++ (llvm#167735) The variable-category 'allocatable' is explicitly noted as applying only to Fortran. If specified in C/C++, it should generate an error.

NOTE: An issue will be filed against the OpenMP 6.0 specification, since this restriction is missing from the 'default' clause section.

From the OpenMP 6.0 specification:

- Section 7.5.1 default Clause, Semantics, under Fortran only, L18-19, pg. 223: "The allocatable variable-category specifies variables with the ALLOCATABLE attribute."
- Section 7.9.9 defaultmap Clause, Semantics, under Fortran only, L9-10, pg. 292: "The allocatable variable-category specifies variables with the ALLOCATABLE attribute."
- Section 7.9.9 defaultmap Clause, Restrictions, C/C++, L1, pg. 293: "The specified variable-category must not be allocatable."


…ranch (llvm#166625) GitHub's Update Branch button is a helpful tool for quickly updating a PR before merging, but it might also be important to point out that it creates a merge commit without additional prompting, which may or may not be desired behavior for a given LLVM contributor. Opened on the suggestion of @lamb-j

…lvm#167889) The casts are currently no-ops because `MCRegUnit` is a typedef for `unsigned`, but this will change soon, and explicit casts will then be required.
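A sketch of why explicit casts are future-proof, assuming (purely for illustration) that the replacement is a scoped enum; the commit message does not say what `MCRegUnit` will become, and `MCRegUnitSketch`, `toUnsigned`, and `toRegUnit` are hypothetical names. With a distinct type, implicit conversions to and from `unsigned` stop compiling, while `static_cast` keeps working.

```cpp
// Today: typedef unsigned MCRegUnit;  (casts are no-ops)
// Hypothetical future strong type, assumed here only for illustration:
enum class MCRegUnitSketch : unsigned {};

// Explicit conversions compile both with the typedef and with a strong type.
unsigned toUnsigned(MCRegUnitSketch U) { return static_cast<unsigned>(U); }
MCRegUnitSketch toRegUnit(unsigned I) { return static_cast<MCRegUnitSketch>(I); }

// By contrast, `unsigned U = MCRegUnitSketch{};` would no longer compile.
```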

…lvm#167020)

…#164357) Refactor the XCnt optimization checks so that they can be checked when applying a pre-existing waitcnt. This removes unnecessary xcnt waits when taking a loop backedge.

In preparation for porting to the NewPM. Reviewers: kazutakahirata, arsenm Reviewed By: kazutakahirata, arsenm Pull Request: llvm#167910

) AMDGPU: Really use AV classes by default for vector classes. Update getRegClassFor to use AV classes in place of VGPRs for gfx90a-gfx950. There are a handful of regressions. Most come from enabling unprofitable rematerialization, which reduces register count by 1 but adds an unnecessary instruction.

This reverts commit 1a86f0a.

z1-cciauto · 2025-11-14T11:18:30Z

PSDB Link: https://compiler-ci.amd.com/job/compiler-psdb-amd-staging/2807

lplewa and others added 30 commits November 13, 2025 15:56

[flang][NFC] Strip trailing whitespace from tests (4 of N)

a12600c

Only the Fortran source files in flang/test/Intrinsics have been modified. The other files in flang/test will be cleaned up in subsequent commits.

[NFC][TableGen] Adopt CodeGenHelpers in SubtargetEmitter (llvm#163820)

e5c418f

- Adopt ifdef and namespace emitters in SubtargetEmitter.
- To aid that, factor out emission of different sections of the code into individual helper functions.

Revert "[Flang][OpenMP] Update declare mapper lookup via use-module" (l…

e1324a9

…lvm#167896) Reverts llvm#163860

[CIR] Prepare a 'this' for CXXDefaultInitExprs (llvm#165994)

6a0ba8b

Prepare a 'this' for CXXDefaultInitExprs

[LV] Update LoopVectorizationPlanner::emitInvalidCostRemarks to handl…

a04c6b5

…e reduction plans (llvm#165913) The TypeSwitch for extracting the Opcode now handles the `VPReductionRecipe` case. Fixes llvm#165359.

[AArch64][SVE] Allow basic use of target("aarch64.svcount") with +s…

12322b2

…ve (llvm#167875) This prevents the backend from crashing for basic uses of __SVCount_t type (e.g., as function arguments), without +sve2p1 or +sme2. Fixes llvm#167462

[GISel][AArch64] Create emitCMP instead of cloning a virtual register…

d6703bb

… (NFC) (llvm#155262) CMN also has a function like this; we should do the same for CMP.

[libcxx] Fix xsgetn in basic_filebuf (llvm#167779)

ea16f7d

The optimized version of `xsgetn` for `basic_filebuf` added in llvm#165223 has an issue: if a read is served partly from the buffer and partly from the filesystem, it returns the wrong number of characters. This patch should address the issue.

[RISCV][llvm] Handle INSERT_VECTOR_ELT, EXTRACT_VECTOR_ELT codegen fo…

e63a47d

…r zvfbfa (llvm#167819)

[CIR] Implement support for GNUNullExpr (llvm#167715)

de3d74a

Implement support for GNUNullExpr

[libc][NFC] Fix warnings in RPC server code

6b49e6a

[CodeGen] Hide SparseSet<LiveRegUnit> behind a typedef (NFC) (llvm#16…

98f9b54

…7898) So that changing the type of the container (planned in a future patch) is less intrusive.

[CIR] Upstream basic support for ExtVector element expr (llvm#167570)

9216e17

Upstream the basic support for the ExtVectorType element expr

[CodeGen] Add TRI::regunits() iterating over all register units (NFC) (…

d1cc137

…llvm#167901)

[bazel] Added ArithToAPFloat library to bazel (llvm#167916)

b49a847

[MLIR] Apply clang-tidy fixes for llvm-qualified-auto in ShardingProp…

965b338

…agation.cpp (NFC)

Add missing LLVM_ABI annotations (llvm#167718)

23f6a8a

This patch updates various LLVM headers to properly add the `LLVM_ABI` and `LLVM_ABI_FOR_TEST` annotations to build LLVM as a DLL on Windows. This effort is tracked in llvm#109483.

davemgreen and others added 16 commits November 13, 2025 18:05

AMDGPU: Add baseline test for load-select to load select of pointer c…

ac27b24

…ombine (llvm#167908)

[gn] port 825706b

b9301c2

[mlir][emitc] Remove dead methods from emitter (llvm#167657)

8ae3ac8

This patch is a follow-up to llvm#167532, which refactored these methods' code into the relevant `printOperation()` functions but did not remove them.

[AMDGPU] Use MCRegUnit, insert explicit casts to/from unsigned (NFC) (l…

86d712c

…lvm#167889) The casts are currently no-ops because `MCRegUnit` is a typedef for `unsigned`, but this will change soon, and explicit casts will then be required.

[clang-tidy][NFC] Enable "HeaderFilterRegex" in clang-tidy codebase (l…

751a943

…lvm#167020)

[AMDGPU][SIInsertWaitCnts] Gfx12.5 - Refactor xcnt optimization (llvm…

5e4505d

…#164357) Refactor the XCnt optimization checks so that they can be checked when applying a pre-existing waitcnt. This removes unnecessary xcnt waits when taking a loop backedge.

[NFC][X86] Format Floating Point Stackifier Pass

7aa60b6

In preparation for porting to the NewPM. Reviewers: kazutakahirata, arsenm Reviewed By: kazutakahirata, arsenm Pull Request: llvm#167910

DAG: Allow select ptr combine for non-0 address spaces (llvm#167909)

e5f499f

[Offload] Add device info for shared memory (llvm#167817)

1a86f0a

merge main into amd-staging

eb8752f

merge main into amd-staging

da1534d

Revert "[Offload] Add device info for shared memory (llvm#167817)"

143e2c6

This reverts commit 1a86f0a.

ronlieb requested review from a team and dpalermo November 14, 2025 11:16

dpalermo approved these changes Nov 14, 2025


ronlieb closed this Nov 14, 2025


38 participants
