merge main into amd-staging #624

ronlieb · 2025-11-19T13:30:16Z

No description provided.

This patch is just a small cleanup that unifies the various spots that add a DWARF expression to the output.

) Introduce two new OpenACC operation interfaces for identifying global variables and their address computations:
- `GlobalVariableOpInterface`: Identifies operations that define global variables. Provides an `isConstant()` method to query whether the global is constant.
- `AddressOfGlobalOpInterface`: Identifies operations that compute the address of a global variable. Provides a `getSymbol()` method to retrieve the symbol reference.

This is being done in preparation for `ACCImplicitDeclare` pass which will automatically ensure that `acc declare` is applied to globals when needed.

The following operations now implement these interfaces:
- `memref::GlobalOp` implements `GlobalVariableOpInterface`
- `memref::GetGlobalOp` implements `AddressOfGlobalOpInterface`
- `fir::GlobalOp` implements `GlobalVariableOpInterface`
- `fir::AddrOfOp` implements `AddressOfGlobalOpInterface`

ninja is already installed by default on Linux and macOS.

Adopt `IfDefEmitter` and `NamespaceEmitter` in CodeGenMapTable.cpp

We don't have enough information to infer the probability of a weak function pointer being nullptr or not (open question if we could propagate this from the linker) Issue llvm#147390

Add the cl::values to the pass options so an assert is not reached when trying to generate a reproducer e.g. "unknown data value for option"

This commit adds the following fence intrinsics:
- llvm.nvvm.fence.acquire.sync_restrict.space.cluster.scope.cluster
- llvm.nvvm.fence.release.sync_restrict.space.cta.scope.cluster
- llvm.nvvm.fence.mbarrier_init.release.cluster
- llvm.nvvm.fence.proxy.async.generic.acquire.sync_restrict.space.cluster.scope.cluster
- llvm.nvvm.fence.proxy.async.generic.release.sync_restrict.space.cta.scope.cluster
- llvm.nvvm.fence.proxy.alias
- llvm.nvvm.fence.proxy.async
- llvm.nvvm.fence.proxy.async.global
- llvm.nvvm.fence.proxy.async.shared_cluster
- llvm.nvvm.fence.proxy.async.shared_cta

For more information, please refer to the [PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-membar)

Allow a target to steal the definition of a generic pseudoinstruction and remap the operands. This works by defining a new instruction, which will simply swap out the emitted entry in the InstrInfo table. This is intended to eliminate the C++ half of the implementation of PointerLikeRegClass. With RegClassByHwMode, the remaining usecase for PointerLikeRegClass are the common codegen pseudoinstructions. Every target maintains its own copy of the generic pseudo operand definitions anyway, so we can stub out the register operands with an appropriate class instead of waiting for runtime resolution. In the future we could probably take this a bit further. For example, there is a similar problem for ADJCALLSTACKUP/DOWN since they depend on target register definitions for the stack pointer register.

…168029)
- Split from llvm#165532. This is a step toward a unified interface for masked/gather-scatter/strided/expand-compress cost modeling.
- Replace the ad-hoc parameter list with a single attributes object.

API change:
```
- InstructionCost getMaskedMemoryOpCost(Opcode, Src, Alignment,
-                                       AddressSpace, CostKind);
+ InstructionCost getMaskedMemoryOpCost(MemIntrinsicCostAttributes,
+                                       CostKind);
```
Notes:
- NFCI intended: callers populate MemIntrinsicCostAttributes with the same information as before.
- Follow-up: migrate gather/scatter, strided, and expand/compress cost queries to the same attributes-based entry point.

/llvm-project/llvm/utils/TableGen/Common/CodeGenTarget.cpp:286:12: error: variable 'SkippedInsts' set but not used [-Werror,-Wunused-but-set-variable] unsigned SkippedInsts = 0; ^ 1 error generated.

A follow-up of llvm#168458.

…#168498) This follows the list of names used by GCC.

…#159255) This patch improves the codegen for saddo on i32 and i64 in both 32-bit and 64-bit modes by custom lowering. It implements signed-add overflow detection using the `(x eqv y) & (sum xor x)` bit-level sequence.
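The bit trick named above can be sketched in plain C++. This is an illustration for the 32-bit case only, not the patch's actual lowering: the function name `saddOverflows` is hypothetical, `eqv` is written as `~(x ^ y)`, and reading the sign bit of the result as the overflow flag is an assumption about how the sequence is consumed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper illustrating `(x eqv y) & (sum xor x)` for i32.
// Overflow can only happen when x and y have the same sign and the
// wrapped sum has a different sign from x.
bool saddOverflows(int32_t x, int32_t y) {
  uint32_t ux = static_cast<uint32_t>(x);
  uint32_t uy = static_cast<uint32_t>(y);
  uint32_t sum = ux + uy;            // unsigned add wraps, so no UB here
  uint32_t eqv = ~(ux ^ uy);         // bits where x and y agree
  uint32_t mask = eqv & (sum ^ ux);  // sign bit set only on overflow
  return (mask >> 31) != 0;          // assumption: the sign bit is the flag
}
```

Only the top bit of `mask` is meaningful; the lower bits can be set without indicating overflow.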

This reverts commit d772663. This fixes the final issue with the labeller landing. There were two remaining issues: 1. There was an extra quote on one of the globs 2. Some of the yaml keys were named incorrectly (should have been plural)

…vm#156724) Consider skipping scalable epilogue VFs when they are greater than RemainingIterations, the same as for fixed VFs. Scalable RemainingIterations are excluded from that comparison because SCEV currently cannot evaluate non-canonical vscale-based expressions.

…ins (llvm#168325) Main changes:
* OpenCL legacy atom/atomic builtins now call CLC atomic functions (which use Clang __scoped_atomic_*), replacing previous Clang __sync_* functions.
* Change memory order from seq_cst to relaxed; keep device scope (spec permits broader than workgroup).
  LLVM IR for _Z8atom_decPU3AS1Vi in amdgcn--amdhsa.bc:
  Before: %2 = atomicrmw volatile sub ptr addrspace(1) %0, i32 1 syncscope("agent") seq_cst
  After: %2 = atomicrmw volatile sub ptr addrspace(1) %0, i32 1 syncscope("agent") monotonic
* Also adds OpenCL 1.0 atom_* variants without volatile on the pointer. They are added for backward compatibility.

This PR relaxes the create_mem_desc's restriction on source memref, allowing it to be a 2d memref.

Moves all Session member variables dedicated to shutdown into a new ShutdownInfo struct, and uses the presence / absence of this struct as the flag to indicate that we've entered the "shutting down" state. This simplifies the implementation of the shutdown process.
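A rough sketch of the "presence of the struct is the flag" idea, assuming a `std::optional` member. Apart from the names `Session` and `ShutdownInfo`, everything here (the callback list, the method names) is hypothetical and not taken from the orc-rt sources.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified stand-in for a session object.
class Session {
public:
  using OnComplete = std::function<void()>;

  // No separate bool: shutting down means "ShutdownInfo exists".
  bool isShuttingDown() const { return Shutdown.has_value(); }

  // The first call creates the shutdown state; later calls only add callbacks.
  void shutdown(OnComplete Callback) {
    if (!Shutdown)
      Shutdown.emplace();
    Shutdown->Callbacks.push_back(std::move(Callback));
  }

  // Runs and clears the pending callbacks once teardown has finished.
  void completeShutdown() {
    if (!Shutdown)
      return;
    for (auto &CB : Shutdown->Callbacks)
      CB();
    Shutdown->Callbacks.clear();
  }

private:
  // All shutdown-only state lives here (assumed contents).
  struct ShutdownInfo {
    std::vector<OnComplete> Callbacks;
  };
  std::optional<ShutdownInfo> Shutdown;
};
```

Grouping the state this way means shutdown-only members cannot be touched by accident before shutdown has started.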

Explicitly cast 0 to size_t type to match fread() return type. This follows the pattern used elsewhere in this file, and fixes -Wshorten-64-to-32 warnings when building the test.

My 2020 change that added versioned symbol recognition (reviews.llvm.org/D80059) checks both VER_NDX_LOCAL and VER_NDX_GLOBAL, though test coverage was missing. lld/test/ELF/dso-undef-extract-lazy.s checks that the undefined symbol is indeed considered unversioned.

…#168381) As suggested in the review for llvm#160536 it would be good to follow up and port the RISC-V passes to the new pass manager. This PR starts that task. It provides the bare minimum necessary to run RISCVCodeGenPrepare with opt -passes=riscv-codegenprepare. The approach used is modeled on my observations of the AMDGPU backend and the recent work to port the X86 passes. The testing approach is to add a `-passes=riscv-foo` RUN line to at least one test, if an appropriate test exists.

This change adds intrinsics and clang builtins for the remaining float to fp16 conversions. This includes the following conversions: - float to bf16x2 - satfinite variants - float to f16x2 - satfinite variants - float to bf16 - satfinite variants - float to f16 - all variants Tests are added in `convert-sm80.ll` and `convert-sm80-sf.ll` for the intrinsics and in `builtins-nvptx.c` for the clang builtins.

This enables building LLVM with `-mllvm -x86-asm-syntax=intel` in one's Clang config files (i.e. a global preference for Intel syntax). `-masm=att` is insufficient as it doesn't override a specification of `-mllvm -x86-asm-syntax`.

…::initializeSubtargetDependencies. (llvm#168612) The "generic" entry in tablegen is really a dummy entry. We shouldn't use it for anything. Remap "generic" to either generic-rv32 or generic-rv64 based on the triple.
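The remapping described above could look roughly like this. The helper name `resolveCPUName` and the plain `bool` for the triple check are assumptions for illustration; the real code works with LLVM's own triple and string types.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: replace the dummy "generic" CPU name with the
// XLEN-specific generic entry; any other CPU name passes through unchanged.
std::string resolveCPUName(const std::string &CPU, bool Is64Bit) {
  if (CPU == "generic")
    return Is64Bit ? "generic-rv64" : "generic-rv32";
  return CPU;
}
```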

This document aims to lay out the high level design and goals of the ORC runtime, and the relationships between key components.

This issue was introduced in llvm#167689

…ion (llvm#168500) Do not add latency for wavefront and singlethread scope fences during barrier latency DAG mutation. These scopes do not typically introduce any latency and adjusting schedules based on them significantly impacts latency hiding.

…8638) Reland commit fb829bf with additional fixes relating to post-merge CI failure
```
/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/lib/Dialect/LLVMIR/IR/NVVMDialect.cpp: In function ‘constexpr llvm::nvvm::CTAGroupKind getNVVMCtaGroupKind(mlir::NVVM::CTAGroupKind)’:
/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/llvm/include/llvm/Support/ErrorHandling.h:165:36: error: call to non-constexpr function ‘void llvm::llvm_unreachable_internal(const char*, const char*, unsigned int)’
   ::llvm::llvm_unreachable_internal(msg, __FILE__, __LINE__)
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/lib/Dialect/LLVMIR/IR/NVVMDialect.cpp:73:3: note: in expansion of macro ‘llvm_unreachable’
   llvm_unreachable("unsupported cta_group value");
   ^
```

… (llvm#168581) This commit fixes linker failures evident on some failing build bots.

EPCDebugObjectRegistrar is unused now that the ELF debugger support plugin uses AllocActions llvm#167866

…-casting (llvm#167655)

Use the recently refactored VPRecipeBase::print to print debug location for all recipes. PR: llvm#168454

…lvm#168010) Fixes llvm#167710

…lvm#165427) For flat memory instructions where the address is supplied as a base address register with an immediate offset, the memory aperture test ignores the immediate offset. Currently, SDISel does not respect that, which leads to miscompilations where valid input programs crash when the address computation relies on the immediate offset to get the base address in the proper memory aperture. Global or scratch instructions are not affected.

This patch only selects flat instructions with immediate offsets from PTRADD address computations with the inbounds flag: If the PTRADD does not leave the bounds of the allocated object, it cannot leave the bounds of the memory aperture and is therefore safe to handle with an immediate offset.

Affected tests:
- CodeGen/AMDGPU/fold-gep-offset.ll: Offsets are no longer wrongly folded, added new positive tests where we still do fold them.
- CodeGen/AMDGPU/infer-addrspace-flat-atomic.ll: Offset folding doesn't seem integral to this test, so the test is not changed to make offset folding still happen.
- CodeGen/AMDGPU/loop-prefetch-data.ll: loop-reduce transforms inbounds addresses for accesses to be based on potentially OOB addresses used for prefetching.
- I think the remaining ones suffer from the limited preservation of the inbounds flag in PTRADD DAGCombines due to the provenance problems pointed out in PR llvm#165424 and the fact that `AMDGPUTargetLowering::SplitVector{Load|Store}` legalizes too-wide accesses by repeatedly splitting them in half. Legalizing a V32S32 memory access therefore leads to inbounds ptradd chains like (ptradd inbounds (ptradd inbounds (ptradd inbounds P, 64), 32), 16). The DAGCombines fold them into a single ptradd, but the involved transformations generally cannot preserve the inbounds flag (even though it would be valid in this case).

Similar previous PR that relied on `ISD::ADD inbounds` instead of `ISD::PTRADD inbounds` (closed): llvm#132353

Analogous PR for GISel (merged): llvm#153001

Fixes SWDEV-516125.

The option -falloc-token-max=0 is supposed to be usable to override previous settings back to the target default max tokens (SIZE_MAX). This did not work for the builtin:
```
| executed command: clang -cc1 [..] -nostdsysteminc -triple x86_64-linux-gnu -std=c++23 -fsyntax-only -verify clang/test/SemaCXX/alloc-token.cpp -falloc-token-max=0
| clang: llvm/lib/Support/AllocToken.cpp:38: std::optional<uint64_t> llvm::getAllocToken(AllocTokenMode, const AllocTokenMetadata &, uint64_t): Assertion `MaxTokens && "Must provide non-zero max tokens"' failed.
```
Fix it by also picking the default if "0" is passed. Improve the documentation to be clearer what the value of "0" means.
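The "0 means target default" rule can be sketched as a small normalization step applied before the value reaches code that asserts on a non-zero maximum. The helper name `effectiveMaxTokens` and its parameters are hypothetical and not the names used in Clang.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: a requested maximum of 0 selects the target default,
// so downstream code never sees a zero maximum.
uint64_t effectiveMaxTokens(uint64_t Requested, uint64_t TargetDefault) {
  return Requested == 0 ? TargetDefault : Requested;
}
```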

…8694) EltsFromConsecutiveLoads can be recursively called - ensure we limit the recursion depth.
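As a generic illustration of a recursion depth limiter (not the X86 code itself), the sketch below threads a depth counter through a recursive walk and bails out conservatively once a cap is reached. `Node`, `analyze`, `makeChain`, and the cap of 6 are all assumptions made for the example.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical singly linked structure to recurse over.
struct Node {
  std::unique_ptr<Node> Child;
};

constexpr unsigned MaxDepth = 6; // assumed cap, chosen for illustration

// Returns true if the leaf is reached before the depth cap is hit;
// returns false (a conservative "give up") once Depth reaches MaxDepth.
bool analyze(const Node *N, unsigned Depth = 0) {
  if (Depth >= MaxDepth)
    return false;
  if (!N->Child)
    return true;
  return analyze(N->Child.get(), Depth + 1);
}

// Builds a chain with Length nodes (Length must be at least 1).
std::unique_ptr<Node> makeChain(unsigned Length) {
  std::unique_ptr<Node> Head;
  for (unsigned I = 0; I < Length; ++I) {
    auto N = std::make_unique<Node>();
    N->Child = std::move(Head);
    Head = std::move(N);
  }
  return Head;
}
```

Giving up at the cap trades a missed optimization on very deep inputs for bounded compile time and stack use.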

…llvm#168044) Resolves llvm#166529

Adding "use-after-return" in Lifetime Analysis. Detecting when a function returns a reference to its own stack memory: [UAR Design Doc](https://docs.google.com/document/d/1Wxjn_rJD_tuRdejP81dlb9VOckTkCq5-aE1nGcerb_o/edit?usp=sharing)

Consider the following example:
```cpp
std::string_view foo() {
  std::string_view a;
  std::string str = "small scoped string";
  a = str;
  return a;
}
```
The code adds a new Fact "OriginEscape" at the end of the CFG to determine any loan that is escaping the function as shown below:
```
Function: foo
  Block B2:
  End of Block
  Block B1:
    OriginFlow (Dest: 0 (Decl: a), Src: 1 (Expr: CXXConstructExpr))
    OriginFlow (Dest: 2 (Expr: ImplicitCastExpr), Src: 3 (Expr: StringLiteral))
    Issue (0 (Path: operator=), ToOrigin: 4 (Expr: DeclRefExpr))
    OriginFlow (Dest: 5 (Expr: ImplicitCastExpr), Src: 4 (Expr: DeclRefExpr))
    Use (0 (Decl: a), Write)
    Issue (1 (Path: str), ToOrigin: 6 (Expr: DeclRefExpr))
    OriginFlow (Dest: 7 (Expr: ImplicitCastExpr), Src: 6 (Expr: DeclRefExpr))
    OriginFlow (Dest: 8 (Expr: CXXMemberCallExpr), Src: 7 (Expr: ImplicitCastExpr))
    OriginFlow (Dest: 9 (Expr: ImplicitCastExpr), Src: 8 (Expr: CXXMemberCallExpr))
    OriginFlow (Dest: 10 (Expr: ImplicitCastExpr), Src: 9 (Expr: ImplicitCastExpr))
    OriginFlow (Dest: 11 (Expr: MaterializeTemporaryExpr), Src: 10 (Expr: ImplicitCastExpr))
    OriginFlow (Dest: 0 (Decl: a), Src: 11 (Expr: MaterializeTemporaryExpr))
    Use (0 (Decl: a), Read)
    OriginFlow (Dest: 12 (Expr: ImplicitCastExpr), Src: 0 (Decl: a))
    OriginFlow (Dest: 13 (Expr: CXXConstructExpr), Src: 12 (Expr: ImplicitCastExpr))
    Expire (1 (Path: str))
    OriginEscapes (13 (Expr: CXXConstructExpr))
  End of Block
  Block B0:
  End of Block
```
The confidence of the report is determined by checking if at least one of the loans returned is not expired (strict). If all loans are expired it is considered permissive.

More information: [UAR Design Doc](https://docs.google.com/document/d/1Wxjn_rJD_tuRdejP81dlb9VOckTkCq5-aE1nGcerb_o/edit?usp=sharing)

…ldUnmerge (llvm#168692) This aims to fix the crash in llvm#168495: my combine rule was missing a check that the source vector was in fact a vector. This then caused the legality check to fail in this example, as the concat was trying to concat a non-vector. I have also gated the bitcast of the concat to only work on non-scalable vectors, as the mutation calls `getNumElements`, which crashes when called on a scalable vector. Fixes llvm#168495

Per Shore: revert locally, he will reapply This reverts commit 52a58a4.

Implement VPHeaderPHIRecipe::classof(const VPValue *V) in terms of the variant taking VPRecipeBase. Reduces some duplication, split off from llvm#141431.

Post-commit fix of llvm#164794 reported at llvm#164794 (comment) `LLVM_LIBRARY_OUTPUT_INTDIR` and `LLVM_RUNTIME_OUTPUT_INTDIR` are used by `AddLLVM.cmake` as output directories. Unless we are in a bootstrapping build, they must not point to directories found by `find_package(LLVM)`, which may be read-only directories. MLIR for instance sets these variables to its own build output directory, so should the runtimes.

z1-cciauto · 2025-11-19T13:31:33Z

PSDB Link: https://compiler-ci.amd.com/job/compiler-psdb-amd-staging/2868

tromey and others added 30 commits November 18, 2025 22:59

Introduce DwarfUnit::addBlock helper method (llvm#168446)

1262acf

This patch is just a small cleanup that unifies the various spots that add a DWARF expression to the output.

[orc-rt] Fix typos in file comments.

411c752

Fix llvm#168367 (llvm#168635)

651785a

workflows/release-binaries: Drop install-ninja action (llvm#167070)

7819071

ninja is already installed by default on Linux and macOS.

[NFC][TableGen] Adopt CodeGenHelpers in CodeGenMapTable (llvm#168592)

c32d2ee

Adopt `IfDefEmitter` and `NamespaceEmitter` in CodeGenMapTable.cpp

[LTT] Mark as unkown weak function tests. (llvm#167399)

88efd0e

We don't have enough information to infer the probability of a weak function pointer being nullptr or not (open question if we could propagate this from the linker) Issue llvm#147390

[OpenACC] add cl::values to ACCImplicitRoutineOptions (llvm#168601)

9a15556

Add the cl::values to the pass options so an assert is not reached when trying to generate a reproducer e.g. "unknown data value for option"

[TableGen] Silence a warning (NFC)

a3ab110

/llvm-project/llvm/utils/TableGen/Common/CodeGenTarget.cpp:286:12: error: variable 'SkippedInsts' set but not used [-Werror,-Wunused-but-set-variable] unsigned SkippedInsts = 0; ^ 1 error generated.

[NFC] Check operand type instead of opcode (llvm#168641)

b4aa3d3

A follow-up of llvm#168458.

[AMDGPU] Adding instruction specific features (llvm#167809)

52a58a4

[AMDGPU][GlobalISel] Add regbankselect rules for G_FSHR (llvm#159818)

5ee95f4

[SPARC][clang] Add condition code register names for inline asm (llvm…

52ed0f2

…#168498) This follows the list of names used by GCC.

merge main into amd-staging

4a24947

[MLIR][XeGPU] Allow create mem desc from 2d memref (llvm#167767)

9dc4ebf

This PR relaxes the create_mem_desc's restriction on source memref, allowing it to be a 2d memref.

[libc] Fix -Wshorten-64-to-32 in fileop_test. (llvm#168451)

5bba4fd

Explicitly cast 0 to size_t type to match fread() return type. This follows the pattern used elsewhere in this file, and fixes -Wshorten-64-to-32 warnings when building the test.

Exclude from profcheck a vplan test under phase ordering (llvm#168669)

5109f2a

[MLIR][NVVM] Move docs to correct folder

669c30c

topperc and others added 23 commits November 19, 2025 00:08

[mlir] Use dictionary order to order the pass decl (NFC) (llvm#168648)

f8e83c4

[orc-rt] Initial ORC Runtime design documentation. (llvm#168681)

de9c182

This document aims to lay out the high level design and goals of the ORC runtime, and the relationships between key components.

[clang-tidy][NFC] Fix order in list.rst (llvm#168683)

fddfc70

This issue was introduced in llvm#167689

[mlir][tosa] Fix linker failure in build bots introduced by llvm#165581…

a2af185

… (llvm#168581) This commit fixes linker failures evident on some failing build bots.

[ORC] Remove now unused EPCDebugObjectRegistrar (NFC) (llvm#167868)

907e851

EPCDebugObjectRegistrar is unused now that the ELF debugger support plugin uses AllocActions llvm#167866

[clang-tidy] Provide fix-its for casts to void* in google-readability…

915e9ad

…-casting (llvm#167655)

[VPlan] Print debug info for all recipes. (llvm#168454)

0730913

Use the recently refactored VPRecipeBase::print to print debug location for all recipes. PR: llvm#168454

[DAG] Update canCreateUndefOrPoison to handle ISD::VECTOR_COMPRESS (l…

e38529d

…lvm#168010) Fixes llvm#167710

[MLIR][NVVM] Add operations and interfaces

2f6a8a7

[SPARC][NFC] Move clang tests into own subdirectory (llvm#168657)

ed7f2a4

[X86] EltsFromConsecutiveLoads - add recursion depth limiter (llvm#16…

b42851b

…8694) EltsFromConsecutiveLoads can be recursively called - ensure we limit the recursion depth.

[Clang][X86] allow VPERMILPD/S imm intrinsics to be used in constexpr (…

50791c3

…llvm#168044) Resolves llvm#166529

Revert "[AMDGPU] Adding instruction specific features (llvm#167809)"

3d3844f

Per Shore: revert locally, he will reapply This reverts commit 52a58a4.

[VPLan] Reduce duplication in VPHeaderPHIRecipe::classof. (NFCI)

7b94dd3

Implement VPHeaderPHIRecipe::classof(const VPValue *V) in terms of the variant taking VPRecipeBase. Reduces some duplication, split off from llvm#141431.

merge main into amd-staging

d59d11f

ronlieb requested review from a team and dpalermo November 19, 2025 13:30

dpalermo approved these changes Nov 19, 2025

z1-cciauto merged commit 94a627f into amd-staging Nov 19, 2025
24 checks passed

z1-cciauto deleted the amd/merge/upstream_merge_20251119065220 branch November 19, 2025 17:04


47 participants
