forked from llvm/llvm-project
-
Notifications
You must be signed in to change notification settings - Fork 77
merge main into amd-staging #624
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
z1-cciauto
merged 53 commits into
amd-staging
from
amd/merge/upstream_merge_20251119065220
Nov 19, 2025
Merged
merge main into amd-staging #624
z1-cciauto
merged 53 commits into
amd-staging
from
amd/merge/upstream_merge_20251119065220
Nov 19, 2025
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This patch is just a small cleanup that unifies the various spots that add a DWARF expression to the output.
) Introduce two new OpenACC operation interfaces for identifying global variables and their address computations: - `GlobalVariableOpInterface`: Identifies operations that define global variables. Provides an `isConstant()` method to query whether the global is constant. - `AddressOfGlobalOpInterface`: Identifies operations that compute the address of a global variable. Provides a `getSymbol()` method to retrieve the symbol reference. This is being done in preparation for `ACCImplicitDeclare` pass which will automatically ensure that `acc declare` is applied to globals when needed. The following operations now implement these interfaces: - `memref::GlobalOp` implements `GlobalVariableOpInterface` - `memref::GetGlobalOp` implements `AddressOfGlobalOpInterface` - `fir::GlobalOp` implements `GlobalVariableOpInterface` - `fir::AddrOfOp` implements `AddressOfGlobalOpInterface`
ninja is already installed by default on Linux and macOS.
Adopt `IfDefEmitter` and `NamespaceEmitter` in CodeGenMapTable.cpp
We don't have enough information to infer the probability of a weak function pointer being nullptr or not (open question if we could propagate this from the linker) Issue llvm#147390
Add the cl::values to the pass options so an assert is not reached when trying to generate a reproducer e.g. "unknown data value for option"
This commit adds the below fence intrinsics: - llvm.nvvm.fence.acquire.sync_restrict.space.cluster.scope.cluster - llvm.nvvm.fence.release.sync_restrict.space.cta.scope.cluster - llvm.nvvm.fence.mbarrier_init.release.cluster - llvm.nvvm.fence.proxy.async.generic.acquire.sync_restrict.space.cluster.scope.cluster - llvm.nvvm.fence.proxy.async.generic.release.sync_restrict.space.cta.scope.cluster llvm.nvvm.fence.proxy.alias - llvm.nvvm.fence.proxy.async - llvm.nvvm.fence.proxy.async.global - llvm.nvvm.fence.proxy.async.shared_cluster - llvm.nvvm.fence.proxy.async.shared_cta For more information, please refere the [PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-membar)
Allow a target to steal the definition of a generic pseudoinstruction and remap the operands. This works by defining a new instruction, which will simply swap out the emitted entry in the InstrInfo table. This is intended to eliminate the C++ half of the implementation of PointerLikeRegClass. With RegClassByHwMode, the remaining usecase for PointerLikeRegClass are the common codegen pseudoinstructions. Every target maintains its own copy of the generic pseudo operand definitions anyway, so we can stub out the register operands with an appropriate class instead of waiting for runtime resolution. In the future we could probably take this a bit further. For example, there is a similar problem for ADJCALLSTACKUP/DOWN since they depend on target register definitions for the stack pointer register.
…168029) - Split from llvm#165532. This is a step toward a unified interface for masked/gather-scatter/strided/expand-compress cost modeling. - Replace the ad-hoc parameter list with a single attributes object. API change: ``` - InstructionCost getMaskedMemoryOpCost(Opcode, Src, Alignment, - AddressSpace, CostKind); + InstructionCost getMaskedMemoryOpCost(MemIntrinsicCostAttributes, + CostKind); ``` Notes: - NFCI intended: callers populate MemIntrinsicCostAttributes with the same information as before. - Follow-up: migrate gather/scatter, strided, and expand/compress cost queries to the same attributes-based entry point.
/llvm-project/llvm/utils/TableGen/Common/CodeGenTarget.cpp:286:12:
error: variable 'SkippedInsts' set but not used [-Werror,-Wunused-but-set-variable]
unsigned SkippedInsts = 0;
^
1 error generated.
…#168498) This follows the list of names used by GCC.
…#159255) This patch improves the codegen for saddo on i32 and i64 in both 32-bit and 64-bit modes by custom lowering. It implements signed-add overflow detection using the `(x eqv y) & (sum xor x)`bit-level sequence.
This reverts commit d772663. This fixes the final issue with the labeller landing. There were two remaining issues: 1. There was an extra quote on one of the globs 2. Some of the yaml keys were named incorrectly (should have been plural)
…vm#156724) Consider skipping epilogue scalable VF when they are greater than RemainingIterations same as fixed VF. And skip scalable RemainingIterations from that comparison because SCEV ATM can't evaluate non-canonical vscale-based expressions.
…ins (llvm#168325) Main changes: * OpenCL legacy atom/atomic builtins now call CLC atomic functions (which use Clang __scoped_atomic_*), replacing previous Clang __sync_* functions. * Change memory order from seq_cst to relaxed; keep device scope (spec permits broader than workgroup). LLVM IR for _Z8atom_decPU3AS1Vi in amdgcn--amdhsa.bc: Before: %2 = atomicrmw volatile sub ptr subrspace(1) %0, i32 1 syncscope("agent") seq_cst After: %2 = atomicrmw volatile sub ptr subrspace(1) %0, i32 1 syncscope("agent") monotonic * Also adds OpenCL 1.0 atom_* variants without volatile on the pointer. They are added for backward compatibility.
This PR relax the create_mem_desc's restriction on source memref, allowing it to be a 2d memref.
Moves all Session member variables dedicated to shutdown into a new ShutdownInfo struct, and uses the presence / absence of this struct as the flag to indicate that we've entered the "shutting down" state. This simplifies the implementation of the shutdown process.
Explicitly cast 0 to size_t type to match fread() return type. This follows the pattern used elsewhere in this file, and fixes -Wshorten-64-to-32 warnings when building the test.
My 2020 change that added versioned symbol recognition (reviews.llvm.org/D80059) checks both VER_NDX_LOCAL and VER_NDX_GLOBAL, though test coverage was missing. lld/test/ELF/dso-undef-extract-lazy.s checks that the undefined symbol is indeed considered unversioned.
…#168381) As suggested in the review for llvm#160536 it would be good to follow up and port the RISC-V passes to the new pass manager. This PR starts that task. It provides the bare minimum necessary to run RISCVCodeGenPrepare with opt -passes=riscv-codegenprepare. The approach used is modeled on my observations of the AMDGPU backend and the recent work to port the X86 passes. The testing approach is to add a `-passes=riscv-foo` RUN line to at least one test, if an appropriate test exists.
This change adds intrinsics and clang builtins for the remaining float to fp16 conversions. This includes the following conversions: - float to bf16x2 - satfinite variants - float to f16x2 - satfinite variants - float to bf16 - satfinite variants - float to f16 - all variants Tests are added in `convert-sm80.ll` and `convert-sm80-sf.ll` for the intrinsics and in `builtins-nvptx.c` for the clang builtins.
This enables building LLVM with `-mllvm -x86-asm-syntax=intel` in one's Clang config files (i.e. a global preference for Intel syntax). `-masm=att` is insufficient as it doesn't override a specification of `-mllvm -x86-asm-syntax`.
…::initializeSubtargetDependencies. (llvm#168612) The "generic" entry in tablegen is really a dummy entry. We shouldn't use it for anything. Remap "generic" to either generic-rv32 or generic-rv64 based on the triple.
This document aims to lay out the high level design and goals of the ORC runtime, and the relationships between key components.
This issue was introduced in llvm#167689
…ion (llvm#168500) Do not add latency for wavefront and singlethread scope fences during barrier latency DAG mutation. These scopes do not typically introduce any latency and adjusting schedules based on them significantly impacts latency hiding.
…8638) Reland commit fb829bf with additional fixes relating to post-merge CI failure ``` /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/lib/Dialect/LLVMIR/IR/NVVMDialect.cpp: In function ‘constexpr llvm::nvvm::CTAGroupKind getNVVMCtaGroupKind(mlir::NVVM::CTAGroupKind)’: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/llvm/include/llvm/Support/ErrorHandling.h:165:36: error: call to non-constexpr function ‘void llvm::llvm_unreachable_internal(const char*, const char*, unsigned int)’ ::llvm::llvm_unreachable_internal(msg, __FILE__, __LINE__) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~ /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/lib/Dialect/LLVMIR/IR/NVVMDialect.cpp:73:3: note: in expansion of macro ‘llvm_unreachable’ llvm_unreachable("unsupported cta_group value"); ^ ```
… (llvm#168581) This commit fixes linker failures evident on some failing build bots.
EPCDebugObjectRegistrar is unused now that the ELF debugger support plugin uses AllocActions llvm#167866
Use the recently refactored VPRecipeBase::print to print debug location for all recipes. PR: llvm#168454
…lvm#165427) For flat memory instructions where the address is supplied as a base address register with an immediate offset, the memory aperture test ignores the immediate offset. Currently, SDISel does not respect that, which leads to miscompilations where valid input programs crash when the address computation relies on the immediate offset to get the base address in the proper memory aperture. Global or scratch instructions are not affected. This patch only selects flat instructions with immediate offsets from PTRADD address computations with the inbounds flag: If the PTRADD does not leave the bounds of the allocated object, it cannot leave the bounds of the memory aperture and is therefore safe to handle with an immediate offset. Affected tests: - CodeGen/AMDGPU/fold-gep-offset.ll: Offsets are no longer wrongly folded, added new positive tests where we still do fold them. - CodeGen/AMDGPU/infer-addrspace-flat-atomic.ll: Offset folding doesn't seem integral to this test, so the test is not changed to make offset folding still happen. - CodeGen/AMDGPU/loop-prefetch-data.ll: loop-reduce transforms inbounds addresses for accesses to be based on potentially OOB addresses used for prefetching. - I think the remaining ones suffer from the limited preservation of the inbounds flag in PTRADD DAGCombines due to the provenance problems pointed out in PR llvm#165424 and the fact that `AMDGPUTargetLowering::SplitVector{Load|Store}` legalizes too-wide accesses by repeatedly splitting them in half. Legalizing a V32S32 memory accesses therefore leads to inbounds ptradd chains like (ptradd inbounds (ptradd inbounds (ptradd inbounds P, 64), 32), 16). The DAGCombines fold them into a single ptradd, but the involved transformations generally cannot preserve the inbounds flag (even though it would be valid in this case). Similar previous PR that relied on `ISD::ADD inbounds` instead of `ISD::PTRADD inbounds` (closed): llvm#132353 Analogous PR for GISel (merged): llvm#153001 Fixes SWDEV-516125.
The option -falloc-token-max=0 is supposed to be usable to override previous settings back to the target default max tokens (SIZE_MAX). This did not work for the builtin: ``` | executed command: clang -cc1 [..] -nostdsysteminc -triple x86_64-linux-gnu -std=c++23 -fsyntax-only -verify clang/test/SemaCXX/alloc-token.cpp -falloc-token-max=0 | clang: llvm/lib/Support/AllocToken.cpp:38: std::optional<uint64_t> llvm::getAllocToken(AllocTokenMode, const AllocTokenMetadata &, uint64_t): Assertion `MaxTokens && "Must provide non-zero max tokens"' failed. ``` Fix it by also picking the default if "0" is passed. Improve the documentation to be clearer what the value of "0" means.
…8694) EltsFromConsecutiveLoads can be recursively called - ensure we limit the recursion depth.
Adding "use-after-return" in Lifetime Analysis. Detecting when a function returns a reference to its own stack memory: [UAR Design Doc](https://docs.google.com/document/d/1Wxjn_rJD_tuRdejP81dlb9VOckTkCq5-aE1nGcerb_o/edit?usp=sharing) Consider the following example: ```cpp std::string_view foo() { std::string_view a; std::string str = "small scoped string"; a = str; return a; } ``` The code adds a new Fact "OriginEscape" in the end of the CFG to determine any loan that is escaping the function as shown below: ``` Function: foo Block B2: End of Block Block B1: OriginFlow (Dest: 0 (Decl: a), Src: 1 (Expr: CXXConstructExpr)) OriginFlow (Dest: 2 (Expr: ImplicitCastExpr), Src: 3 (Expr: StringLiteral)) Issue (0 (Path: operator=), ToOrigin: 4 (Expr: DeclRefExpr)) OriginFlow (Dest: 5 (Expr: ImplicitCastExpr), Src: 4 (Expr: DeclRefExpr)) Use (0 (Decl: a), Write) Issue (1 (Path: str), ToOrigin: 6 (Expr: DeclRefExpr)) OriginFlow (Dest: 7 (Expr: ImplicitCastExpr), Src: 6 (Expr: DeclRefExpr)) OriginFlow (Dest: 8 (Expr: CXXMemberCallExpr), Src: 7 (Expr: ImplicitCastExpr)) OriginFlow (Dest: 9 (Expr: ImplicitCastExpr), Src: 8 (Expr: CXXMemberCallExpr)) OriginFlow (Dest: 10 (Expr: ImplicitCastExpr), Src: 9 (Expr: ImplicitCastExpr)) OriginFlow (Dest: 11 (Expr: MaterializeTemporaryExpr), Src: 10 (Expr: ImplicitCastExpr)) OriginFlow (Dest: 0 (Decl: a), Src: 11 (Expr: MaterializeTemporaryExpr)) Use (0 (Decl: a), Read) OriginFlow (Dest: 12 (Expr: ImplicitCastExpr), Src: 0 (Decl: a)) OriginFlow (Dest: 13 (Expr: CXXConstructExpr), Src: 12 (Expr: ImplicitCastExpr)) Expire (1 (Path: str)) OriginEscapes (13 (Expr: CXXConstructExpr)) End of Block Block B0: End of Block ``` The confidence of the report is determined by checking if at least one of the loans returned is not expired (strict). If all loans are expired it is considered permissive. More information [UAR Design Doc](https://docs.google.com/document/d/1Wxjn_rJD_tuRdejP81dlb9VOckTkCq5-aE1nGcerb_o/edit?usp=sharing)
…ldUnmerge (llvm#168692) This aims to fix the crash in llvm#168495, my combine rule was missing a check that the source vector was in fact a vector. This then caused the legality check to fail in this example as the concat was trying to concat a non vector. I have also gated the bitcast of the concat to only work on non-scalable vectors as the mutation calls `getNumElements` which crashes when called on a scalable vector. Fixes llvm#168495
Per Shore: revert locally, he will reapply This reverts commit 52a58a4.
Implement VPHeaderPHIRecipe::classof(const VPValue *V) in terms of the variant taking VPRecipeBase. Reduces some duplication, split off from llvm#141431.
Post-commit fix of llvm#164794 reported at llvm#164794 (comment) `LLVM_LIBRARY_OUTPUT_INTDIR` and `LLVM_RUNTIME_OUTPUT_INTDIR` is used by `AddLLVM.cmake` as output directories. Unless we are in a bootstrapping-build, It must not point to directories found by `find_package(LLVM)` which may be read-only directories. MLIR for instance sets thesese variables to its own build output directory, so should the runtimes.
Collaborator
dpalermo
approved these changes
Nov 19, 2025
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.