Skip to content

Conversation

@ronlieb
Copy link
Collaborator

@ronlieb ronlieb commented Nov 19, 2025

No description provided.

tromey and others added 30 commits November 18, 2025 22:59
This patch is just a small cleanup that unifies the various spots that
add a DWARF expression to the output.
)

Introduce two new OpenACC operation interfaces for identifying global
variables and their address computations:

- `GlobalVariableOpInterface`: Identifies operations that define global
variables. Provides an `isConstant()` method to query whether the global
is constant.

- `AddressOfGlobalOpInterface`: Identifies operations that compute the
address of a global variable. Provides a `getSymbol()` method to
retrieve the symbol reference.

This is being done in preparation for `ACCImplicitDeclare` pass which
will automatically ensure that `acc declare` is applied to globals when
needed.

The following operations now implement these interfaces:
- `memref::GlobalOp` implements `GlobalVariableOpInterface`
- `memref::GetGlobalOp` implements `AddressOfGlobalOpInterface`
- `fir::GlobalOp` implements `GlobalVariableOpInterface`
- `fir::AddrOfOp` implements `AddressOfGlobalOpInterface`
ninja is already installed by default on Linux and macOS.
Adopt `IfDefEmitter` and `NamespaceEmitter` in CodeGenMapTable.cpp
We don't have enough information to infer the probability of a weak function pointer being nullptr or not (open question if we could propagate this from the linker)

Issue llvm#147390
Add the cl::values to the pass options so an assert is not reached when
trying to generate a reproducer e.g. "unknown data value for option"
This commit adds the below fence intrinsics:

- llvm.nvvm.fence.acquire.sync_restrict.space.cluster.scope.cluster
- llvm.nvvm.fence.release.sync_restrict.space.cta.scope.cluster
- llvm.nvvm.fence.mbarrier_init.release.cluster
-
llvm.nvvm.fence.proxy.async.generic.acquire.sync_restrict.space.cluster.scope.cluster
-
llvm.nvvm.fence.proxy.async.generic.release.sync_restrict.space.cta.scope.cluster
llvm.nvvm.fence.proxy.alias
- llvm.nvvm.fence.proxy.async
- llvm.nvvm.fence.proxy.async.global
- llvm.nvvm.fence.proxy.async.shared_cluster
- llvm.nvvm.fence.proxy.async.shared_cta

For more information, please refere the [PTX
ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-membar)
Allow a target to steal the definition of a generic pseudoinstruction
and remap the operands. This works by defining a new instruction, which
will simply swap out the emitted entry in the InstrInfo table.

This is intended to eliminate the C++ half of the implementation
of PointerLikeRegClass. With RegClassByHwMode, the remaining usecase
for PointerLikeRegClass are the common codegen pseudoinstructions.
Every target maintains its own copy of the generic pseudo operand
definitions anyway, so we can stub out the register operands with
an appropriate class instead of waiting for runtime resolution.

In the future we could probably take this a bit further. For example,
there is a similar problem for ADJCALLSTACKUP/DOWN since they depend
on target register definitions for the stack pointer register.
…168029)

- Split from llvm#165532. This is a step toward a unified interface for
masked/gather-scatter/strided/expand-compress cost modeling.
- Replace the ad-hoc parameter list with a single attributes object.

API change:
```
- InstructionCost getMaskedMemoryOpCost(Opcode, Src, Alignment,
-                                       AddressSpace, CostKind);

+ InstructionCost getMaskedMemoryOpCost(MemIntrinsicCostAttributes,
+                                       CostKind);
```
Notes:
- NFCI intended: callers populate MemIntrinsicCostAttributes with the
same information as before.
- Follow-up: migrate gather/scatter, strided, and expand/compress cost
queries to the same attributes-based entry point.
/llvm-project/llvm/utils/TableGen/Common/CodeGenTarget.cpp:286:12:
 error: variable 'SkippedInsts' set but not used [-Werror,-Wunused-but-set-variable]
  unsigned SkippedInsts = 0;
           ^
1 error generated.
…#159255)

This patch improves the codegen for saddo on i32 and i64 in both 32-bit
and 64-bit modes by custom lowering. It implements signed-add overflow
detection using the `(x eqv y) & (sum xor x)`bit-level sequence.
This reverts commit d772663.

This fixes the final issue with the labeller landing. There were
two remaining issues:
1. There was an extra quote on one of the globs
2. Some of the yaml keys were named incorrectly (should have been
   plural)
…vm#156724)

Consider skipping epilogue scalable VF when they are greater than
RemainingIterations same as fixed VF.
And skip scalable RemainingIterations from that comparison because
SCEV ATM can't evaluate non-canonical vscale-based expressions.
…ins (llvm#168325)

Main changes:
* OpenCL legacy atom/atomic builtins now call CLC atomic functions
(which use Clang __scoped_atomic_*), replacing previous Clang __sync_*
functions.
* Change memory order from seq_cst to relaxed; keep device scope (spec
permits broader than workgroup). LLVM IR for _Z8atom_decPU3AS1Vi in
amdgcn--amdhsa.bc:
  Before:
%2 = atomicrmw volatile sub ptr subrspace(1) %0, i32 1
syncscope("agent") seq_cst
  After:
%2 = atomicrmw volatile sub ptr subrspace(1) %0, i32 1
syncscope("agent") monotonic
* Also adds OpenCL 1.0 atom_* variants without volatile on the pointer.
They are added for backward compatibility.
This PR relax the create_mem_desc's restriction on source memref,
allowing it to be a 2d memref.
Moves all Session member variables dedicated to shutdown into a new
ShutdownInfo struct, and uses the presence / absence of this struct as
the flag to indicate that we've entered the "shutting down" state. This
simplifies the implementation of the shutdown process.
Explicitly cast 0 to size_t type to match fread() return type. This
follows the pattern used elsewhere in this file, and fixes
-Wshorten-64-to-32 warnings when building the test.
My 2020 change that added versioned symbol recognition
(reviews.llvm.org/D80059) checks both VER_NDX_LOCAL and VER_NDX_GLOBAL,
though test coverage was missing. lld/test/ELF/dso-undef-extract-lazy.s
checks that the undefined symbol is indeed considered unversioned.
…#168381)

As suggested in the review for llvm#160536 it would be good to follow up and
port the RISC-V passes to the new pass manager. This PR starts that
task. It provides the bare minimum necessary to run RISCVCodeGenPrepare
with opt -passes=riscv-codegenprepare. The approach used is modeled on
my observations of the AMDGPU backend and the recent work to port the
X86 passes.

The testing approach is to add a `-passes=riscv-foo` RUN line to at
least one test, if an appropriate test exists.
This change adds intrinsics and clang builtins for the remaining float
to fp16 conversions. This includes the following conversions:

- float to bf16x2 - satfinite variants
- float to f16x2 - satfinite variants
- float to bf16 - satfinite variants
- float to f16 - all variants

Tests are added in `convert-sm80.ll` and `convert-sm80-sf.ll` for the
intrinsics and in `builtins-nvptx.c` for the clang builtins.
This enables building LLVM with `-mllvm -x86-asm-syntax=intel` in one's
Clang config files (i.e. a global preference for Intel syntax).

`-masm=att` is insufficient as it doesn't override a specification of `-mllvm -x86-asm-syntax`.
topperc and others added 23 commits November 19, 2025 00:08
…::initializeSubtargetDependencies. (llvm#168612)

The "generic" entry in tablegen is really a dummy entry. We shouldn't
use it for anything. Remap "generic" to either generic-rv32 or
generic-rv64 based on the triple.
This document aims to lay out the high level design and goals of the ORC
runtime, and the relationships between key components.
…ion (llvm#168500)

Do not add latency for wavefront and singlethread scope fences during
barrier latency DAG mutation.
These scopes do not typically introduce any latency and adjusting
schedules based on them significantly impacts latency hiding.
…8638)

Reland commit fb829bf with additional fixes relating to post-merge CI failure

```
/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/lib/Dialect/LLVMIR/IR/NVVMDialect.cpp: In function ‘constexpr llvm::nvvm::CTAGroupKind getNVVMCtaGroupKind(mlir::NVVM::CTAGroupKind)’:
/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/llvm/include/llvm/Support/ErrorHandling.h:165:36: error: call to non-constexpr function ‘void llvm::llvm_unreachable_internal(const char*, const char*, unsigned int)’
   ::llvm::llvm_unreachable_internal(msg, __FILE__, __LINE__)
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/lib/Dialect/LLVMIR/IR/NVVMDialect.cpp:73:3: note: in expansion of macro ‘llvm_unreachable’
   llvm_unreachable("unsupported cta_group value");
   ^
```
… (llvm#168581)

This commit fixes linker failures evident on some failing build bots.
EPCDebugObjectRegistrar is unused now that the ELF debugger support plugin uses AllocActions
llvm#167866
Use the recently refactored VPRecipeBase::print to print debug location
for all recipes.

PR: llvm#168454
…lvm#165427)

For flat memory instructions where the address is supplied as a base address
register with an immediate offset, the memory aperture test ignores the
immediate offset. Currently, SDISel does not respect that, which leads to
miscompilations where valid input programs crash when the address computation
relies on the immediate offset to get the base address in the proper memory
aperture. Global or scratch instructions are not affected.

This patch only selects flat instructions with immediate offsets from PTRADD
address computations with the inbounds flag: If the PTRADD does not leave the
bounds of the allocated object, it cannot leave the bounds of the memory
aperture and is therefore safe to handle with an immediate offset.

Affected tests:

- CodeGen/AMDGPU/fold-gep-offset.ll: Offsets are no longer wrongly folded, added
  new positive tests where we still do fold them.
- CodeGen/AMDGPU/infer-addrspace-flat-atomic.ll: Offset folding doesn't seem
  integral to this test, so the test is not changed to make offset folding still
  happen.
- CodeGen/AMDGPU/loop-prefetch-data.ll: loop-reduce transforms inbounds
  addresses for accesses to be based on potentially OOB addresses used for
  prefetching.
- I think the remaining ones suffer from the limited preservation of the
  inbounds flag in PTRADD DAGCombines due to the provenance problems pointed out
  in PR llvm#165424 and the fact that
  `AMDGPUTargetLowering::SplitVector{Load|Store}` legalizes too-wide accesses by
  repeatedly splitting them in half.  Legalizing a V32S32 memory accesses
  therefore leads to inbounds ptradd chains like (ptradd inbounds (ptradd
  inbounds (ptradd inbounds P, 64), 32), 16). The DAGCombines fold them into a
  single ptradd, but the involved transformations generally cannot preserve the
  inbounds flag (even though it would be valid in this case).

Similar previous PR that relied on `ISD::ADD inbounds` instead of `ISD::PTRADD inbounds` (closed): llvm#132353
Analogous PR for GISel (merged): llvm#153001

Fixes SWDEV-516125.
The option -falloc-token-max=0 is supposed to be usable to override
previous settings back to the target default max tokens (SIZE_MAX).

This did not work for the builtin:
```
| executed command: clang -cc1 [..] -nostdsysteminc -triple x86_64-linux-gnu -std=c++23 -fsyntax-only -verify clang/test/SemaCXX/alloc-token.cpp -falloc-token-max=0
| clang: llvm/lib/Support/AllocToken.cpp:38: std::optional<uint64_t> llvm::getAllocToken(AllocTokenMode, const AllocTokenMetadata &, uint64_t): Assertion `MaxTokens && "Must provide non-zero max tokens"' failed.
```

Fix it by also picking the default if "0" is passed.

Improve the documentation to be clearer what the value of "0" means.
…8694)

EltsFromConsecutiveLoads can be recursively called - ensure we limit the recursion depth.
Adding "use-after-return" in Lifetime Analysis.

Detecting when a function returns a reference to its own stack memory:
[UAR Design
Doc](https://docs.google.com/document/d/1Wxjn_rJD_tuRdejP81dlb9VOckTkCq5-aE1nGcerb_o/edit?usp=sharing)

Consider the following example:

```cpp
std::string_view foo() {
    std::string_view a;
    std::string str = "small scoped string";
    a = str;
    return a;
}
```

The code adds a new Fact "OriginEscape" in the end of the CFG to
determine any loan that is escaping the function as shown below:

```
Function: foo
  Block B2:
  End of Block
  Block B1:
    OriginFlow (Dest: 0 (Decl: a), Src: 1 (Expr: CXXConstructExpr))
    OriginFlow (Dest: 2 (Expr: ImplicitCastExpr), Src: 3 (Expr: StringLiteral))
    Issue (0 (Path: operator=), ToOrigin: 4 (Expr: DeclRefExpr))
    OriginFlow (Dest: 5 (Expr: ImplicitCastExpr), Src: 4 (Expr: DeclRefExpr))
    Use (0 (Decl: a), Write)
    Issue (1 (Path: str), ToOrigin: 6 (Expr: DeclRefExpr))
    OriginFlow (Dest: 7 (Expr: ImplicitCastExpr), Src: 6 (Expr: DeclRefExpr))
    OriginFlow (Dest: 8 (Expr: CXXMemberCallExpr), Src: 7 (Expr: ImplicitCastExpr))
    OriginFlow (Dest: 9 (Expr: ImplicitCastExpr), Src: 8 (Expr: CXXMemberCallExpr))
    OriginFlow (Dest: 10 (Expr: ImplicitCastExpr), Src: 9 (Expr: ImplicitCastExpr))
    OriginFlow (Dest: 11 (Expr: MaterializeTemporaryExpr), Src: 10 (Expr: ImplicitCastExpr))
    OriginFlow (Dest: 0 (Decl: a), Src: 11 (Expr: MaterializeTemporaryExpr))
    Use (0 (Decl: a), Read)
    OriginFlow (Dest: 12 (Expr: ImplicitCastExpr), Src: 0 (Decl: a))
    OriginFlow (Dest: 13 (Expr: CXXConstructExpr), Src: 12 (Expr: ImplicitCastExpr))
    Expire (1 (Path: str))
    OriginEscapes (13 (Expr: CXXConstructExpr))
  End of Block
  Block B0:
  End of Block
```

The confidence of the report is determined by checking if at least one
of the loans returned is not expired (strict). If all loans are expired
it is considered permissive.

More information [UAR Design
Doc](https://docs.google.com/document/d/1Wxjn_rJD_tuRdejP81dlb9VOckTkCq5-aE1nGcerb_o/edit?usp=sharing)
…ldUnmerge (llvm#168692)

This aims to fix the crash in llvm#168495, my combine rule was
missing a check that the source vector was in fact a vector. This then
caused the legality check to fail in this example as the concat was
trying to concat a non vector.

I have also gated the bitcast of the concat to only work on non-scalable
vectors as the mutation calls `getNumElements` which crashes when called
on a scalable vector.

Fixes llvm#168495
Per Shore: revert locally, he will reapply

This reverts commit 52a58a4.
Implement VPHeaderPHIRecipe::classof(const VPValue *V)  in terms of the
variant taking VPRecipeBase.

Reduces some duplication, split off from
llvm#141431.
Post-commit fix of llvm#164794 reported at
llvm#164794 (comment)

`LLVM_LIBRARY_OUTPUT_INTDIR` and `LLVM_RUNTIME_OUTPUT_INTDIR` is used by
`AddLLVM.cmake` as output directories. Unless we are in a
bootstrapping-build, It must not point to directories found by
`find_package(LLVM)` which may be read-only directories. MLIR for
instance sets thesese variables to its own build output
directory, so should the runtimes.
@ronlieb ronlieb requested review from a team and dpalermo November 19, 2025 13:30
@z1-cciauto
Copy link
Collaborator

@z1-cciauto z1-cciauto merged commit 94a627f into amd-staging Nov 19, 2025
24 checks passed
@z1-cciauto z1-cciauto deleted the amd/merge/upstream_merge_20251119065220 branch November 19, 2025 17:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.