merge main into amd-staging #607

z1-cciauto · 2025-11-17T12:13:24Z

No description provided.

Identified with llvm-use-ranges.

This patch adds computeNumBuckets, a helper function to compute the number of buckets. This is part of the effort outlined in llvm#168255. This makes it easier to move the core logic of grow() to DenseMapBase::grow().
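As a rough illustration of what a bucket-count helper of this kind might do, the sketch below rounds a requested size up to a power of two with a fixed floor. The name `computeNumBucketsSketch`, the floor of 64, and the rounding policy are assumptions made for this example; it is not the code from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Smallest power of two that is >= n (returns 1 for n == 0).
inline std::uint32_t roundUpToPowerOfTwo(std::uint32_t n) {
  assert(n <= (std::uint32_t{1} << 31) && "result would not fit in 32 bits");
  std::uint32_t p = 1;
  while (p < n)
    p <<= 1;
  return p;
}

// Hypothetical helper: choose a bucket count for a table that must hold at
// least `atLeast` buckets. The floor of 64 is an assumption of this sketch.
inline std::uint32_t computeNumBucketsSketch(std::uint32_t atLeast) {
  constexpr std::uint32_t kMinBuckets = 64;
  return std::max(kMinBuckets, roundUpToPowerOfTwo(atLeast));
}
```

Keeping the policy in one function means a `grow()` implementation only has to ask for a bucket count, which is the kind of separation the description is aiming for.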

This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [aminya/setup-cpp](https://redirect.github.com/aminya/setup-cpp) | action | patch | `v1.7.1` -> `v1.7.2` |
| ghcr.io/llvm/ci-ubuntu-24.04-abi-tests | container | digest | `01e66b0` -> `f80125c` |
| [github/codeql-action](https://redirect.github.com/github/codeql-action) | action | patch | `v4.31.2` -> `v4.31.3` |
| llvm/actions | action | digest | `42d8057` -> `5dd9550` |

…ls (llvm#165692) This PR introduces the `amdgpu-lower-exec-sync` pass, which specifically lowers named-barrier LDS globals introduced by llvm#114550.

Changes include:
- Moving the logic of lowering named-barrier LDS globals from the `amdgpu-lower-module-lds` pass to this new pass.
- Adding the pass to the pipeline and removing the existing lowering logic for named-barrier LDS in `amdgpu-lower-module-lds`.

See llvm#161827 for discussion on this topic.

This patch teaches DenseMap constructors to delegate to other DenseMap constructors where we can. The intent is for these constructors to build on top of a higher-level concept like the default-constructed instance instead of calling init on our own. This is part of the effort outlined in llvm#168255.
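The general idea of constructor delegation can be sketched with a small stand-in container. `TinySet` below is a hypothetical class written for this example, not DenseMap: each richer constructor first delegates to a simpler one and then adds only its own step, instead of repeating initialization logic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

// Hypothetical container used only to illustrate delegating constructors.
class TinySet {
public:
  // Base case: the default-constructed (empty) instance.
  TinySet() = default;

  // Builds on the default-constructed instance, then reserves storage.
  explicit TinySet(std::size_t expected) : TinySet() {
    items.reserve(expected);
  }

  // Builds on the reserving constructor, then inserts the elements.
  TinySet(std::initializer_list<int> values) : TinySet(values.size()) {
    for (int v : values)
      insert(v);
  }

  // Returns true if the value was newly inserted.
  bool insert(int v) {
    if (contains(v))
      return false;
    items.push_back(v);
    return true;
  }

  bool contains(int v) const {
    return std::find(items.begin(), items.end(), v) != items.end();
  }

  std::size_t size() const { return items.size(); }

private:
  std::vector<int> items;
};
```

With this layout, `TinySet s{1, 2, 2, 3};` goes through all three constructors and ends up holding three distinct values.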

…min (llvm#167541) Following llvm#112393, this aims to promote vp intrinsics for zvfbfmin without zvfbfa

…68319) The change made in llvm#162433 exposed a weakness in this test that showed different results on different archs that were not caught on the CI bots. This expands the tests to cover more archs, and out of necessity moves the os_log test into a separate test file.

- This commit is the second in the series of adding matchers for linalg.*conv*/*pool*. Refer: llvm#163724
- In this commit all variants of Conv1D convolution ops have been added.
- For the sake of completeness of the specific infrastructure required for ops that don't need dilations/strides information during their creation, this commit also includes a basic Conv2D and Conv3D op as part of the lit test.

Signed-off-by: Abhishek Varma <abhvarma@amd.com>

The pointer needs to point to a record. Fixes llvm#166371

…-specific constant offset. (llvm#165591) In the Dhrystone benchmark, I found that some adjacent globals are not merged, whereas GCC's anchor optimization does handle them. Using global-merge-max-offset to set the maximum offset yields similar results (still slightly different, but at least the offset can be controlled).

…lvm#168323) The "MachO_ptrauth_noolloc_sections.yaml" testcase had a typo in the name, and didn't explicitly specify its endianness. This was causing it to fail when enabled on big endian platforms. See conversation in llvm#167902

This patch adds support for shared::cta as destination space in the TMA non-tensor copy Op (from global to shared::cta).
* Appropriate verifier checks are added.
* Unit tests are added to verify the lowering.

The related intrinsic changes were merged through PR llvm#167508.

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>

…7866) The `DebugObjectManagerPlugin` implements debugger support for ELF platforms with the GDB JIT Interface. It emits a separate debug object allocation in addition to the LinkGraph's own allocation. This used to happen in the plugin's `notifyEmitted()` callback, i.e. after the LinkGraph's allocation was finalized. In the meantime, it had to block finalization of the corresponding materialization unit to make sure that the debugger can register the object before the code runs. This patch switches the plugin to use an allocation action instead. We can remove the `notifyEmitted()` hook and implement all steps as JITLink passes.

- Introduce the -aarch64-force-unroll-threshold option; when a loop’s cost is below this value we set UP.Force = true (default 0 keeps current behaviour)
- Add an AArch64 loop-unroll regression test that runs once at the default threshold and once with the flag raised, confirming forced unrolling
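The threshold rule described above can be sketched in isolation. `UnrollPrefsSketch` and `applyForceUnrollThreshold` are hypothetical names for this example and stand in for the real unrolling preferences; only the comparison and the meaning of the default value 0 come from the description.

```cpp
#include <cassert>

// Hypothetical stand-in for the unrolling preferences; only the Force flag
// matters for this sketch.
struct UnrollPrefsSketch {
  bool Force = false;
};

// Sets Force when the loop cost is strictly below the threshold. With the
// default threshold of 0 no unsigned cost can be below it, so behaviour is
// unchanged.
inline void applyForceUnrollThreshold(UnrollPrefsSketch &UP, unsigned loopCost,
                                      unsigned forceThreshold) {
  if (loopCost < forceThreshold)
    UP.Force = true;
}
```

Because the comparison is strict, a loop whose cost equals the threshold is not forced.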

…rted with +sve-b16b16 (llvm#167717) The resulting costs are the same as the standard SVE costs for `half` types.

…lvm#168052) Merge the following classes into `SIGfx6CacheControl`:
- SIGfx7CacheControl
- SIGfx90ACacheControl
- SIGfx940CacheControl

They were all very similar and had a lot of duplicated boilerplate just to implement one or two codegen differences. GFX90A/GFX940 have a bit more differences, but they're still manageable under one class because the general behavior is the same.

This removes 500 lines of code and puts everything into a single place which I think makes it a lot easier to maintain, at the cost of a slight increase in complexity for some functions. There is still a lot of room for improvement but I think this patch is already big enough as is and I don't want to bundle too much into one review.

…167095)
- This patch detects cycles formed by phis and bails out if one is found.
- This prevents violating DAG restrictions.

Abort pipelining in the case below:

    %1 = phi i32 [ %a, %entry ], [ %3, %loop ]
    %2 = phi i32 [ %a, %entry ], [ %1, %loop ]
    %3 = phi i32 [ %b, %entry ], [ %2, %loop ]

---------

Co-authored-by: Ryotaro Kasuga <kasuga.ryotaro@fujitsu.com>
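One way to picture the check is as cycle detection on a graph whose nodes are phis and whose edges point from a phi to the phis it reads along the loop back-edge. The sketch below is a generic depth-first search over such a graph; `PhiGraph`, `visitPhi`, and `hasPhiCycle` are hypothetical names, and the integer ids stand in for the `%1`, `%2`, `%3` values in the example. It is not the implementation from the patch.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical model: phi id -> ids of the phis it uses via the back-edge.
using PhiGraph = std::map<int, std::vector<int>>;

// Depth-first search; returns true if a cycle is reachable from `node`.
inline bool visitPhi(const PhiGraph &g, int node, std::set<int> &onPath,
                     std::set<int> &finished) {
  if (finished.count(node))
    return false; // already fully explored, no cycle through it
  if (!onPath.insert(node).second)
    return true; // node is already on the current path: cycle found
  auto it = g.find(node);
  if (it != g.end())
    for (int next : it->second)
      if (visitPhi(g, next, onPath, finished))
        return true;
  onPath.erase(node);
  finished.insert(node);
  return false;
}

// Returns true if any phis form a cycle, in which case a caller would bail out.
inline bool hasPhiCycle(const PhiGraph &g) {
  std::set<int> onPath, finished;
  for (const auto &entry : g)
    if (visitPhi(g, entry.first, onPath, finished))
      return true;
  return false;
}
```

For the three phis shown above, the graph is `1 -> 3`, `2 -> 1`, `3 -> 2`, which this search reports as cyclic.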

…m#167683) Resolves llvm#166976

Think it's just a path slash difference. Fixes llvm#167997.

When both `MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS` and MLIR multithreading are enabled, `topLevelFingerPrint` is empty but its value is accessed. This adds a `has_value()` check before dereferencing the optional.
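The guard pattern can be shown with a plain `std::optional`. The function name and the use of a 64-bit integer as the fingerprint type are assumptions for this sketch; the point is only that the value is read after `has_value()` succeeds.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical check: reports whether a recorded fingerprint differs from the
// current one. When nothing was recorded there is nothing to compare, so the
// optional is never dereferenced in that case.
inline bool fingerprintChanged(const std::optional<std::uint64_t> &recorded,
                               std::uint64_t current) {
  if (!recorded.has_value())
    return false;
  return *recorded != current;
}
```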

…165840) Currently LLDB's `ParseRustVariantPart` generates the following `CXXRecordDecl` for a Rust enum:

```rust
enum AA { A(u8) }
```

```
CXXRecordDecl 0x5555568d5970 <<invalid sloc>> <invalid sloc> struct AA
|-CXXRecordDecl 0x5555568d5ab0 <<invalid sloc>> <invalid sloc> union test_issue::AA$Inner definition
| |-CXXRecordDecl 0x5555568d5d18 <<invalid sloc>> <invalid sloc> struct A$Variant definition
| | |-DefinitionData pass_in_registers aggregate standard_layout trivially_copyable trivial
| | | `-Destructor simple irrelevant trivial needs_implicit
| | `-FieldDecl 0x555555a77880 <<invalid sloc>> <invalid sloc> value 'test_issue::AA::A'
| `-FieldDecl 0x555555a778f0 <<invalid sloc>> <invalid sloc> $variant$ 'test_issue::AA::test_issue::AA$Inner::A$Variant'
|-CXXRecordDecl 0x5555568d5c48 <<invalid sloc>> <invalid sloc> struct A definition
| `-FieldDecl 0x555555a777e0 <<invalid sloc>> <invalid sloc> __0 'unsigned char'
`-FieldDecl 0x555555a77960 <<invalid sloc>> <invalid sloc> $variants$ 'test_issue::AA::test_issue::AA$Inner'
```

When the Rust enum type name is the same as its variant name, however, the generated `CXXRecordDecl` becomes the following. There is a circular reference between `struct A$Variant` and `struct A`, causing llvm#163048.

```rust
enum A { A(u8) }
```

```
CXXRecordDecl 0x5555568d5760 <<invalid sloc>> <invalid sloc> struct A
|-CXXRecordDecl 0x5555568d58a0 <<invalid sloc>> <invalid sloc> union test_issue::A$Inner definition
| |-CXXRecordDecl 0x5555568d5a38 <<invalid sloc>> <invalid sloc> struct A$Variant definition
| | `-FieldDecl 0x5555568d5b70 <<invalid sloc>> <invalid sloc> value 'test_issue::A' <---- bug here
| `-FieldDecl 0x5555568d5be0 <<invalid sloc>> <invalid sloc> $variant$ 'test_issue::A::test_issue::A$Inner::A$Variant'
`-FieldDecl 0x5555568d5c50 <<invalid sloc>> <invalid sloc> $variants$ 'test_issue::A::test_issue::A$Inner'
```

The problem was caused by `GetUniqueTypeNameAndDeclaration` not returning the correct qualified name for the DWARF DIE `test_issue::A::A`; instead, it returned `A`. This caused `ParseStructureLikeDIE` to find the wrong type `test_issue::A` and return early.

The failure in `GetUniqueTypeNameAndDeclaration` appears to stem from a language check that returns early unless the language is C++. I changed it so Rust follows the C++ path rather than returning. I’m not entirely sure this is the right approach: Rust’s qualified name rules look similar, but may not be identical. Alternatively, we could add a Rust-specific implementation that forms qualified names according to Rust's rules.

…lvm#167223) Fixes llvm#165713 This patch handles out-of-bound vector elements and truncates extra bits.
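Truncating a constant to the width of a vector lane amounts to masking off the high bits. The helper below is a hypothetical, target-independent sketch of that arithmetic and not the WebAssembly lowering code; `truncateToLaneBits` and its parameters are names chosen for this example.

```cpp
#include <cassert>
#include <cstdint>

// Keeps only the low `laneBits` bits of `value`. The 64-bit case is handled
// separately because shifting a 64-bit value by 64 is undefined behaviour.
inline std::uint64_t truncateToLaneBits(std::uint64_t value, unsigned laneBits) {
  assert(laneBits >= 1 && laneBits <= 64 && "lane width out of range");
  if (laneBits == 64)
    return value;
  return value & ((std::uint64_t{1} << laneBits) - 1);
}
```

For example, a constant of 0x1FF placed in an 8-bit lane keeps only 0xFF.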

…lvm#167944) We generate an ADDLV node that incorporates a vecreduce(zext) from elements of half the size. This means that we need the input type to be at least twice the size of the input. I updated some variable names whilst I was here. Fixes llvm#167935

This prevents a machine verifier error, where it "Expected implicit register after groups". Fixes llvm#158661

…#163562) Symbols used for dynamic extent information of memory regions are now kept as live as long as the memory region exists.

For a scalar only VPlan with tail folding, if it has a phi live out then legalizeAndOptimizeInductions will scalarize the widened canonical IV feeding into the header mask:

    <x1> vector loop: {
      vector.body:
        EMIT vp<%4> = CANONICAL-INDUCTION ir<0>, vp<%index.next>
        vp<%5> = SCALAR-STEPS vp<%4>, ir<1>, vp<%0>
        EMIT vp<%6> = icmp ule vp<%5>, vp<%3>
        EMIT vp<%index.next> = add nuw vp<%4>, vp<%1>
        EMIT branch-on-count vp<%index.next>, vp<%2>
      No successors
    }
    Successor(s): middle.block

    middle.block:
      EMIT vp<%8> = last-active-lane vp<%6>
      EMIT vp<%9> = extract-lane vp<%8>, vp<%5>
    Successor(s): ir-bb<exit>

The verifier complains about this but this should still generate the correct last active lane, so this fixes the assert by handling this case in isHeaderMask. There is a similar pattern already there for ActiveLaneMask, which also expects a VPScalarIVSteps recipe. Fixes llvm#167813

…m#165426) This is in preparation for a patch that will only fold offsets into flat instructions if their addition is inbounds. Marking the GEPs inbounds here means that their output won't change with the later patch. Basically a retry of the very similar PR llvm#131994, as part of an updated stack of PRs. For SWDEV-516125.

We already ensure that code for different architectures is always placed in different pages in `assignAddresses`. We represent those ranges using their first and last chunks. However, the RVAs of those chunks may not be page-aligned, for example, due to extra padding for entry-thunk offsets. Align the chunk RVAs to the page boundary so that the emitted ranges correctly include the entire region. This change affects an existing test that checks corner cases triggered by merging a data section into a code section. We may now include such data in the code range. This differs from MSVC’s behavior, but it should not cause practical issues, and the new behavior is arguably more correct. Fixes llvm#168119.
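The alignment step can be sketched as rounding the start of a range down and its end up to a page boundary. The 4 KiB page size, the `CodeRangeSketch` struct, and the function names are assumptions made for this example; the linker's own types and helpers are not shown here.

```cpp
#include <cassert>
#include <cstdint>

// Assumed page size for this sketch (4 KiB).
constexpr std::uint32_t kPageSize = 0x1000;

// Rounds `value` down to a multiple of `align` (a power of two).
inline std::uint32_t alignDown(std::uint32_t value, std::uint32_t align) {
  assert(align != 0 && (align & (align - 1)) == 0 && "align must be a power of two");
  return value & ~(align - 1);
}

// Rounds `value` up to a multiple of `align` (a power of two).
inline std::uint32_t alignUp(std::uint32_t value, std::uint32_t align) {
  return alignDown(value + align - 1, align);
}

// Hypothetical range record: [start, end) in RVA space.
struct CodeRangeSketch {
  std::uint32_t start;
  std::uint32_t end;
};

// Widens a range given by its first chunk's RVA and its last chunk's end RVA
// so that it covers whole pages.
inline CodeRangeSketch pageAlignedRange(std::uint32_t firstChunkRVA,
                                        std::uint32_t lastChunkEndRVA) {
  return {alignDown(firstChunkRVA, kPageSize), alignUp(lastChunkEndRVA, kPageSize)};
}
```

A first chunk at 0x1010 (for instance, after padding) and a last chunk ending at 0x2ff0 produce the range 0x1000 to 0x3000, so the padding before the first chunk is included.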

…llvm#168339)

…dAssemblyInstEmulation (llvm#168340)

…template arguments (llvm#167341) Although very unusual, the SVal of the argument is not checked for UnknownVal, so we may get a null pointer dereference. In addition, the template arguments of the variant are retrieved incorrectly when type aliases are involved, causing crashes and FPs/FNs.

…lvm#166244) Add a new printRecipe which handles printing the recipe without common info like debug info or metadata. Prepares to print them once, in ::print(), after/in combination with llvm#165825. PR: llvm#166244

…rogram (llvm#167758) This fixes the issue reported in llvm#166855 (comment) that had been revealed after llvm#166855 was merged. `CodeGenFunction::GenerateVarArgsThunk` creates thunks for vararg functions by cloning and modifying them. It is different from `CodeGenFunction::generateThunk`, which is used for Itanium ABI. According to https://reviews.llvm.org/D39396, `CodeGenFunction::GenerateVarArgsThunk` may be called before metadata nodes are resolved. So, it tries to avoid remapping DISubprogram and all metadata nodes it references inside `CloneFunction()` by manually cloning DISubprogram. If optimization level is not OptNone, DILocalVariables for a function are saved in DISubprogram's retainedNodes field. When `CodeGenFunction::GenerateVarArgsThunk` clones such DISubprogram without remapping, it produces a subprogram with incorrectly-scoped retained nodes. It triggers Verifier checks added in llvm#166855. To solve that, retained nodes list of a cloned DISubprogram is cleared.

z1-cciauto · 2025-11-17T12:14:46Z

PSDB Link: https://compiler-ci.amd.com/job/compiler-psdb-amd-staging/2835

kazutakahirata and others added 30 commits November 16, 2025 20:34

[AArch64] Use llvm::any_of (NFC) (llvm#168294)

6152a8b

Identified with llvm-use-ranges.
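A change of this kind typically replaces an iterator-pair call with a range-based one. The sketch below uses a small hypothetical wrapper, `anyOfRange`, in place of `llvm::any_of` so that it compiles without LLVM headers; the wrapper and the `hasNegative` example are assumptions for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical range-based wrapper standing in for llvm::any_of: it forwards
// the whole range instead of requiring an explicit begin/end pair.
template <typename Range, typename Pred>
bool anyOfRange(const Range &range, Pred pred) {
  return std::any_of(std::begin(range), std::end(range), pred);
}

// Before: std::any_of(values.begin(), values.end(), pred)
// After:  anyOfRange(values, pred)
inline bool hasNegative(const std::vector<int> &values) {
  return anyOfRange(values, [](int x) { return x < 0; });
}
```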

[ADT] Add roundUpNumBuckets to DenseMap (NFC) (llvm#168301)

4206558

This patch adds computeNumBuckets, a helper function to compute the number of buckets. This is part of the effort outlined in llvm#168255. This makes it easier to move the core logic of grow() to DenseMapBase::grow().

[VP][RISCV] Enable promotion on fixed-length vp intrinsics with zvfbf…

5e4cdd6

…min (llvm#167541) Following llvm#112393, this aims to promote vp intrinsics for zvfbfmin without zvfbfa

[clang][bytecode] Check pointers in GetPtrField{,Pop} (llvm#167335)

90e1391

The pointer needs to point to a record. Fixes llvm#166371

[libclc] Fix link to source in index.html (llvm#167494)

e803390

[orc-rt] Add missing headers to Session.h (llvm#168330)

f00bf4f

[VPlan] Improve code in RemoveMask_match (NFC) (llvm#168065)

daa30ae

[VPlan] Mark getPredicatedMask static (NFC) (llvm#168067)

54fdf67

[CostModel][AArch64] Remove promotion cost for SVE bfloat arith suppo…

26e42c7

…rted with +sve-b16b16 (llvm#167717) The resulting costs are the same as the standard SVE costs for `half` types.

[InlineAsmLowering] unsigned -> TypeSize for getTypeStoreSize result

853ed3b

[clang][Stdlib] Add special mapping for std::compare_three_way

15958f2

[X86][Clang] Add AVX512 kunpck intrinsics to be used in constexp (llv…

44f72fb

…m#167683) Resolves llvm#166976

[lldb][test] Try to fix dwarf64 test on Windows

0dead9e

Think it's just a path slash difference. Fixes llvm#167997.

[MLIR] Fix empty optional access in DialectConversion (llvm#168331)

e992280

When both `MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS` and MLIR multithreading are enabled, `topLevelFingerPrint` is empty but its value is accessed. This adds a `has_value()` check before dereferencing the optional.

[WebAssembly] Truncate extra bits of large elements in BUILD_VECTOR (l…

63e6373

…lvm#167223) Fixes llvm#165713 This patch handles out-of-bound vector elements and truncates extra bits.

[DAG] Add strictfp implicit def reg after metadata. (llvm#168282)

22968f5

This prevents a machine verifier error, where it "Expected implicit register after groups". Fixes llvm#158661

balazske and others added 10 commits November 17, 2025 11:59

[clang][analyzer] Extend lifetime of dynamic extent information (llvm…

dfac905

…#163562) Symbols used for dynamic extent information of memory regions are now kept as live as long as the memory region exists.

[lldb][nfc] Fix comment about UINT32_MAX in UnwindAssemblyInstruction (…

c2ba81c

…llvm#168339)

[lldb][nfc] Avoid duplicate calls to GetInstructionCondition in Unwin…

74c9168

…dAssemblyInstEmulation (llvm#168340)

merge main into amd-staging

cbdd9d1

z1-cciauto requested a review from nicolasvasilache as a code owner November 17, 2025 12:13

z1-cciauto requested a review from a team November 17, 2025 12:13

ronlieb removed the request for review from nicolasvasilache November 17, 2025 12:14

ronlieb approved these changes Nov 17, 2025

z1-cciauto merged commit 8ad44f5 into amd-staging Nov 17, 2025
16 checks passed

z1-cciauto deleted the upstream_merge_202511170713 branch November 17, 2025 14:53

35 participants
