@z1-cciauto

No description provided.

kazutakahirata and others added 30 commits November 16, 2025 20:34
Identified with llvm-use-ranges.
This patch adds computeNumBuckets, a helper function to compute the
number of buckets.

This is part of the effort outlined in llvm#168255.  This makes it easier
to move the core logic of grow() to DenseMapBase::grow().
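
As a rough illustration of what such a helper might compute, the sketch below rounds a requested bucket count up to a power of two with a floor of 64. The names `nextPowerOf2Sketch` and `computeNumBucketsSketch`, the floor of 64, and the upper bound are assumptions made for this example; the patch's actual helper is not shown in this message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a "next power of two" utility: returns the
// smallest power of two strictly greater than A.
constexpr uint64_t nextPowerOf2Sketch(uint64_t A) {
  A |= A >> 1;
  A |= A >> 2;
  A |= A >> 4;
  A |= A >> 8;
  A |= A >> 16;
  A |= A >> 32;
  return A + 1;
}

// Hypothetical helper: bucket count for a table that must hold at least
// AtLeast buckets. The result is a power of two and never below 64
// (the floor of 64 is an assumption of this sketch).
inline unsigned computeNumBucketsSketch(unsigned AtLeast) {
  assert(AtLeast <= (1u << 31) && "result must fit in unsigned");
  if (AtLeast <= 64)
    return 64;
  // Subtracting 1 first keeps exact powers of two unchanged.
  return static_cast<unsigned>(nextPowerOf2Sketch(AtLeast - 1));
}
```

Pulling the computation into one function of this shape means a caller such as grow() only needs the returned count, which is presumably what makes the later move easier.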
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [aminya/setup-cpp](https://redirect.github.com/aminya/setup-cpp) |
action | patch | `v1.7.1` -> `v1.7.2` |
| ghcr.io/llvm/ci-ubuntu-24.04-abi-tests | container | digest |
`01e66b0` -> `f80125c` |
|
[github/codeql-action](https://redirect.github.com/github/codeql-action)
| action | patch | `v4.31.2` -> `v4.31.3` |
| llvm/actions | action | digest | `42d8057` -> `5dd9550` |
…ls (llvm#165692)

This PR introduces `amdgpu-lower-exec-sync` pass which specifically
lowers named-barrier LDS globals introduced by llvm#114550 .

Changes include:

- Moving the logic of lowering named-barrier LDS globals from
`amdgpu-lower-module-lds` pass to this new pass.

- Adding the new pass to the pipeline and removing the existing lowering
logic for named-barrier LDS from `amdgpu-lower-module-lds`.

See llvm#161827 for discussion on this topic.
This patch teaches DenseMap constructors to delegate to other DenseMap
constructors where we can.

The intent is for these constructors to build on top of a higher-level
concept like the default-constructed instance instead of calling init
on our own.

This is part of the effort outlined in llvm#168255.
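
The delegation idea can be illustrated with a toy container. `TinyMap` below is a hypothetical class written only for this example and is not DenseMap's implementation; it shows each constructor building on a simpler one rather than repeating initialization.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <utility>
#include <vector>

// Hypothetical toy map used only to illustrate constructor delegation.
class TinyMap {
  std::vector<std::pair<int, int>> Entries;

public:
  // The default-constructed instance is the base state everything builds on.
  TinyMap() = default;

  // Delegates to the default constructor, then reserves storage.
  explicit TinyMap(std::size_t Reserve) : TinyMap() {
    Entries.reserve(Reserve);
  }

  // Delegates to the reserving constructor, then inserts the elements.
  TinyMap(std::initializer_list<std::pair<int, int>> Init)
      : TinyMap(Init.size()) {
    for (const auto &KV : Init)
      insert(KV.first, KV.second);
  }

  // Inserts K -> V unless K is already present (first insertion wins).
  void insert(int K, int V) {
    for (const auto &E : Entries)
      if (E.first == K)
        return;
    Entries.emplace_back(K, V);
  }

  std::size_t size() const { return Entries.size(); }
};
```

With this layering, a change to how the empty state is set up only has to be made in one constructor.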
…min (llvm#167541)

Following llvm#112393, this aims to promote vp intrinsics for zvfbfmin
without zvfbfa
…68319)

The change made in llvm#162433 exposed a weakness in this test that showed
different results on different archs that were not caught on the CI
bots. This expands the tests to cover more archs, and out of necessity
moves the os_log test into a separate test file.
-- This commit is the second in the series of adding matchers
for linalg.*conv*/*pool*. Refer:
llvm#163724
-- In this commit all variants of Conv1D convolution ops have been
   added.
-- For sake of completion for a specific infra required for those
   ops which don't require dilations/strides information during their
   creation, this commit also includes a basic Conv2D and Conv3D op as
   part of the lit test.

Signed-off-by: Abhishek Varma <abhvarma@amd.com>
…-specific constant offset. (llvm#165591)

In the Dhrystone benchmark, I found that some adjacent globals are not
merged, whereas GCC's anchor optimization does handle them. Using
global-merge-max-offset to set the maximum offset yields similar results
(still slightly different, but at least the offset can be controlled).
…lvm#168323)

The "MachO_ptrauth_noolloc_sections.yaml" testcase had a typo in the
name, and didn't explicitly specify its endianness. This was causing it
to fail when enabled on big endian platforms.

See conversation in llvm#167902
This patch adds support for shared::cta as destination space in
the TMA non-tensor copy Op (from global to shared::cta).

* Appropriate verifier checks are added.
* Unit tests are added to verify the lowering.

The related intrinsic changes were merged through PR llvm#167508.

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
…7866)

The `DebugObjectManagerPlugin` implements debugger support for ELF
platforms with the GDB JIT Interface. It emits a separate debug object
allocation in addition to the LinkGraph's own allocation. This used to
happen in the plugin's `notifyEmitted()` callback, i.e. after the
LinkGraph's allocation was finalized. In the meantime, it had to block
finalization of the corresponding materialization unit to make sure that
the debugger can register the object before the code runs.

This patch switches the plugin to use an allocation action instead. We
can remove the `notifyEmitted()` hook and implement all steps as JITLink
passes.
- Introduce the -aarch64-force-unroll-threshold option; when a loop’s
cost is below this value we set UP.Force = true (default 0 keeps current
behaviour)
- Add an AArch64 loop-unroll regression test that runs once at the
default threshold and once with the flag raised, confirming forced
unrolling
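
The threshold check described in the first bullet can be sketched as follows. `UnrollPrefsSketch` and `applyForceUnrollThreshold` are hypothetical names for this example; only the `Force` field and the "cost below threshold" rule come from the description above.

```cpp
#include <cassert>

// Hypothetical stand-in for the unrolling preferences; only the Force
// flag mentioned in the description is modelled.
struct UnrollPrefsSketch {
  bool Force = false;
};

// Sets Force when the loop cost is strictly below the threshold.
// With the default threshold of 0 no unsigned cost can be below it,
// so the existing behaviour is kept.
inline void applyForceUnrollThreshold(unsigned LoopCost,
                                      unsigned ForceThreshold,
                                      UnrollPrefsSketch &UP) {
  if (LoopCost < ForceThreshold)
    UP.Force = true;
}
```

This also mirrors the shape of the regression test: one run at the default threshold, one with the threshold raised.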
…rted with +sve-b16b16 (llvm#167717)

The resulting costs are the same as the standard SVE costs for `half`
types.
…lvm#168052)

Merge the following classes into `SIGfx6CacheControl`:
- SIGfx7CacheControl
- SIGfx90ACacheControl
- SIGfx940CacheControl

They were all very similar and had a lot of duplicated boilerplate just
to implement one or two codegen differences. GFX90A/GFX940 have a bit
more differences, but they're still manageable under one class because
the general behavior is the same.

This removes 500 lines of code and puts everything into a single place
which I think makes it a lot easier to maintain, at the cost of a slight
increase in complexity for some functions.

There is still a lot of room for improvement but I think this patch is
already big enough as is and I don't want to bundle too much into one
review.
…167095)

- This patch detects cycles by phis and bails out if one is found.
- This prevents DAG restrictions from being violated.

Abort pipelining in the below case

%1 = phi i32 [ %a, %entry ], [ %3, %loop ]
%2 = phi i32 [ %a, %entry ], [ %1, %loop ]
%3 = phi i32 [ %b, %entry ], [ %2, %loop ]
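
A minimal sketch of this kind of check, under simplifying assumptions: each phi is modelled only by the name of the value that flows in from the loop back edge, and a cycle exists when following those values through phis returns to a phi already on the current path. `PhiBackedges` and `hasPhiCycle` are hypothetical names, and the patch's actual detection may differ.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Maps each phi name to the value coming in from the loop back edge.
using PhiBackedges = std::map<std::string, std::string>;

// Returns true if some chain of phis feeds back into itself.
inline bool hasPhiCycle(const PhiBackedges &Phis) {
  std::set<std::string> Done; // phis already shown not to reach a cycle
  for (const auto &Entry : Phis) {
    std::set<std::string> OnPath;
    std::string Cur = Entry.first;
    // Walk back-edge values while they are phis we have not cleared yet.
    while (Phis.count(Cur) && !Done.count(Cur)) {
      if (!OnPath.insert(Cur).second)
        return true; // revisited a phi on the current path
      Cur = Phis.at(Cur);
    }
    // The walk ended at a non-phi or a cleared phi: no cycle on this path.
    Done.insert(OnPath.begin(), OnPath.end());
  }
  return false;
}
```

For the three phis above, the back-edge map is `%1 -> %3`, `%2 -> %1`, `%3 -> %2`, which this sketch reports as a cycle, whereas an ordinary induction phi fed by a non-phi increment is not.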

---------

Co-authored-by: Ryotaro Kasuga <kasuga.ryotaro@fujitsu.com>
Think it's just a path slash difference.

Fixes llvm#167997.
When both `MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS` and MLIR
multithreading are enabled, `topLevelFingerPrint` is empty but its value
is accessed. This adds a `has_value()` check before dereferencing the
optional.
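
The guard pattern looks roughly like this. `fingerprintChanged` and its parameters are hypothetical names for this example, and the 64-bit fingerprint type is an assumption; only the idea of checking `has_value()` before dereferencing comes from the description above.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical comparison helper: reports a change only when a
// fingerprint was actually recorded beforehand.
inline bool fingerprintChanged(const std::optional<uint64_t> &Before,
                               uint64_t After) {
  // Dereferencing an empty optional is undefined behaviour, so check first.
  if (!Before.has_value())
    return false;
  return *Before != After;
}
```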
…165840)

Currently LLDB's `ParseRustVariantPart` generates the following
`CXXRecordDecl` for a Rust enum
```rust
enum AA {
  A(u8)
}
```

```
CXXRecordDecl 0x5555568d5970 <<invalid sloc>> <invalid sloc> struct AA
|-CXXRecordDecl 0x5555568d5ab0 <<invalid sloc>> <invalid sloc> union test_issue::AA$Inner definition
| |-CXXRecordDecl 0x5555568d5d18 <<invalid sloc>> <invalid sloc> struct A$Variant definition
| | |-DefinitionData pass_in_registers aggregate standard_layout trivially_copyable trivial
| | | `-Destructor simple irrelevant trivial needs_implicit
| | `-FieldDecl 0x555555a77880 <<invalid sloc>> <invalid sloc> value 'test_issue::AA::A'
| `-FieldDecl 0x555555a778f0 <<invalid sloc>> <invalid sloc> $variant$ 'test_issue::AA::test_issue::AA$Inner::A$Variant'
|-CXXRecordDecl 0x5555568d5c48 <<invalid sloc>> <invalid sloc> struct A definition
| `-FieldDecl 0x555555a777e0 <<invalid sloc>> <invalid sloc> __0 'unsigned char'
`-FieldDecl 0x555555a77960 <<invalid sloc>> <invalid sloc> $variants$ 'test_issue::AA::test_issue::AA$Inner'
```

However, when the Rust enum type name is the same as its variant name,
the generated `CXXRecordDecl` becomes the following: there is a circular
reference between `struct A$Variant` and `struct A`, causing llvm#163048.

```rust
enum A {
  A(u8)
}
```

```
CXXRecordDecl 0x5555568d5760 <<invalid sloc>> <invalid sloc> struct A
|-CXXRecordDecl 0x5555568d58a0 <<invalid sloc>> <invalid sloc> union test_issue::A$Inner definition
| |-CXXRecordDecl 0x5555568d5a38 <<invalid sloc>> <invalid sloc> struct A$Variant definition
| | `-FieldDecl 0x5555568d5b70 <<invalid sloc>> <invalid sloc> value 'test_issue::A'    <---- bug here
| `-FieldDecl 0x5555568d5be0 <<invalid sloc>> <invalid sloc> $variant$ 'test_issue::A::test_issue::A$Inner::A$Variant'
`-FieldDecl 0x5555568d5c50 <<invalid sloc>> <invalid sloc> $variants$ 'test_issue::A::test_issue::A$Inner'
```

The problem was caused by `GetUniqueTypeNameAndDeclaration` not
returning the correct qualified name for the DWARF DIE
`test_issue::A::A`; it returned `A` instead. This caused
`ParseStructureLikeDIE` to find the wrong type `test_issue::A` and
return early.

The failure in `GetUniqueTypeNameAndDeclaration` appears to stem from a
language check that returns early unless the language is C++. I changed
it so Rust follows the C++ path rather than returning. I’m not entirely
sure this is the right approach — Rust’s qualified name rules look
similar, but not identical? Alternatively, we could add a Rust-specific
implementation that forms qualified names according to Rust's rules.
…lvm#167223)

Fixes llvm#165713
This patch handles out-of-bound vector elements and truncates extra
bits.
…lvm#167944)

We generate an ADDLV node that incorporates a vecreduce(zext) from
elements of half the size. This means that we need the input type to be
at least twice the size of the input.

I updated some variable names whilst I was here.

Fixes llvm#167935
This prevents a machine verifier error, where it "Expected implicit
register after groups".

Fixes llvm#158661
balazske and others added 10 commits November 17, 2025 11:59
…#163562)

Symbols used for dynamic extent information of memory regions are now
kept live for as long as the memory region exists.
For a scalar only VPlan with tail folding, if it has a phi live out then
legalizeAndOptimizeInductions will scalarize the widened canonical IV
feeding into the header mask:

    <x1> vector loop: {
      vector.body:
        EMIT vp<%4> = CANONICAL-INDUCTION ir<0>, vp<%index.next>
        vp<%5> = SCALAR-STEPS vp<%4>, ir<1>, vp<%0>
        EMIT vp<%6> = icmp ule vp<%5>, vp<%3>
        EMIT vp<%index.next> = add nuw vp<%4>, vp<%1>
        EMIT branch-on-count vp<%index.next>, vp<%2>
      No successors
    }
    Successor(s): middle.block

    middle.block:
      EMIT vp<%8> = last-active-lane vp<%6>
      EMIT vp<%9> = extract-lane vp<%8>, vp<%5>
    Successor(s): ir-bb<exit>

The verifier complains about this but this should still generate the
correct last active lane, so this fixes the assert by handling this case
in isHeaderMask. There is a similar pattern already there for
ActiveLaneMask, which also expects a VPScalarIVSteps recipe.

Fixes llvm#167813
…m#165426)

This is in preparation for a patch that will only fold offsets into flat
instructions if their addition is inbounds. Marking the GEPs inbounds here
means that their output won't change with the later patch.

Basically a retry of the very similar PR llvm#131994, as part of an updated stack
of PRs.

For SWDEV-516125.
We already ensure that code for different architectures is always placed
in different pages in `assignAddresses`. We represent those ranges using
their first and last chunks. However, the RVAs of those chunks may not
be page-aligned, for example, due to extra padding for entry-thunk
offsets. Align the chunk RVAs to the page boundary so that the emitted
ranges correctly include the entire region.
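
The alignment step can be sketched as follows, assuming a 4 KiB page size and a range described by the start RVA of its first chunk and the end RVA of its last chunk. All names here are hypothetical and the linker's actual code is not shown in this message.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Assumed page size for this sketch (4 KiB).
constexpr uint32_t PageSizeSketch = 0x1000;

// Rounds an RVA down to the start of its page.
constexpr uint32_t alignDownToPage(uint32_t Rva) {
  return Rva & ~(PageSizeSketch - 1);
}

// Rounds an RVA up to the next page boundary (unchanged if already
// aligned). Overflow near UINT32_MAX is not handled in this sketch.
constexpr uint32_t alignUpToPage(uint32_t Rva) {
  return (Rva + PageSizeSketch - 1) & ~(PageSizeSketch - 1);
}

// Widens [FirstChunkRva, LastChunkEndRva) so that it covers whole pages.
inline std::pair<uint32_t, uint32_t>
pageAlignedRange(uint32_t FirstChunkRva, uint32_t LastChunkEndRva) {
  assert(FirstChunkRva <= LastChunkEndRva);
  return {alignDownToPage(FirstChunkRva), alignUpToPage(LastChunkEndRva)};
}
```

A first chunk at `0x1010` (for example, after padding) and a last chunk ending at `0x2ff0` would then yield the range `0x1000` to `0x3000`.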

This change affects an existing test that checks corner cases triggered
by merging a data section into a code section. We may now include such
data in the code range. This differs from MSVC’s behavior, but it should
not cause practical issues, and the new behavior is arguably more
correct.

Fixes llvm#168119.
…template arguments (llvm#167341)

Although very unusual, the SVal of the argument is not checked for
UnknownVal, so we may get a null pointer dereference.

In addition, the template arguments of the variant are retrieved
incorrectly when type aliases are involved, causing crashes and FPs/FNs.
…lvm#166244)

Add a new printRecipe which handles printing the recipe without common
info like debug info or metadata.

Prepares to print them once, in ::print(), after/in combination with
llvm#165825.

PR: llvm#166244
…rogram (llvm#167758)

This fixes the issue reported in
llvm#166855 (comment)
that had been revealed after
llvm#166855 was merged.

`CodeGenFunction::GenerateVarArgsThunk` creates thunks for vararg
functions by cloning and modifying them. It is different from
`CodeGenFunction::generateThunk`, which is used for Itanium ABI.

According to https://reviews.llvm.org/D39396,
`CodeGenFunction::GenerateVarArgsThunk` may be called before metadata
nodes are resolved. So, it tries to avoid remapping DISubprogram and all
metadata nodes it references inside `CloneFunction()` by manually
cloning DISubprogram.

If the optimization level is not OptNone, DILocalVariables for a function
are saved in DISubprogram's retainedNodes field. When
`CodeGenFunction::GenerateVarArgsThunk` clones such DISubprogram without
remapping, it produces a subprogram with incorrectly-scoped retained
nodes. It triggers Verifier checks added in
llvm#166855.

To solve this, the retained nodes list of the cloned DISubprogram is cleared.
@z1-cciauto z1-cciauto requested a review from a team November 17, 2025 12:13
@ronlieb ronlieb removed the request for review from nicolasvasilache November 17, 2025 12:14
@z1-cciauto z1-cciauto merged commit 8ad44f5 into amd-staging Nov 17, 2025
16 checks passed
@z1-cciauto z1-cciauto deleted the upstream_merge_202511170713 branch November 17, 2025 14:53