merge main into amd-staging #639

ronlieb · 2025-11-21T00:24:05Z

No description provided.

Avoids the regression which caused the revert 6d5f87f. This is a hack on a hack. We currently have isUniformMMO, which improperly treats an unknown source value as known uniform. This is a hack from before we had divergence information in the DAG, and should be removed. This is the minimum change to avoid the regression; removing the aggressive handling of the unknown case (or dropping isUniformMMO entirely) are more involved fixes.

Upstream ExtVectorElementExpr with rvalue base

…168447) We can't do anything meaningful to such functions: they aren't optimizable, and even if inlined, they would bring no code open to optimization.

…vm#166839) Resolves llvm#165694

To make life easier for future contributors. Note that formatting changes are due to git clang-format on the touched whitespace-error lines.

test/Lower/select-case-statement.f90 was still using the old lowering. Modified the test with FIR generated using the new lowering. Changed the test to use flang_fc1 instead of bbc and added testing for -O0 and -O1, since character comparison lowering is done differently at -O0 (uses runtime function) and -O1 (inlines some cases). Use different FileCheck prefixes for different optimization levels (CHECK-O0 for -O0, CHECK-O1 for -O1, CHECK for both).

…m#168292) (llvm#168786) This reverts commit 6d5f87f. Previously this failed due to treating the unknown MachineMemOperand value as known uniform.

…lvm#165416) This PR introduces new debug macros that allow finer-grained control over which debug messages are output, and introduces a C++ stream style for debug messages. Changing existing messages (except a few that I changed for testing) will come in subsequent PRs. I also think that we should make debug enabling OpenMP agnostic but, for now, I prioritized maintaining the current libomptarget behavior, and we might need more changes further down the line as we decouple libomptarget.

) If `HardwareBreakpointTestBase.supports_hw_breakpoints()` returns False, `SimpleHWBreakpointTest.does_not_support_hw_breakpoints()` returns None, so the test runs and fails. However, it should be skipped instead. The test was added in llvm#146602, while `supports_hw_breakpoints()` was changed in llvm#146609, which was landed earlier despite having a bigger number.

…econstructing DIE names (llvm#168734)

Depends on:
* llvm#168725

When compiling with `-glldb`, we repoint the `DW_AT_type` of a DIE to be a typedef that refers to the `preferred_name`. I.e.,:
```
template <typename T> struct t7;
using t7i = t7<int>;
template <typename T> struct __attribute__((__preferred_name__(t7i))) t7 {};

template <typename... Ts> void f1() {}

int main() { f1<t7i>(); }
```
would produce the following (minified) DWARF:
```
DW_TAG_subprogram
  DW_AT_name ("_STN|f1|<t7<int> >")

  DW_TAG_template_type_parameter
    DW_AT_type (0x0000299c "t7i")
...
DW_TAG_typedef
  DW_AT_type (0x000029a7 "t7<int>")
  DW_AT_name ("t7i")
```
Note how the `DW_AT_type` of the template parameter is a typedef itself (instead of the canonical type). The `DWARFTypePrinter` would take the `DW_AT_name` of this typedef when reconstructing the name of `f1`, so we would end up with a verifier failure:
```
error: Simplified template DW_AT_name could not be reconstituted:
         original: f1<t7<int> >
    reconstituted: f1<t7i>
```
Fixing this allows us to un-XFAIL the `simplified-template-names.cpp` test in `cross-project-tests`. Unfortunately this is only tested on Darwin, where LLDB tuning is the default. AFAIK, there is no other case where the template parameter type wouldn't be canonical.

Previously, the tests for the LLVM targets `AArch64` and `ARM` were in the same directory. But if you only configured LLVM for one target (e.g., just `AArch64`, which is how I ran into this), then all tests under the ARM directory were marked `UNSUPPORTED`. This patch moves all the tests that are capable of running on `AArch64`-only targets into a dedicated `AArch64` directory. The tests that expected a plain `ARM` target were kept in the `ARM` directory.

Drive-by:
* Rename the `dummy-debug-map-amr64.map` to `dummy-debug-map-arm64.map` (note the typo in `amr64`)

llvm#168619) I've been working on some scripts that evaluate the parent and child frame. It's been very annoying that a frame has a `parent` property but no `child` property, so I've added this to the extensions. I would have preferred to return None, but the existing implementation returns an invalid SBFrame, so I'm conforming to that API.
```
(lldb) script
Python Interactive Interpreter. To exit, type 'quit()', 'exit()' or Ctrl-D.
>>> lldb.frame
frame #0: 0x0000555555555200 fib.out`main
>>> lldb.frame.parent
frame #1: 0x00007ffff782a610 libc.so.6`__libc_start_call_main + 128
>>> lldb.frame.parent.child
frame #0: 0x0000555555555200 fib.out`main
```

…ar (llvm#168787)

When downloading bazelisk/buildifier, we use curl, which still returns exit code zero on HTTP 4xx errors unless we pass --fail. This patch adds --fail flags so that error messages are more clear.

…lvm#168918) We already know we're looking at BITREVERSE, we can match on the source operand.

There are several places where we use `llvm::OwningArrayRef`. The interface to this requires us to first construct temporary storage, then allocate space and set the allocated memory to 0, then copy the values we actually want into that memory, then move the array into place. Instead we can just do it all inline in a single pass by using `std::vector`. In one case we actually allocate a completely separate container and then allocate + copy the data over because `llvm::OwningArrayRef` does not (and can't) support `push_back`. Note that `llvm::SmallVector` is not a suitable replacement here because we rely on reference stability on move construction: when the outer container reallocates, we need the contents of the inner containers to be fixed in memory, and `llvm::SmallVector` does not give us that guarantee.

…pInterface.cpp (NFC)

…vm#162952) Pyright is an MIT-licensed static type checker and can be found at https://github.com/microsoft/pyright. There are also various integrations for using it as an LSP server in various editors, which is the main way I use it. It's useful on our Python scripts for detecting issues such as functions being called with unexpected types, or `obj.attr` being accessed on an object that doesn't have that attribute.

It can be used without any configuration. This config setting causes it to also report issues with type hints that do not meet our Python 3.8 minimum, such as this one from dap_server.py:
```
init_commands: list[str],
```
Subscripting the builtin type like that requires Python 3.9, while the 3.8 equivalent is:
```
from typing import List
...
init_commands: List[str],
```
In practice these scripts still work on 3.8 because type hints aren't normally evaluated during normal execution, but since we have a minimum, we should fully comply with it.

Note: The error pyright reports for this particular issue isn't great:
```
error: Subscript for class "list" will generate runtime exception; enclose type expression in quotes
```
This is technically correct, as it is possible to evaluate type hints at runtime, but I believe anything that would do so would also evaluate the string form and still hit the runtime exception. A better suggestion in this case would have been the 3.8-compatible `List[str]`. However, it is better than silently passing code that doesn't conform to the minimum.
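As a minimal sketch of the 3.8-compatible spelling discussed above: the function name `join_init_commands` is hypothetical and only illustrates the annotation style; it is not taken from dap_server.py.

```python
from typing import List


def join_init_commands(init_commands: List[str]) -> str:
    """Join init commands into one newline-separated script.

    `List[str]` from `typing` is accepted on Python 3.8. The builtin
    generic `list[str]` is only subscriptable from Python 3.9 onwards,
    which is the kind of hint the commit message says pyright flags.
    """
    return "\n".join(init_commands)
```

Because the annotation uses `typing.List`, tools that evaluate hints at runtime (for example `typing.get_type_hints`) should also work on 3.8 with this form.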

…lvm#168795) This test explicitly sets the environment for a spawned process. Without DYLD_LIBRARY_PATH, the spawned process may use an ASan runtime other than the one that was used by the parent process. That other runtime library may not work at all, or may not be in the default search path. Either case can cause the spawned process to die before it makes it to main, thus failing the test. The compiler-rt lit config sets the library path variable [here](https://github.com/llvm/llvm-project/blob/main/compiler-rt/test/lit.common.cfg.py#L84) (i.e. to ensure that just-built runtimes are used for tests, in the case of a standalone compiler-rt build), and that is currently used by the parent process but not the spawned ones. My change only forwards the variable for Darwin (DYLD_LIBRARY_PATH), but we **_ought_** to also forward the variable for other platforms. However, it's not clear that there's any good way to plumb this into the test, since some platforms actually have multiple library path variables which would need to be forwarded (see: SunOS [here](https://github.com/llvm/llvm-project/blob/main/compiler-rt/test/lit.common.cfg.py#L102)). I considered adding a substitution variable for the library path variable, but that doesn't really work if there are multiple such variables.
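The forwarding idea can be sketched in Python, purely as an illustration. The test itself is not reproduced here, and `spawn_env` with its parameters is a hypothetical helper, not part of compiler-rt or lit.

```python
import os
from typing import Dict, Iterable, Mapping


def spawn_env(
    explicit: Mapping[str, str],
    parent: Mapping[str, str] = os.environ,
    forwarded: Iterable[str] = ("DYLD_LIBRARY_PATH",),
) -> Dict[str, str]:
    """Build the environment for a spawned process.

    Starts from the explicitly requested variables and copies the listed
    library-path variables from the parent, so the child can locate the
    same runtime library as the parent. Explicit values win on conflict.
    """
    env = dict(explicit)
    for name in forwarded:
        if name in parent and name not in env:
            env[name] = parent[name]
    return env
```

Taking `forwarded` as a sequence rather than a single name reflects the concern raised above that some platforms have more than one library path variable.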

This reverts commit b725bdb. This is still causing Darwin failures. There are six tests that are still failing:
* AddressSanitizer-x86_64-darwin.TestCases/Posix.deep_call_stack.cpp
* AddressSanitizer-x86_64-darwin.TestCases.scariness_score_test.cpp
* AddressSanitizer-x86_64h-darwin.TestCases/Posix.deep_call_stack.cpp
* ORC-x86_64-darwin.TestCases/Darwin/x86-64.objc-imageinfo.S
* UBSan-Minimal-x86_64-darwin.TestCases.test-darwin-interface.c
* UBSan-Minimal-x86_64h-darwin.TestCases.test-darwin-interface.c

There are a few failure modes:
1. deep_call_stack.cpp and scariness_score_test.cpp are failing due to ulimit issues that we have observed previously.
2. objc-imageinfo.S is failing in the x86 variant because I only updated the AArch64 variant.
3. test-darwin-interface.c is using subshells, so it obviously fails with the internal shell. It also looks like this one did not run on my system because it requires x86_64 Darwin.

Add extra tests for over-eager tail-folding for tiny trip-count loops. Reduced from llvm#167858.

This has been replaced by the MODULE.bazel file. Users can still use their own WORKSPACE files, but they didn't inherit this file anyway. Users should migrate to bzlmod, which is required as of bazel 9.x.

Need to check if the non-schedulable phi parent node has unique operands, if the incoming node has copyables, and the node is commutative. Otherwise, there might be issues with the correct calculation of the dependencies. Fixes llvm#168589

…on iOS/Android (llvm#168821) The tests added by llvm#163468 appear to be broken due to lack of libcxx support (?). Marking unsupported everywhere for now since it passes on some platforms and fails on others, and I don't know the full list. Android fail: https://lab.llvm.org/buildbot/#/builders/186/builds/14106

…llvm#168289) Remove `VPWidenPointerInductionRecipe::IsScalarAfterVectorization` and replace it with `onlyScalarValuesUsed`. This removes the need to carry state from the legacy cost model through VPlan, and the VPlan-based analysis gives more accurate results, avoiding a number of extracts. PR: llvm#168289

Removes unused headers or replaces them with headers that directly provide the symbol instead. For example, `Serialize.h` included `AST.h`, but it was actually `Serialize.cpp` that needed concept expressions, so now it includes just `ExprConcepts.h`.

These flags are not needed for building libc.

Mark ELF_ppc64_relocations.s as unsupported on SystemZ because of a cross-build issue related to using dlsym for host symbols. The test fails to resolve __tls_get_addr on a SystemZ host. Co-authored-by: anoopkg6 <anoopkg6@github.com>

…lvm#168937) This patch extracts the common logic for computing array element counts from shape operands into a reusable helper function in CUFCommon.

These horizontal add/sub instructions are currently handled by adding/subtracting tuples of the first operand, followed by tuples of the second operand. This is not the correct semantics for the 256-bit instructions: they process the first half of the first operand, then the first half of the second operand, then the second half of the first operand, and finally the second half of the second operand (trust me bro [*]). This patch fixes the issue by applying the "shards" functionality that was added in llvm#167954, to handle the top and bottom 128-bit "shards" in turn.

[*] clang/test/CodeGen/X86/avx2-builtins.c:
```
TEST_CONSTEXPR(match_v8si(_mm256_hadd_epi32(
    (__m256i)(__v8si){10, 20, 30, 40, 50, 60, 70, 80},
    (__m256i)(__v8si){5, 15, 25, 35, 45, 55, 65, 75}),
    30,70,20,60,110,150,100,140));
```
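The per-shard ordering described above can be checked with a small Python model of `_mm256_hadd_epi32`. This is only a sketch of the semantics: the function names are made up for illustration, 32-bit wraparound is ignored, and it is not the sanitizer's implementation.

```python
from typing import List


def hadd_shard(a: List[int], b: List[int]) -> List[int]:
    """Horizontal add within one 128-bit shard (four 32-bit lanes each).

    Adjacent pairs of `a` come first, followed by adjacent pairs of `b`.
    """
    return [a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]]


def hadd_epi32_256(a: List[int], b: List[int]) -> List[int]:
    """Model of the 256-bit form: handle the low shard, then the high shard."""
    return hadd_shard(a[:4], b[:4]) + hadd_shard(a[4:], b[4:])


def hadd_whole_operand(a: List[int], b: List[int]) -> List[int]:
    """The incorrect model: all pairs of `a`, then all pairs of `b`."""
    pairs = lambda v: [v[i] + v[i + 1] for i in range(0, len(v), 2)]
    return pairs(a) + pairs(b)


A = [10, 20, 30, 40, 50, 60, 70, 80]
B = [5, 15, 25, 35, 45, 55, 65, 75]
print(hadd_epi32_256(A, B))      # [30, 70, 20, 60, 110, 150, 100, 140]
print(hadd_whole_operand(A, B))  # [30, 70, 110, 150, 20, 60, 100, 140]
```

The sharded model reproduces the expected vector from the `TEST_CONSTEXPR` case, while the whole-operand model places the sums in a different order.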

Add a low trip count test that is currently vectorized but unprofitable, for llvm#167858.

…lvm#168609) which reassigns scale operand in vgpr_32 register to agpr_32, not permitted by instruction format. Reduced from ck. --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com> Co-authored-by: theRonShark <ron.lieberman@amd.com>

Don't specifically target windows-msvc - the same goes for any windows target; mingw doesn't have dlfcn.h either.

…#168900) When comparing additions with the same base where one has `nsw`, the following simplification can be performed:
```llvm
icmp slt/sgt/sle/sge (x + C1), (x +nsw C2)
  => icmp slt/sgt/sle/sge C1, C2
```
Previously this was only done for `slt`. This patch extends it to the `sgt`, `sle`, and `sge` predicates when either of the following conditions holds:
- `C1 <= C2 && C1 >= 0`, or
- `C2 <= C1 && C1 <= 0`

This patch also handles the `C1 == C2` case, which was previously excluded.

Proof: https://alive2.llvm.org/ce/z/LtmY4f
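As an informal sanity check (not a substitute for the Alive2 proof), the fold can be brute-forced over a small integer width in Python. The 4-bit width and all names below are choices made for this sketch. Inputs where the `nsw` add would overflow are skipped, since the result is poison there.

```python
import operator
from itertools import product

BITS = 4  # small width so the exhaustive search stays cheap
LO, HI = -(1 << (BITS - 1)), (1 << (BITS - 1)) - 1

PREDICATES = {
    "slt": operator.lt,
    "sgt": operator.gt,
    "sle": operator.le,
    "sge": operator.ge,
}


def wrap(value: int) -> int:
    """Two's-complement wraparound to a signed BITS-wide integer."""
    return (value - LO) % (1 << BITS) + LO


def fold_applies(c1: int, c2: int) -> bool:
    """The two conditions listed in the commit message."""
    return (c1 <= c2 and c1 >= 0) or (c2 <= c1 and c1 <= 0)


def count_mismatches() -> int:
    """Count inputs where the folded compare differs from the original."""
    mismatches = 0
    for c1, c2 in product(range(LO, HI + 1), repeat=2):
        if not fold_applies(c1, c2):
            continue
        for x in range(LO, HI + 1):
            if not LO <= x + c2 <= HI:
                continue  # nsw add overflows: poison, nothing to compare
            lhs = wrap(x + c1)  # plain add, may wrap
            rhs = x + c2        # nsw add, known not to overflow here
            for pred in PREDICATES.values():
                if pred(lhs, rhs) != pred(c1, c2):
                    mismatches += 1
    return mismatches
```

Under either condition, `x + C1` lies between `x` and `x + C2`, so it cannot wrap whenever the `nsw` add is well defined; the search should therefore report no mismatches.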

Remove all constraint propagation functions in Dependence Analysis.

Add dependency on headers with `in_addr` and `in_addr_t` type definitions to ensure that these headers will be properly installed by "install-libc" CMake target.

… a given tiled loop nest. (llvm#167634)

The existing `scf::tileAndFuseConsumerOfSlices` takes a list of slices (and loops they are part of), tries to find the consumer of these slices (all slices are expected to have the same consumer), and then tiles the consumer into the loop nest using the `TilingInterface`. A more natural way of doing consumer fusion is to just start from the consumer and look for operands that are produced by the loop nest passed in as `loops` (presumably these loops are generated by tiling, but that is not a requirement for consumer fusion). Using the consumer, you can find the slices of the operands that are accessed within the loop, which you can then use to tile and fuse the consumer (using `TilingInterface`). This handles more naturally the case where multiple operands of the consumer come from the loop nest.

The `scf::tileAndFuseConsumerOfSlices` was implemented as a mirror of `scf::tileAndFuseProducerOfSlice`. For the latter, the slice has a single producer for the source of the slice, which makes it a natural way of specifying producer fusion. But for consumers, the result might have multiple users, resulting in multiple candidates for fusion, as well as a fusion candidate using multiple results from the tiled loop nest. This means using slices (`tensor.insert_slice`/`tensor.parallel_insert_slice`) as a hook for consumer fusion turns out to be quite hard to navigate. The use of the consumer directly avoids all those pain points.

In time the `scf::tileAndFuseConsumerOfSlices` should be deprecated in favor of `scf::tileAndFuseConsumer`. There is a lot of tech debt that has accumulated in `scf::tileAndFuseConsumerOfSlices` that needs to be cleaned up. So while that gets cleaned up, and required functionality is moved to `scf::tileAndFuseConsumer`, the old path is still maintained.

The test for `scf::tileAndFuseConsumerUsingSlices` is copied from `tile-and-fuse-consumer.mlir` to `tile-and-fuse-consumer-using-slices.mlir`. All the tests that were in this file now use the `tileAndFuseConsumer` method. The test op `test.tile_and_fuse_consumer` is modified to call `scf::tileAndFuseConsumer`, while a new op `test.tile_and_fuse_consumer_of_slice` is used to keep the old path tested while it is deprecated.

---------

Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>

Add declaration of command line options to BugDriver.h and remove extern declarations in individual .cpp files.

Reverts llvm#168921 Causes build failures.

llvm#156577) Add detailed comments explaining each function's memory access patterns and why they should/shouldn't be unroll-and-jammed:
- fore_aft_*: Dependencies between fore block and aft block
- fore_sub_*: Dependencies between fore block and sub block
- sub_aft_*: Dependencies between sub block and aft block
- sub_sub_*: Dependencies within sub block
- *_less: Backward dependency (i-1) - safe for fore/aft, fore/sub, sub/aft; unsafe for sub/sub due to jamming conflicts
- *_eq: Same iteration dependency (i+0) - safe due to preserved execution order
- *_more: Forward dependency (i+1) - unsafe due to write-after-write races between unrolled iterations, except sub/sub case creates conflicts

…168962) The only thing the docs should depend on is the SWIG wrapper (lldb.py), which only requires parsing the API headers. It should not depend on building libLLDB. The dependency was (I believe accidentally) introduced by 59f4267. Fixes llvm#123316

…vm#168930) Removes about 200 bytes of unneeded patterns from RISCVGenDAGISel.inc

… Wg To Sg (llvm#168118)

…llvm#168932) This removes an unnecessary isel pattern for the RV32 HwMode.

…llvm#168957) On startup, bazel prints: `WARNING: Option 'experimental_guard_against_concurrent_changes' is deprecated: Use --guard_against_concurrent_changes instead`

…lvm#168973) Reverts llvm#168643

z1-cciauto · 2025-11-21T00:25:29Z

PSDB Link: https://compiler-ci.amd.com/job/compiler-psdb-amd-staging/2910

arsenm and others added 30 commits November 20, 2025 11:10

[CIR] ExtVectorElementExpr with rvalue base (llvm#168260)

5b8656c

Upstream ExtVectorElementExpr with rvalue base

[profcheck] Exclude naked, asm-only functions from profcheck (llvm#…

b9d9811

…168447) We can't do anything meaningful to such functions: they aren't optimizable, and even if inlined, they would bring no code open to optimization.

[X86] Lower mathlib call ldexp into scalef when avx512 is enabled (ll…

6c79cc7

…vm#166839) Resolves llvm#165694

[AMDGPU] Precommit tests for V_CVT_PK_[IU]16_F32 (llvm#168893)

6ce4794

[SDAG] Fix whitespace errors (NFC) (llvm#168897)

602fa0c

To make life easier for future contributors. Note that formatting changes are due to git clang-format on the touched whitespace-error lines.

Reapply "DAG: Allow select ptr combine for non-0 address spaces" (llv…

0e1cb2d

…m#168292) (llvm#168786) This reverts commit 6d5f87f. Previously this failed due to treating the unknown MachineMemOperand value as known uniform.

[gn] port c9f5734 (TargetLibraryInfo.inc)

4aee501

[bazel][LoongArch] Port llvm#168129: tablegen for sdnode (llvm#168907)

a070240

AMDGPU: Handle invariant loads when considering if a load can be scal…

e79c7c1

…ar (llvm#168787)

[Github] Error on HTTP 4xx Errors (llvm#168919)

6d52efc

When downloading bazelisk/buildifier, we use curl, which still returns exit code zero on HTTP 4xx errors unless we pass --fail. This patch adds --fail flags so that error messages are more clear.

[DAGCombiner] Remove unneeded m_BitReverse from visitBITREVERSE. NFC (l…

01e5e4f

…lvm#168918) We already know we're looking at BITREVERSE, we can match on the source operand.

[MLIR] Apply clang-tidy fixes for llvm-qualified-auto in ValueBoundsO…

4100845

…pInterface.cpp (NFC)

[LV] Add tests for loops with low trip counts requiring tail-folding.

827ff2c

Add extra tests for over-eager tail-folding for tiny trip-count loops. Reduced from llvm#167858.

[bazel] Delete WORKSPACE file (llvm#168926)

777935c

This has been replaced by the MODULE.bazel file. Users can still use their own WORKSPACE files, but they didn't inherit this file anyway. Users should migrate to bzlmod, which is required as of bazel 9.x.

[Support] Add vector::erase to JSON::Array (llvm#168835)

155a7d8

petrhosek and others added 24 commits November 20, 2025 20:44

[libc] Removed unused flags from baremetal cache files (llvm#168942)

91e777f

These flags are not needed for building libc.

[flang][cuda] Extract element count computation into helper function (l…

1b8a4aa

…lvm#168937) This patch extracts the common logic for computing array element counts from shape operands into a reusable helper function in CUFCommon.

[LV] Add test a low-trip count test without folding the tail.

a3f6c43

Add a low trip count test that is currently vectorized but unprofitable, for llvm#167858.

[compiler-rt] [test] Generalize an UNSUPPORTED marking (llvm#168858)

04acac2

Don't specifically target windows-msvc - the same goes for any windows target; mingw doesn't have dlfcn.h either.

[DA] remove constraint propagation (llvm#160924)

5c8db7a

Remove all constraint propagation functions in Dependence Analysis.

[libc] Add missing dependencies for arpa/inet.h header. (llvm#168951)

1136239

Add dependency on headers with `in_addr` and `in_addr_t` type definitions to ensure that these headers will be properly installed by "install-libc" CMake target.

[mlir] Add kuhar to code owners for arith (llvm#168945)

9e2ca0d

[NFC][bugpoint] Namespace cleanup in bugpoint (llvm#168921)

bf91a62

Add declaration of command line options to BugDriver.h and remove extern declarations in individual .cpp files.

Revert "[NFC][bugpoint] Namespace cleanup in bugpoint" (llvm#168961)

b83e458

Reverts llvm#168921 Causes build failures.

[RISCV] Only add v2i32 to GPR regclass in the RV64 hardware mode. (ll…

fbc0935

…vm#168930) Removes about 200 bytes of unneeded patterns from RISCVGenDAGISel.inc

[MLIR] [XeGPU] Add distribution pattern for vector.constant_mask from…

310abe0

… Wg To Sg (llvm#168118)

[RISCV] Use SDT_RISCVIntUnaryOpW for RISCVISD::ABSW type profile. NFC (…

a9435cb

…llvm#168932) This removes an unnecessary isel pattern for the RV32 HwMode.

[clang][deps] NFC: Fix typo in function name (llvm#168958)

925ce5a

[bazel] Replace --experimental_guard_against_concurrent_changes usage (…

3723a8b

…llvm#168957) On startup, bazel prints: `WARNING: Option 'experimental_guard_against_concurrent_changes' is deprecated: Use --guard_against_concurrent_changes instead`

[UBSan] [compiler-rt] add preservecc variants of handlers (llvm#168643)

49e46a5

Revert "[UBSan] [compiler-rt] add preservecc variants of handlers" (l…

418204d

…lvm#168973) Reverts llvm#168643

merge main into amd-staging

e330aa1

ronlieb requested review from a team and dpalermo November 21, 2025 00:24

ronlieb requested a review from nicolasvasilache as a code owner November 21, 2025 00:24

dpalermo approved these changes Nov 21, 2025

ronlieb closed this Nov 21, 2025


44 participants
