merge main into amd-staging #675

ronlieb · 2025-11-25T03:33:59Z

No description provided.

If the expected trip count is less than the VF, the vector loop will only execute a single iteration. When that's the case, the cost of the middle block has the same impact as the cost of the vector loop. Include it in isOutsideLoopWorkProfitable to avoid vectorizing when the extra work in the middle block makes it unprofitable. Note that isOutsideLoopWorkProfitable already scales the cost of blocks outside the vector region, but the patch restricts accounting for the middle block to cases where VF <= ExpectedTC, to initially catch some worst cases and avoid regressions. This initial version should specifically avoid unprofitable tail-folding for loops with low trip counts after re-applying llvm#149042. PR: llvm#168949
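
The reasoning above can be illustrated with a toy cost comparison. This is only a sketch under stated assumptions: `LoopCosts`, `vectorPathCost`, and `looksProfitable` are hypothetical names, the formula is a simplification, and none of it reproduces the actual logic of `isOutsideLoopWorkProfitable` (in particular, it ignores the scaling of outside-loop costs and the `VF <= ExpectedTC` restriction mentioned in the message).

```cpp
#include <cassert>
#include <cstdint>

// Toy model only: names and formula are assumptions, not LLVM's cost model.
struct LoopCosts {
  std::uint64_t scalarIterCost;  // cost of one scalar loop iteration
  std::uint64_t vectorIterCost;  // cost of one vector loop iteration
  std::uint64_t middleBlockCost; // cost of the block run once after the vector loop
};

// Estimated cost of the vectorized path for an expected trip count.
inline std::uint64_t vectorPathCost(const LoopCosts &costs, std::uint64_t vf,
                                    std::uint64_t expectedTC,
                                    bool countMiddleBlock) {
  assert(vf > 0 && "VF must be positive");
  // A trip count below VF still needs one (masked) vector iteration.
  std::uint64_t vectorIters = (expectedTC + vf - 1) / vf;
  std::uint64_t total = vectorIters * costs.vectorIterCost;
  if (countMiddleBlock)
    total += costs.middleBlockCost; // paid once, like the single vector iteration
  return total;
}

// Compares the vectorized path against running expectedTC scalar iterations.
inline bool looksProfitable(const LoopCosts &costs, std::uint64_t vf,
                            std::uint64_t expectedTC, bool countMiddleBlock) {
  return vectorPathCost(costs, vf, expectedTC, countMiddleBlock) <
         expectedTC * costs.scalarIterCost;
}
```

With illustrative costs `{4, 10, 6}`, a VF of 8, and an expected trip count of 3, the vector path looks cheaper (10 versus 12) until the middle block is counted (16 versus 12), which is the kind of case the message describes.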

Currently, in the following snippet, the second designated initializer is incorrectly detected as an Objective-C method expression. Fix that and add a test to make sure we don't regress.

```
Foo foo[] = {[0] = 1, [1] = 2};
```

…Frontend and flangFrontend (llvm#165277)" This reverts commit 3773bbe.

Adds support for the `__builtin_ia32_kshiftli` and `__builtin_ia32_kshiftri` X86 builtins. Part of llvm#167765 --------- Signed-off-by: vishruth-thimmaiah <vishruththimmaiah@gmail.com>

…gFrontend and flangFrontend (llvm#165277)" This reverts commit 40334b8. Unfortunately the revert breaks the build.

…Frontend and flangFrontend (llvm#165277)" (llvm#169397) This reverts commit 3773bbe and relands the last revert attempt 40334b8. 3773bbe broke the build for the build configuration described in here: llvm#165277 (comment)

) This PR introduces an additional type of map lowering for record types, one that Clang already supports, in which a user can map a top-level record type and then map individual members with different map types, effectively creating a sort of "overlapping" mapping that we attempt to cut around. This is currently used predominantly in Fortran when mapping descriptors and their data: we map the descriptor and its data with separate map modifiers and "cut around" the pointer data, so that we do not overwrite it unless the runtime deems it necessary based on its reference counting mechanism. However, the mechanism will also trigger when a user explicitly maps a record type (derived type or structure) and then explicitly maps a member with a different map type. These additions are predominantly in the OpenMPToLLVMIRTranslation.cpp file and phase; however, one Flang test that checks end-to-end IR compilation (as far as we care for now, at least) was altered. 2/3 required PRs to enable declare target to mapping; look at PR 3/3 to check for fully green passes (this one will fail a number of tests due to some dependencies). Co-authored-by: Raghu Maddhipatla raghu.maddhipatla@amd.com

…ntation (llvm#119589) While the infrastructure for declare target to/enter and link for variables exists in the MLIR dialect and at the Flang level, the lowering from MLIR to LLVM IR is only in place for variables that have the link clause applied. This PR extends that lowering with an initial implementation that covers declare target to as well, which primarily requires changes in the OpenMPToLLVMIRTranslation phase. However, a minor addition to the OpenMP dialect was required to extend the declare target enumerator to include a default None field. This also requires a minor change to the Flang lowering's MapInfoFinalization.cpp pass to alter the map type for descriptors to deal with cases where a variable is marked declare target to. Currently, when a descriptor variable is mapped declare target to, the descriptor component can become attached and cannot be updated, which results in issues when an unusual allocation range is specified (effectively an off-by-X error). The current solution is to always map the descriptor, as we always require an up-to-date version of this data. However, this also requires an interlinked PR that adds a more intricate type of mapping of structures/record types that Clang currently implements, to circumvent the overwriting of the pointer in the descriptor. 3/3 required PRs to enable declare target to mapping; this PR should pass all tests and provide an all-green CI. Co-authored-by: Raghu Maddhipatla raghu.maddhipatla@amd.com

…lvm#168973) (llvm#169091) This reverts commit 418204d.

Adds an explicit include of `<cassert>` in StringTable.h rather than relying on the one in StringRef.h. Fixes potential compile errors if assert() was undef'ed between StringRef.h and StringTable.h inclusion.
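
As a minimal sketch of why the explicit include helps: the C and C++ standards specify that the `assert` macro is redefined each time `<cassert>` is included, so a header that includes it directly does not depend on what happened earlier in the translation unit. The helper `checkedHalf` below is hypothetical and merely stands in for header code that uses `assert()`.

```cpp
#include <cassert>

// Simulates an earlier header or source file that removed the macro.
#undef assert

// Including <cassert> again redefines assert according to NDEBUG, so the
// code below does not rely on an earlier, transitive include.
#include <cassert>

// Hypothetical helper standing in for header code that uses assert().
inline int checkedHalf(int value) {
  assert(value % 2 == 0 && "value must be even");
  return value / 2;
}
```

Without the second include, the call to `assert` inside `checkedHalf` would refer to an undeclared identifier and fail to compile.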

This verifier check will complain if there aren't enough implicit operands -- so it doesn't *allow* those operands, it *requires* them.

…8417) This patch adds support for lowering of custom reductions to MLIR. It also enhances the capability of the pass to automatically mark functions as "declare target" by traversing custom reduction initializers and combiners.

…lvm#169235) This fixes test errors like this, at least for a mingw target, if building with Clang 21 instead of Clang 20, as in the CI environment:

# .---command stderr------------
# | error: 'expected-error' diagnostics seen but not expected:
# | File C:\a\llvm-mingw\llvm-mingw\llvm-project\libcxx\test\std\input.output\file.streams\c.files\gets-removed.verify.cpp Line 16: cannot initialize a parameter of type 'char *' with an lvalue of type 'const char *'
# | 1 error generated.
# `-----------------------------
# error: command failed with exit status: 1

This extra, unexpected diagnostic appears in Clang 21, since commit 9eef4d1 ("Remove delayed typo expressions"). Before this, we got the expected diagnostic `error: no member named 'gets' in namespace 'std'`, with the typo correction hint `did you mean 'puts'?`. After this change, we get the typo correction hint `did you mean simply 'gets'?` instead. And with the typo correction finding `::gets`, it goes on to produce a second diagnostic about mismatched parameter for that function. Avoid these unexpected diagnostics by passing the right type of parameter to the gets function.

…8468) This commit adds a `MockDwarfDelegate` class that can be used to control what dwarf version is used when evaluating an expression. We also add a simple test that shows how dwarf version can change the result of the expression.

This header is only ever used inside `src/`, so we might as well move it there. As a drive-by this also removes some dead code.

…ualifiers of type locs. (llvm#167619) Previously, e.g. for TypeLoc "MyNamespace::MyClass", `node()` selects only "MyClass" without the qualifier. With this change, it now selects "MyNamespace::MyClass". --------- Co-authored-by: Florian Mayer <fmayer@google.com>

…lvm#168857) This avoids dozens of instances of benign error messages being printed when running the tests on e.g. Windows:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'os' has no attribute 'sysconf'

Co-authored-by: Florian Mayer <fmayer@google.com>

Only some Fortran source files in flang/test/Lower have been modified. The other files in the directory will be cleaned up in subsequent commits.

…vm#169327) OffsetSizeAndStrideOpInterface does not specify whether it's operating on the input or output shape and in fact different ops implement this in different ways, which is also why SubviewOp is special cased here. This "marked as dynamic but not really dynamic" folding is better handled by shape inference, so just remove the bad fold.

SVE depends on a combination of host support and operating system support. Sometimes those don't line up with detected host CPU name; make sure SVE is disabled when it isn't available. Implement this for both Windows and Linux. (We don't have a codepath for other operating systems. If someone wants to implement this, it should be possible to adapt fmv code from compiler-rt.) While I'm here, also add support for detecting other Windows CPU features. For Windows, declare constants ourselves so the code builds on older SDKs; we also do this in compiler-rt.

This is currently a no-op. It will be supported for the minimal runtime in a follow-up. This allows improving codegen for fsanitize-recover by compiling the handlers with [[clang::preserve_all]], which makes sure that the caller does not need to spill any registers. We do not expect this function to be called frequently, so this is beneficial for code size.

) Implement `this_cluster` like `this_group` by lowering it directly like an intrinsic function. Use the NVVM operation to get the rank and size information and populate the derived type.

Addresses feedback from llvm#147508 (review) : - Update access modifiers for SYCLWrapper members. - Update comments. - Update types.

…#169346) PR168884 flagged compiler directives (!dir$ ...) inside OpenMP loop constructs as errors. This caused some customer applications to fail to compile (issue 169229). Downgrade the error to a warning, and gracefully ignore compiler directives when lowering loop constructs to MLIR. Fixes llvm#169229

In the LoweringPrepare pass, the handling for global array destructor lowering was mishandling the insertion point, so that if this code needed to create a declaration for the __cxa_atexit function, that declaration was being created in the dtor region, rather than at module scope. This change fixes that.

AMDGPU requires more complex CFI rules. Normally these would be expressed with .cfi_escape; however, this would make the CFI unreadable and make it difficult to update registers in CFI instructions (also something AMDGPU requires). Authored-by: Emma Pilkington <Emma.Pilkington@amd.com>

llvm#169417)

This fixes llvm#166172.

This is mostly the output of a vibe coded script running on VecFuncs.def, with a lot of manual cleanups and fixing where the vibes were off. This is not yet wired up to anything (except for the handful of calls which are already manually enabled). In the future the SystemLibrary mechanism needs to be generalized to allow plugging these sets in based on the flag. One annoying piece is there are some name conflicts across the libraries. Some of the libmvec functions have name collisions with some sleef functions. I solved this by just adding a prefix to the libmvec functions. It would probably be a good idea to add a prefix to every group. It gets ugly, particularly since some of the sleef functions started to use a Sleef_ prefix, but mostly do not.

… supported (llvm#169252) Fix kernel build when cl_khr_fp64 is not enabled:

opencl-c.h:13785:50: error: unknown type name 'atomic_double'
13785 | double __ovld atomic_fetch_min(volatile __global atomic_double *, double);
opencl-c.h:13785:67: error: use of type 'double' requires cl_khr_fp64 and __opencl_c_fp64 support
13785 | double __ovld atomic_fetch_min(volatile __global atomic_double *, double);

This is a regression introduced by 423bdb2. Before that commit, __opencl_c_ext_fp64_global_atomic_add was guarded by cl_khr_fp64 in opencl-c-base.h.

…set (llvm#168329)

Previously, i16 `bswap` was lowered using multiple shift and OR operations. This patch adds a pattern to directly lower i16 `bswap` using the `PRMT` (permute) instruction, which is more efficient. Additionally, the lowering of `bswap` is moved into operation legalization, which allows for DAGCombiner to optimize the lowered code.
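
To illustrate the two formulations, here is a host-side sketch that computes a 16-bit byte swap once with shifts and an OR, and once with a single byte permute. `permuteBytes` is a hypothetical, simplified model (each selector nibble picks one of the four source bytes); it does not reproduce the full semantics of the real `PRMT` instruction, and the selector `0x3201` is chosen for this model only, not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Shift-and-OR formulation of a 16-bit byte swap.
inline std::uint16_t bswap16ShiftOr(std::uint16_t x) {
  return static_cast<std::uint16_t>((x << 8) | (x >> 8));
}

// Simplified byte permute (an assumption, not the real PRMT semantics):
// nibble i of `selector` names the source byte copied into result byte i.
inline std::uint32_t permuteBytes(std::uint32_t src, std::uint32_t selector) {
  assert((selector & 0xCCCCu) == 0 && "this model only supports indices 0-3");
  std::uint32_t result = 0;
  for (int i = 0; i < 4; ++i) {
    std::uint32_t index = (selector >> (4 * i)) & 0x3u;
    std::uint32_t byte = (src >> (8 * index)) & 0xFFu;
    result |= byte << (8 * i);
  }
  return result;
}

// Byte swap as one permute: result byte 0 takes source byte 1 and
// result byte 1 takes source byte 0; the upper bytes stay in place.
inline std::uint16_t bswap16Permute(std::uint16_t x) {
  return static_cast<std::uint16_t>(permuteBytes(x, 0x3201u) & 0xFFFFu);
}
```

Both helpers map `0x1234` to `0x3412`; the permute form expresses the swap as a single byte-selection step rather than two shifts and an OR.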

This way if the downstream consuming project uses zstd we make sure they are dedup'd. This uses a new rule to make sure layering_check still works while allowing us to augment the upstream library rules with LLVM specific `defines`.

…ps (llvm#169427)

Clarify how Clang-generated HIP fat binaries are registered and unregistered with the HIP runtime, and how this interacts with global constructors, destructors, and atexit handlers. Document that there is no strong guarantee on ordering relative to user-defined global ctors/dtors, recommend that HIP application developers avoid using kernels or device variables from global ctors/dtors, and describe the implications for HIP runtime developers (synchronization and guards in __hipRegisterFatBinary/__hipUnregisterFatBinary). This is motivated by questions from HIP application and runtime developers about fat binary registration/unregistration order and its potential interference with their own initialization and teardown code.

llvm#169428) As per OpenMP 5.1, the pointers are expected to retain their original values when a lookup fails and there is no device pointer to translate to.
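
The expected behavior can be sketched as a lookup that falls back to the original pointer. This is an illustration only: `PointerTable` and `translateOrKeep` are hypothetical names, and a real OpenMP runtime tracks host-to-device mappings differently.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical host-to-device table used only for this sketch.
using PointerTable = std::unordered_map<const void *, void *>;

// Returns the device pointer for hostPtr, or hostPtr itself when the
// lookup fails, so the pointer retains its original value.
inline void *translateOrKeep(const PointerTable &table, void *hostPtr) {
  assert(hostPtr != nullptr && "this sketch only handles non-null pointers");
  auto it = table.find(hostPtr);
  return it == table.end() ? hostPtr : it->second;
}
```

A pointer with a table entry is translated; one without an entry comes back unchanged rather than being set to null.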

…lvm#169275) Add support for `arith.extf` and `arith.truncf`. No support for custom rounding modes yet.

This adds a pyproject.toml file for packaging the clang Python bindings as an sdist tarball and pure Python wheel packages. It is required to move updates of the clang and libclang PyPI packages to the LLVM monorepo. Versioning information is derived from LLVM git tags (using hatch-vcs, which is based on setuptools_scm), so no manual updates are needed to bump version numbers. The minimum Python version required is set to 3.10 due to cindex.py using PEP 604 union type syntax (str | bytes | None). The .git_archival.txt file is populated with the information needed to get accurate version information if the bindings are installed from an LLVM/clang source code archive. The .gitignore file is populated with files that may get created as part of building/testing the sdist and wheel and that should not be committed to source control. This is a first step toward addressing llvm#125220 and moving publishing of the clang and libclang PyPI packages into the LLVM monorepo. Signed-off-by: Ryan Mast <mast.ryan@gmail.com>

This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [actions/checkout](https://redirect.github.com/actions/checkout) | action | major | `v5.0.0` -> `v6.0.0` |

…lvm#169277) Add support for `arith.fptosi` and `arith.fptoui`.

…m#169330) `[[nodiscard]]` should be applied to functions where discarding the return value is most likely a correctness issue. - https://libcxx.llvm.org/CodingGuidelines.html#apply-nodiscard-where-relevant
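
A minimal sketch of the attribute in use, with a hypothetical function (`clampIndex` is not part of libc++):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical pure function: calling it and ignoring the result is
// almost certainly a mistake, so it is marked [[nodiscard]].
[[nodiscard]] inline std::size_t clampIndex(std::size_t index,
                                            std::size_t size) {
  assert(size > 0 && "size must be non-zero");
  return index < size ? index : size - 1;
}

inline std::size_t useResult() {
  // clampIndex(7, 4);      // discarding the value would typically draw a warning
  return clampIndex(7, 4);  // fine: the result is used
}
```

The attribute does not change behavior; it only asks the compiler to diagnose calls whose return value is dropped.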

This allows the compiler to verify we've covered all enum values.
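
The pattern can be sketched as follows. The `State` enum and `stateName` function are hypothetical and do not mirror the actual `VSETVLIInfo` states; the point is only that a `switch` without a `default` label lets compilers such as Clang and GCC warn (via `-Wswitch`) when an enumerator is not handled.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical enum; these are not the real VSETVLIInfo states.
enum class State { Uninitialized, Valid, Unknown };

inline std::string_view stateName(State state) {
  // No default label: adding a new enumerator without a matching case
  // can then be flagged by the compiler.
  switch (state) {
  case State::Uninitialized:
    return "Uninitialized";
  case State::Valid:
    return "Valid";
  case State::Unknown:
    return "Unknown";
  }
  // Only reachable if state holds a value outside the enumerators.
  assert(false && "invalid State value");
  return "<invalid>";
}
```

An `if`/`else if` chain over the same enum would compile silently after a new enumerator is added, which is the gap the switch closes.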

…m#169442) Reverts llvm#164720 Revert to unblock bots. https://lab.llvm.org/buildbot/#/builders/140/builds/34645

…lvm#169284) Add support for `arith.sitofp` and `arith.uitofp`.

z1-cciauto · 2025-11-25T03:35:45Z

PSDB Link: https://compiler-ci.amd.com/job/compiler-psdb-amd-staging/2959

fhahn and others added 30 commits November 24, 2025 19:23

[clang-format] Fix designated initializer detection (llvm#169228)

7b186e4

Currently, in the following snippet, the second designated initializer is incorrectly detected as an Objective-C method expression. Fix that and add a test to make sure we don't regress.

```
Foo foo[] = {[0] = 1, [1] = 2};
```

Revert " [clang] Refactor to remove clangDriver dependency from clang…

40334b8

…Frontend and flangFrontend (llvm#165277)" This reverts commit 3773bbe.

[CIR][X86] Add support for kshiftl/kshiftr builtins (llvm#168591)

5a9c62b

Adds support for the `__builtin_ia32_kshiftli` and `__builtin_ia32_kshiftri` X86 builtins. Part of llvm#167765 --------- Signed-off-by: vishruth-thimmaiah <vishruththimmaiah@gmail.com>

Reapply " [clang] Refactor to remove clangDriver dependency from clan…

5c15f57

…gFrontend and flangFrontend (llvm#165277)" This reverts commit 40334b8. Unfortunately the revert breaks the build.

[gn build] Port dea330b

72dd4f7

Reapply "[UBSan] [compiler-rt] add preservecc variants of handlers" (l…

ff80de7

…lvm#168973) (llvm#169091) This reverts commit 418204d.

[ADT] Fix implicit reliance on cassert in StringTable.h (llvm#169324)

51d93e7

Adds an explicit include of `<cassert>` in StringTable.h rather than relying on the one in StringRef.h. Fixes potential compile errors if assert() was undef'ed between StringRef.h and StringTable.h inclusion.

AMDGPU: Fix a comment (llvm#169403)

f581d8a

This verifier check will complain if there aren't enough implicit operands -- so it doesn't *allow* those operands, it *requires* them.

[libc++][NFC] Move __memory/aligned_alloc.h into src/ (llvm#166172)

3dcdb4c

This header is only ever used inside `src/`, so we might as well move it there. As a drive-by this also removes some dead code.

[gn build] Port 3dcdb4c

8a431db

[bazel][clang] Port dea330b (llvm#169410)

adf4c1d

[flang][NFC] Strip trailing whitespace from tests (8 of N)

ba98668

Only some Fortran source files in flang/test/Lower have been modified. The other files in the directory will be cleaned up in subsequent commits.

[flang][cuda] Implement this_cluster for cooperative groups (llvm#169414

ab5ae9a

) Implement `this_cluster` like `this_group` by lowering it directly like an intrinsic function. Use the NVVM operation to get the rank and size information and populate the derived type.

[Offload][NFC] Offload wrapper cleanup/refactoring (llvm#169411)

4e7ce57

Addresses feedback from llvm#147508 (review) : - Update access modifiers for SYCLWrapper members. - Update comments. - Update types.

[flang][cuda] Add support for cluster_dim_blocks in cooperative_groups (

ab2a302

llvm#169417)

googlewalt and others added 22 commits November 24, 2025 23:15

Fix path to aligned_alloc.h in #include statement (llvm#169418)

d9cf0db

This fixes llvm#166172.

Orc fix waitingongraph coalescer remove (llvm#169287)

73de1e2

[libclc] Add atomic_init, atomic_flag_clear and atomic_flag_test_and_…

8947ba0

…set (llvm#168329)

[bazel] Use zstd from the BCR (llvm#169146)

ac4cf40

This way if the downstream consuming project uses zstd we make sure they are dedup'd. This uses a new rule to make sure layering_check still works while allowing us to augment the upstream library rules with LLVM specific `defines`.

[flang][cuda] Add support for cluster_block_index in cooperative grou…

e23328b

…ps (llvm#169427)

[NFC][OpenMP] Add use_device_ptr/addr tests for when the lookup fails. (

2f8e712

llvm#169428) As per OpenMP 5.1, the pointers are expected to retain their original values when a lookup fails and there is no device pointer to translate to.

merge main into amd-staging

12e582c

[mlir][arith] Add support for extf, truncf to ArithToAPFloat (l…

7899470

…lvm#169275) Add support for `arith.extf` and `arith.truncf`. No support for custom rounding modes yet.

[LoongArch] Fix for VLDREPL node validation (llvm#168993)

1782d27

Update actions/checkout action to v6 (llvm#169258)

196f6de

This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [actions/checkout](https://redirect.github.com/actions/checkout) | action | major | `v5.0.0` -> `v6.0.0` |

merge main into amd-staging

0443630

[mlir][arith] Add support for fptosi, fptoui to ArithToAPFloat (l…

3db8ed0

…lvm#169277) Add support for `arith.fptosi` and `arith.fptoui`.

[libc++][string] Applied [[nodiscard]] to non-member functions (llv…

d7f6301

…m#169330) `[[nodiscard]]` should be applied to functions where discarding the return value is most likely a correctness issue. - https://libcxx.llvm.org/CodingGuidelines.html#apply-nodiscard-where-relevant

[RISCV] Use a switch in VSETVLIInfo::print(). NFC (llvm#169441)

b63a188

This allows the compiler to verify we've covered all enum values.

Revert "[MC] Use a variant to hold MCCFIInstruction state (NFC)" (llv…

8217c64

…m#169442) Reverts llvm#164720 Revert to unblock bots. https://lab.llvm.org/buildbot/#/builders/140/builds/34645

[mlir][arith] Add support for sitofp, uitofp to ArithToAPFloat (l…

6ec6867

…lvm#169284) Add support for `arith.sitofp` and `arith.uitofp`.

merge main into amd-staging

e512b81

ronlieb requested review from a team and dpalermo November 25, 2025 03:33

ronlieb requested review from kuhar and lamb-j as code owners November 25, 2025 03:34

dpalermo approved these changes Nov 25, 2025


z1-cciauto merged commit 835ea63 into amd-staging Nov 25, 2025
42 checks passed

z1-cciauto deleted the amd/merge/upstream_merge_20251124203139 branch November 25, 2025 06:18

44 participants
