Skip to content

Conversation

@ronlieb
Copy link
Collaborator

@ronlieb ronlieb commented Nov 25, 2025

No description provided.

fhahn and others added 30 commits November 24, 2025 19:23
If the expected trip count is less than the VF, the vector loop will
only execute a single iteration. When that's the case, the cost of the
middle block has the same impact as the cost of the vector loop. Include
it in isOutsideLoopWorkProfitable to avoid vectorizing when the extra
work in the middle block makes it unprofitable.

Note that isOutsideLoopWorkProfitable already scales the cost of blocks
outside the vector region, but the patch restricts accounting for the
middle block to cases where VF <= ExpectedTC, to initially catch some
worst cases and avoid regressions.

This initial version should specifically avoid unprofitable tail-folding
for loops with low trip counts after re-applying
llvm#149042.

PR: llvm#168949
Currently, in the following snippet, the second designated initializer
is incorrectly detected as an OBJC method expr. Fix that and a test to
make sure we don't regress.

```
Foo foo[] = {[0] = 1, [1] = 2};
```
Adds support for the `__builtin_ia32_kshiftli` and
`__builtin_ia32_kshiftri` X86 builtins.

Part of llvm#167765

---------

Signed-off-by: vishruth-thimmaiah <vishruththimmaiah@gmail.com>
…gFrontend and flangFrontend (llvm#165277)"

This reverts commit 40334b8.

Unfortunately the revert breaks the build.
…Frontend and flangFrontend (llvm#165277)" (llvm#169397)

This reverts commit 3773bbe and relands the last revert attempt 40334b8.
3773bbe broke the build for the build configuration described in here:
llvm#165277 (comment)
)

This PR introduces a new additional type of map lowering for record
types that Clang currently supports, in which a user can map a top-level
record type and then individual members with different mapping,
effectively creating a sort of "overlapping" mapping that we attempt to
cut around.

This is currently most predominantly used in Fortran, when mapping
descriptors and there data, we map the descriptor and its data with
separate map modifiers and "cut around" the pointer data, so that wedo
not overwrite it unless the runtime deems it a neccesary action based on
its reference counting mechanism. However, it is a mechanism that will
come in handy/trigger when a user explitily maps a record type (derived
type or structure) and then explicitly maps a member with a different
map type.

These additions were predominantly in the OpenMPToLLVMIRTranslation.cpp
file and phase, however, one Flang test that checks end-to-end IR
compilation (as far as we care for now at least) was altered.

2/3 required PRs to enable declare target to mapping, should look at PR
3/3 to check for full green passes (this one will fail a number due to
some dependencies).

Co-authored-by: Raghu Maddhipatla raghu.maddhipatla@amd.com
…ntation (llvm#119589)

While the infrastructure for declare target to/enter and link for
variables exists in the MLIR dialect and at the Flang level, the current
lowering from MLIR -> LLVM IR isn't in place, it's only in place for
variables that have the link clause applied.

This PR aims to extend that lowering to an initial implementation that
incorporates declare target to as well, which primarily requires changes
in the OpenMPToLLVMIRTranslation phase. However, a minor addition to the
OpenMP dialect was required to extend the declare target enumerator to
include a default None field as well.

This also requires a minor change to the Flang lowering's
MapInfoFinlization.cpp pass to alter the map type for descriptors to
deal with cases where a variable is marked declare to. Currently, when a
descriptor variable is mapped declare target to the descriptor component
can become attatched, and cannot be updated, this results in issues when
an unusual allocation range is specified (effectively an off-by X
error). The current solution is to map the descriptor always, as we
always require an up-to-date version of this data. However, this also
requires an interlinked PR that adds a more intricate type of mapping of
structures/record types that clang currently implements, to circumvent
the overwriting of the pointer in the descriptor.

3/3 required PRs to enable declare target to mapping, this PR should
pass all tests and provide an all green CI.

Co-authored-by: Raghu Maddhipatla raghu.maddhipatla@amd.com
Adds an explicit include of `<cassert>` in StringTable.h rather than
relying on the one in StringRef.h. Fixes potential compile errors if
assert() was undef'ed between StringRef.h and StringTable.h inclusion.
This verifier check will complain if there aren't enough implicit
operands -- so it doesn't *allow* those operands, it *requires* them.
…8417)

This patch add support for lowering of custom reductions to MLIR. It
also enhances the capability of the pass to automatically mark functions
as "declare target" by traversing custom reduction initializers and
combiners.
…lvm#169235)

This fixes test errors like this, at least for a mingw target, if
building with Clang 21 instead of Clang 20, as in the CI environment:

    # .---command stderr------------
    # | error: 'expected-error' diagnostics seen but not expected:
    # | File C:\a\llvm-mingw\llvm-mingw\llvm-project\libcxx\test\std\input.output\file.streams\c.files\gets-removed.verify.cpp Line 16: cannot initialize a parameter of type 'char *' with an lvalue of type 'const char *'
    # | 1 error generated.
    # `-----------------------------
    # error: command failed with exit status: 1

This extra, unexpected diagnostic appears in Clang 21, since commit
9eef4d1 ("Remove delayed typo
expressions"). Before this, we got the expected diagnostic `error: no
member named 'gets' in namespace 'std'`, with the typo correction hint
`did you mean 'puts'?`. After this change, we get the typo correction
hint `did you mean simply 'gets'?` instead. And with the typo correction
finding `::gets`, it goes on to produce a second diagnostic about
mismatched parameter for that function.

Avoid these unexpected diagnostics by passing the right type of
parameter to the gets function.
…8468)

This commit adds a `MockDwarfDelegate` class that can be used to control
what dwarf version is used when evaluating an expression. We also add a
simple test that shows how dwarf version can change the result of the
expression.
This header is only ever used inside `src/`, so we might as well move it
there. As a drive-by this also removes some dead code.
…ualifiers of type locs. (llvm#167619)

Previously, e.g. for TypeLoc "MyNamespace::MyClass", `node()` selects
only "MyClass" without the qualifier. With this change, it now selects
"MyNamespace::MyClass".

---------

Co-authored-by: Florian Mayer <fmayer@google.com>
…lvm#168857)

This avoids dozens of instances of benign error messages being printed
when running the tests on e.g. Windows:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: module 'os' has no attribute 'sysconf'

Co-authored-by: Florian Mayer <fmayer@google.com>
Only some fortran source files in flang/test/Lower have been modified.
The other files in the directory will be cleaned up in subsequent commits
…vm#169327)

OffsetSizeAndStrideOpInterface does not specify whether it's operating
on the input or output shape and in fact different ops implement this in
different ways, which is also why SubviewOp is special cased here.

This "marked as dynamic but not really dynamic" folding is better
handled by shape inference, so just remove the bad fold.
SVE depends on a combination of host support and operating system
support. Sometimes those don't line up with detected host CPU name; make
sure SVE is disabled when it isn't available. Implement this for both
Windows and Linux. (We don't have a codepath for other operating
systems. If someone wants to implement this, it should be possible to
adapt fmv code from compiler-rt.)

While I'm here, also add support for detecting other Windows CPU
features.

For Windows, declare constants ourselves so the code builds on older
SDKs; we also do this in compiler-rt.
This is currently a no op.
This will be supported for the minimal runtime in a follow up. This
allows
to improve codegen for fsanitize-recover by compiling the handlers with
[[clang::preserve_all]]. This makes sure that the caller does not need
to spill any registers. We do not expect this function to be called
frequently, so this is beneficial for code size.
)

Implement `this_cluster` like `this_group` by lowering it directly like
an intrinsic function. Use the NVVM operation to get the rank and size
information and populate the derived type.
Addresses feedback from

llvm#147508 (review)
:
- Update access modifiers for SYCLWrapper members.
- Update comments.
- Update types.
…#169346)

PR168884 flagged compiler directives (!dir$ ...) inside OpenMP loop
constructs as errors. This caused some customer applications to fail to
compile (issue 169229).

Downgrade the error to a warning, and gracefully ignore compiler
directives when lowering loop constructs to MLIR.

Fixes llvm#169229
In the LoweringPrepare pass, the handling for global array destructor
lowering was mishandling the insertion point, so that if this code
needed to create a declaration for the __cxa_atexit function, that
declaration was being created in the dtor region, rather than at module
scope. This change fixes that.
AMDGPU requires more complex CFI rules, normally these would be
expressed with .cfi_escape, however this would make the CFI unreadable
and makes it difficult to update registers in CFI instructions (also
something AMDGPU requires).

Authored-by: Emma Pilkington <Emma.Pilkington@amd.com>
googlewalt and others added 22 commits November 24, 2025 23:15
This is mostly the output of a vibe coded script running on
VecFuncs.def, with a lot of manual cleanups and fixing where the
vibes were off. This is not yet wired up to anything (except for the
handful of calls which are already manually enabled). In the future
the SystemLibrary mechanism needs to be generalized to allow plugging
these sets in based on the flag.

One annoying piece is there are some name conflicts across the
libraries. Some of the libmvec functions have name collisions with some 
sleef functions. I solved this by just adding a prefix to the libmvec functions. 
It would probably be a good idea to add a prefix to every group. It gets ugly,
particularly since some of the sleef functions started to use a Sleef_
prefix, but mostly do not.
… supported (llvm#169252)

Fix kernel build when cl_khr_fp64 is not enabled:
opencl-c.h:13785:50: error: unknown type name 'atomic_double'
13785 | double __ovld atomic_fetch_min(volatile __global atomic_double
*, double);
opencl-c.h:13785:67: error: use of type 'double' requires cl_khr_fp64
and __opencl_c_fp64 support
13785 | double __ovld atomic_fetch_min(volatile __global atomic_double
*, double);

This is a regression introduced by 423bdb2. Before that commit,
__opencl_c_ext_fp64_global_atomic_add was guarded by cl_khr_fp64 in
opencl-c-base.h.
Previously, i16 `bswap` was lowered using multiple shift and OR
operations. This patch adds a pattern to directly lower i16 `bswap`
using the `PRMT` (permute) instruction, which is more efficient.

Additionally, the lowering of `bswap` is moved into operation
legalization, which allows for DAGCombiner to optimize the lowered code.
This way if the downstream consuming project uses zstd we make sure
they are dedup'd. This uses a new rule to make sure layering_check still
works while allowing us to augment the upstream library rules with LLVM
specific `defines`.
Clarify how Clang-generated HIP fat binaries are registered and
unregistered with the HIP runtime, and how this interacts with global
constructors, destructors, and atexit handlers. Document that there is
no strong guarantee on ordering relative to user-defined global
ctors/dtors, recommend that HIP application developers avoid using
kernels or device variables from global ctors/dtors, and describe the
implications for HIP runtime developers (synchronization and guards in
__hipRegisterFatBinary/__hipUnregisterFatBinary). This is motivated by
questions from HIP application and runtime developers about fat binary
registration/unregistration order and its potential interference with
their own initialization and teardown code.
llvm#169428)

As per OpenMP 5.1, the pointers are expected to retain their original
values when a lookup fails and there is no device pointer to translate
to.
…lvm#169275)

Add support for `arith.extf` and `arith.truncf`. No support for custom
rounding modes yet.
This adds a pyproject.toml file for packaging the clang Python bindings
as a sdist tarball and pure Python wheel packages for the clang python
bindings. It is required to move updates of the clang and libclang PyPI
packages to the LLVM monorepo. Versioning information is derived from
LLVM git tags (using hatch-vcs, which is based on setuptools_scm), so no
manual updates are needed to bump version numbers. The minimum python
version required is set to 3.10 due to cindex.py using PEP 604 union
type syntax (str | bytes | None).

The .git_archival.txt file is populated with version information needed
to get accurate version information if the bindings are installed from
an LLVM/clang source code archive. The .gitignore file is populated with
files that may get created as part of building/testing the sdist and
wheel that should not be committed to source control.

This is first step for addressing llvm#125220, and moving publishing of the
clang and libclang PyPI packages into the LLVM monorepo.

Signed-off-by: Ryan Mast <mast.ryan@gmail.com>
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [actions/checkout](https://redirect.github.com/actions/checkout) |
action | major | `v5.0.0` -> `v6.0.0` |
…lvm#169277)

Add support for `arith.fptosi` and `arith.fptoui`.
…m#169330)

`[[nodiscard]]` should be applied to functions where discarding the
return value is most likely a correctness issue.
- https://libcxx.llvm.org/CodingGuidelines.html#apply-nodiscard-where-relevant
This allows the compiler to verify we've covered all enum values.
…lvm#169284)

Add support for `arith.sitofp` and `arith.uitofp`.
@ronlieb ronlieb requested review from a team and dpalermo November 25, 2025 03:33
@ronlieb ronlieb requested review from kuhar and lamb-j as code owners November 25, 2025 03:34
@z1-cciauto
Copy link
Collaborator

@z1-cciauto z1-cciauto merged commit 835ea63 into amd-staging Nov 25, 2025
42 checks passed
@z1-cciauto z1-cciauto deleted the amd/merge/upstream_merge_20251124203139 branch November 25, 2025 06:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.