
@ronlieb ronlieb commented Nov 14, 2025

No description provided.

lplewa and others added 30 commits November 13, 2025 15:56
Tracing requires liboffload to be initialized, so calling
isTracingEnabled() before olInit always returns false. This caused the
first trace log to look like:
```
-> OL_SUCCESS
```
instead of:
```
---> olInit() -> OL_SUCCESS
```
This patch moves the pre-call trace print for olInit so it is emitted
only after initialization.

It would be possible to add extra logic to detect whether liboffload is
already initialized and only postpone the first pre-call print, but this
would add unnecessary complexity, especially since this is tablegen
code. The difference would matter only in the unlikely case of a crash
during a second olInit call.

---------

Co-authored-by: Joseph Huber <huberjn@outlook.com>
Only the Fortran source files in flang/test/Intrinsics have been modified. The
other files in flang/test will be cleaned up in subsequent commits.
- Adopt ifdef and namespace emitters in SubtargetEmitter.
- To aid that, factor out emission of different sections of the code
  into individual helper functions.
Prepare a 'this' for CXXDefaultInitExprs
…e reduction plans (llvm#165913)

The TypeSwitch for extracting the Opcode now handles the `VPReductionRecipe` case.

Fixes llvm#165359.
This commit adds optimized assembly versions of single-precision float
multiplication and division. Both functions are implemented in a style
that can be assembled as either Arm or Thumb2; for multiplication, a
separate implementation is provided for Thumb1. Extensive new tests are
also added for multiplication and division.

These implementations can be removed from the build by defining the
cmake variable COMPILER_RT_ARM_OPTIMIZED_FP=OFF.

Outlying parts of the functionality which are not on the fast path, such
as NaN handling and underflow, are handled in helper functions written
in C. These can be shared between the Arm/Thumb2 and Thumb1
implementations, and also reused by other optimized assembly functions
we hope to add in future.
…ve (llvm#167875)

This prevents the backend from crashing for basic uses of __SVCount_t
type (e.g., as function arguments), without +sve2p1 or +sme2.
    
Fixes llvm#167462
Without this patch, SmallDenseMap::grow has two separate code paths to
grow the bucket array.  The code path to handle the small mode has its
own traversal over the bucket array.  This patch simplifies this logic
as follows:

1. Allocate a temporary instance of SmallDenseMap.
2. Move valid key/value pairs to the temporary instance.
3. Move LargeRep to *this.

Remarks:

- This patch adds moveFromImpl to move key/value pairs.
  moveFromOldBuckets is updated to use the new helper function.

- This patch adds a private constructor to SmallDenseMap that takes an
  exact number of buckets, accompanied by tag ExactBucketCount.

- This patch adds a fast path to deallocateBuckets in case
  getLargeRep()->NumBuckets == 0, just like destroyAll.  This path is
  used to destruct zombie instances after moves.

- In somewhat rare cases, we "grow" from the small mode to the small
  mode when there are many tombstones in the inline storage.  This is
  handled with another call to moveFrom.
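The three-step `grow` above can be illustrated with a toy map that keeps a few buckets inline and spills to a heap array. This is a sketch under stated assumptions: `ToySmallMap`, `insertNoGrow`, and this `moveFrom` are hypothetical and much simpler than LLVM's `SmallDenseMap` (linear probing, no erase, so the tombstone-driven small-to-small case is handled but never triggered by `insert`).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical toy (not LLVM's SmallDenseMap): linear probing, no erase.
template <typename K, typename V, std::size_t InlineBuckets = 4>
class ToySmallMap {
  struct Bucket {
    bool Used = false;
    K Key{};
    V Val{};
  };

  Bucket Inline[InlineBuckets];    // small-mode storage
  std::unique_ptr<Bucket[]> Large; // non-null => large mode (the "LargeRep")
  std::size_t NumBuckets = InlineBuckets;
  std::size_t NumEntries = 0;

  struct ExactBucketCount {}; // tag selecting the private constructor
  ToySmallMap(ExactBucketCount, std::size_t N) {
    if (N > InlineBuckets) {
      Large = std::make_unique<Bucket[]>(N);
      NumBuckets = N;
    }
  }

  Bucket *buckets() { return Large ? Large.get() : Inline; }

  void insertNoGrow(K Key, V Val) {
    Bucket *B = buckets();
    std::size_t I = std::hash<K>{}(Key) % NumBuckets;
    while (B[I].Used && !(B[I].Key == Key))
      I = (I + 1) % NumBuckets;
    if (!B[I].Used)
      ++NumEntries;
    B[I].Used = true;
    B[I].Key = std::move(Key);
    B[I].Val = std::move(Val);
  }

  // Shared helper: move every valid pair out of Src into *this.
  void moveFrom(Bucket *Src, std::size_t N) {
    for (std::size_t I = 0; I != N; ++I)
      if (Src[I].Used) {
        Src[I].Used = false;
        insertNoGrow(std::move(Src[I].Key), std::move(Src[I].Val));
      }
  }

  void grow(std::size_t AtLeast) {
    // 1. Allocate a temporary map with exactly AtLeast buckets.
    ToySmallMap Tmp(ExactBucketCount{}, AtLeast);
    // 2. Move valid pairs into it; one loop serves small and large sources.
    Tmp.moveFrom(buckets(), NumBuckets);
    NumEntries = 0;
    if (Tmp.Large) {
      // 3. Adopt the temporary's heap storage.
      Large = std::move(Tmp.Large);
      NumBuckets = Tmp.NumBuckets;
      NumEntries = Tmp.NumEntries;
    } else {
      // Small-to-small: move the pairs back into our inline buckets.
      Large.reset();
      NumBuckets = InlineBuckets;
      moveFrom(Tmp.Inline, InlineBuckets);
    }
  }

public:
  ToySmallMap() = default;

  void insert(K Key, V Val) {
    if ((NumEntries + 1) * 4 > NumBuckets * 3) // keep load factor <= 3/4
      grow(NumBuckets * 2);
    insertNoGrow(std::move(Key), std::move(Val));
  }

  const V *find(const K &Key) {
    Bucket *B = buckets();
    std::size_t I = std::hash<K>{}(Key) % NumBuckets;
    for (std::size_t Probe = 0; Probe != NumBuckets && B[I].Used; ++Probe) {
      if (B[I].Key == Key)
        return &B[I].Val;
      I = (I + 1) % NumBuckets;
    }
    return nullptr;
  }

  std::size_t size() const { return NumEntries; }
  bool isSmall() const { return !Large; }
};
```

With four inline buckets, the fourth insertion exceeds the 3/4 load factor and moves the map into large mode through the same single `moveFrom` loop used for every other transition.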
…okupOrTrackRegister (llvm#167841)

The LocID for registers is just the register ID. The getLocID function
is supposed to hide this detail, but it wasn't being used consistently.

This avoids a bunch of implicit casts from Register or MCRegister to
unsigned.
… (NFC) (llvm#155262)

CMN already has a function like this; we should do the same for CMP.
This commit adds a new `ValueMatcher` class that can be used in gtest
matching contexts to match against `lldb_private::Value` objects. We
always match against the value's `value_type` and `context_type`. For
HostAddress values we also match against the expected host buffer
contents. For Scalar, FileAddress, and LoadAddress values we match
against an expected Scalar value.

The matcher is used to improve the quality of the tests in the
`DwarfExpressionTest.cpp` file. Previously, the local `Evaluate`
function returned an `Expected<Scalar>` value, which made it hard to
verify that we actually got a Value of the expected type without adding
custom evaluation code. Now it returns an `Expected<Value>` so that we
can match against the full value contents.

The resulting change improves the quality of the existing checks and in
some cases eliminates the need for special code to explicitly check
value types.

I followed the gtest
[guide](https://google.github.io/googletest/gmock_cook_book.html#writing-new-monomorphic-matchers)
for writing a new value matcher.
The optimized version of xsgetn for basic_filebuf added in llvm#165223
returns the wrong number of characters when a read is served partly from
the buffer and partly from the filesystem. This patch should address the
issue.
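The invariant at stake can be shown with a toy reader: a read that drains the buffer and then continues from the underlying source must report the sum of both parts. `ToyBufferedReader` is hypothetical and models the file as a string; it is not libc++'s `basic_filebuf`, and the exact shape of the original miscount may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical model of a buffered reader; not libc++'s basic_filebuf.
struct ToyBufferedReader {
  std::string Buffer;       // characters already buffered
  std::size_t Pos = 0;      // next unread position in Buffer
  std::string File;         // stands in for the underlying file
  std::size_t FilePos = 0;  // next unread position in File

  // Reads up to N characters into Out and returns how many were read.
  std::size_t xsgetn(char *Out, std::size_t N) {
    // Part 1: drain whatever is still in the buffer.
    std::size_t FromBuffer = std::min(N, Buffer.size() - Pos);
    std::memcpy(Out, Buffer.data() + Pos, FromBuffer);
    Pos += FromBuffer;
    // Part 2: read the remainder directly from the "file".
    std::size_t FromFile = std::min(N - FromBuffer, File.size() - FilePos);
    std::memcpy(Out + FromBuffer, File.data() + FilePos, FromFile);
    FilePos += FromFile;
    // Reporting only one of the two counts would under-report the read.
    return FromBuffer + FromFile;
  }
};
```

For example, with `"abc"` buffered and `"defgh"` in the file, a 5-character read returns 5 (`"abcde"`), not 3 or 2.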
This proposal adds a `cl::opt` CLI flag
`-bpf-allow-misaligned-mem-access` to the BPF target that lets users
allow misaligned memory accesses.

The motivation is that user-space eBPF VMs (interpreters or JITs
running in user space) typically run on real CPUs where unaligned
memory accesses are acceptable (or handled efficiently), so allowing
them can simplify lowering and improve performance. In contrast, kernel
eBPF must obey verifier constraints and platform-specific alignment
restrictions.

A new CLI option keeps kernel behavior unchanged while giving user-space
VMs an explicit opt-in to more permissive codegen. It supports both use
cases without diverging codebases.
…7763)

As mentioned in comments for llvm#164913, the `if()` statements here
can't be externally triggered, since these writeback registers are
passed in from the caller. So they should really be `assert()`s: that
makes it obvious we don't need testcases for them, and it avoids a
runtime check in release builds.
Reverts llvm#161546

One of the buildbots reported a cmake error I don't understand, and
which I didn't get in my own test builds:
```
CMake Error at /var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/compiler-rt/cmake/Modules/CheckAssemblerFlag.cmake:23 (try_compile):
  COMPILE_DEFINITIONS specified on a srcdir type TRY_COMPILE
```

My best guess is that the thing I did in `CheckAssemblerFlag.cmake` only
works on some versions of cmake. But I don't understand the problem well
enough to fix it quickly, so I'm reverting the whole patch and will
reland it later.
…m#167540)

In adopting `[[clang::nonblocking]]` there's been some user confusion.
Changes to address `-Wfunction-effects` warnings are often pure
annotation, with no runtime effect. Changes to avoid
`-Wperf-constraint-implies-noexcept` warnings are risky: adding
`noexcept` creates a new potential for the program to crash. In
retrospect, `-Wperf-constraint-implies-noexcept` shouldn't have been
made part of `-Wall`.

---------

Co-authored-by: Doug Wyatt <dwyatt@apple.com>
This changes muls by `3 << C` from `(X << C + 2) - (X << C)`
to `(X << C + 1) + (X << C)`.
If Zba is available, the output is not affected as we emit
`(shl (sh1add X, X), C)` instead.

There are two advantages:
- ADD is more compressible
- Often a reduced instruction count, by a heuristic that
  `(X << C + 1)` is more likely to have another use than `(X << C + 2)`
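A quick check of the identity behind the change: with Y = X << C, the old sequence computes 4Y - Y and the new one 2Y + Y, and both equal X * (3 << C) under wrapping unsigned arithmetic. The function names below are hypothetical; this models the arithmetic only, not the RISC-V lowering code.

```cpp
#include <cassert>
#include <cstdint>

// X * (3 << C) computed three ways on 64-bit unsigned values (wrapping
// modulo 2^64). Valid for C + 2 < 64 so that no shift amount is too large.
std::uint64_t mulBySubtract(std::uint64_t X, unsigned C) {
  return (X << (C + 2)) - (X << C); // 4*Y - Y: the old sequence uses SUB
}
std::uint64_t mulByAdd(std::uint64_t X, unsigned C) {
  return (X << (C + 1)) + (X << C); // 2*Y + Y: the new sequence uses ADD
}
std::uint64_t mulByZba(std::uint64_t X, unsigned C) {
  return ((X << 1) + X) << C; // models (shl (sh1add X, X), C) with Zba
}
```

All three agree for every X, including values whose products overflow, because each step is exact modulo 2^64.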
…7898)

So that changing the type of the container (planned in a future patch)
is less intrusive.
Upstream the basic support for the ExtVectorType element expr
…actor in the way of a glue. (llvm#167805)

In the new test, we're trying to fold a load and a X86ISD::CALL. The
call has a CopyToReg glued to it. The load and the call have different
input chains so they need to be merged. This results in a TokenFactor
that gets put between the CopyToReg and the final CALLm instruction. The
DAG scheduler can't handle that.

The load here was created by legalization of the extract_element using a
stack temporary store and load. A normal IR load would be chained into the
call sequence by SelectionDAGBuilder, which would usually place the load
before the CopyToReg. The store and load created by legalization are not
chained into the rest of the DAG.

Fixes llvm#63790
AMDGPU: Start to use AV classes for unknown vector class

Use AGPR+VGPR superclasses for gfx90a+. The type used
for the class should be the broadest possible class, to
be contextually restricted later. InstrEmitter clamps these
to the common subclass of the context use instructions, so we're
best off using the broadest possible class for all types.

Note this does very little because we only use VGPR classes
for FP types (though this doesn't particularly make any sense),
and we legalize normal loads and stores to integer.
The XeGPU and XeVM dialects have assigned maintainers, but the related
folders currently lack code owners.
Add charithaintc and Jianhui-Li as code owners for XeGPU-related folders.
Add silee2 as code owner for XeVM-related folders.
Note:
charithaintc is the current maintainer of the XeGPU dialect.
silee2 is the current maintainer of the XeVM dialect.
This patch updates various LLVM headers to properly add the `LLVM_ABI`
and `LLVM_ABI_FOR_TEST` annotations to build LLVM as a DLL on Windows.

This effort is tracked in llvm#109483.
davemgreen and others added 16 commits November 13, 2025 18:05
…it (llvm#167760)

As new operations are added (for example uinc_wrap, udec_wrap,
usub_cond, usub_sat), they will not automatically be supported by
outline atomics and so should be expanded by the pre-isel pass. Make the
list of supported outline atomics explicit to make sure we only mark the
expected intrinsics as outline atomics.

Fixes llvm#167728
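The "explicit list" approach can be sketched as a switch with a conservative default. The `RMWOp` enum and `hasOutlineAtomic` function are hypothetical, and the allow-list shown is illustrative; the real list of supported operations is in the patch.

```cpp
#include <cassert>

// Hypothetical subset of atomicrmw operations; not LLVM's enum.
enum class RMWOp { Xchg, Add, Sub, And, Or, Xor, UIncWrap, UDecWrap, USubCond, USubSat };

// Explicit allow-list: anything not named here falls through to `false`,
// so a newly added operation is expanded by the pre-isel pass instead of
// being treated as an outline atomic by accident.
bool hasOutlineAtomic(RMWOp Op) {
  switch (Op) {
  case RMWOp::Xchg:
  case RMWOp::Add:
  case RMWOp::Sub:
  case RMWOp::And:
  case RMWOp::Or:
  case RMWOp::Xor:
    return true;
  default:
    return false;
  }
}
```

The `default` label is the design choice here: it makes "not supported" the outcome for any operation nobody has explicitly reviewed.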
This patch is a follow-up to llvm#167532, which moved these methods' code
into the relevant `printOperation()` functions but did not remove the
methods themselves.
… C/C++ (llvm#167735)

The variable-category 'allocatable' is explicitly noted as applying only
to Fortran. If specified in C/C++, it should generate an error. NOTE: An
issue will be filed against the OpenMP 6.0 specification because this
restriction is missing from the 'default' clause section.

From the OpenMP 6.0 specification:
  Section 7.5.1 default Clause
    Semantics, under Fortran only, L18-19, pg. 223
      The allocatable variable-category specifies variables with the
      ALLOCATABLE attribute.

  Section 7.9.9 defaultmap Clause
    Semantics, under Fortran only, L9-10, pg. 292
      The allocatable variable-category specifies variables with the
      ALLOCATABLE attribute.

    Restrictions, C/C++
      L1, pg. 293
      The specified variable-category must not be allocatable.
…ranch (llvm#166625)

GitHub's Update Branch button is a helpful tool for quickly updating a
PR before merging, but it is worth pointing out that it creates a merge
commit without further prompting, which may or may not be what a given
LLVM contributor wants.

Opened on the suggestion of @lamb-j
…lvm#167889)

The casts are currently no-ops because `MCRegUnit` is a typedef for
`unsigned`, but this will change soon, and explicit casts will then be
required.
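Why add casts that do nothing today: once the typedef becomes a distinct type, implicit conversions stop compiling. A sketch, assuming the replacement is a strong type such as an `enum class`; the commit does not say what `MCRegUnit` will become, and both type names below are hypothetical.

```cpp
#include <cassert>

// Situation described in the commit: a plain typedef, so a register
// unit converts to and from unsigned implicitly.
using RegUnitAsTypedef = unsigned;

// Hypothetical future strong type; LLVM's actual choice may differ.
enum class RegUnitAsEnum : unsigned {};

unsigned toIndex(RegUnitAsTypedef U) {
  return U; // implicit; static_cast<unsigned>(U) would be a no-op here
}

unsigned toIndex(RegUnitAsEnum U) {
  return static_cast<unsigned>(U); // required: no implicit conversion exists
}
```

Writing the cast now, while it is still a no-op, means the later type change does not have to touch these call sites.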
…#164357)

Refactor the XCnt optimization checks so that they can be checked when
applying a pre-existing waitcnt. This removes unnecessary xcnt waits
when taking a loop backedge.
In preparation for porting to the NewPM.

Reviewers: kazutakahirata, arsenm

Reviewed By: kazutakahirata, arsenm

Pull Request: llvm#167910
)

AMDGPU: Really use AV classes by default for vector classes

Update getRegClassFor to use AV classes in place of VGPRs for
gfx90a-gfx950. There are a handful of regressions. Most come from
enabling unprofitable rematerialization, which reduces register
count by 1 but adds an unnecessary instruction.
@ronlieb ronlieb requested review from a team and dpalermo November 14, 2025 11:16
@z1-cciauto
Collaborator

@ronlieb ronlieb closed this Nov 14, 2025