
Conversation

@z1-cciauto

No description provided.

owenca and others added 30 commits November 12, 2025 20:55
…67776)

Supporting this in GISel requires multiple changes to IRTranslator to
support aggregate returns containing scalable vectors and non-scalable
types. Falling back is the quickest way to fix the crash.

Fixes llvm#167618
This was preventing check-compiler-rt from actually running when we
touched a project that was supposed to cause compiler-rt to be tested.
Fix documentation in `abseil`, `android`, `altera`, `boost` and
`bugprone`.

This is part of the codebase cleanup described in
[llvm#167098](llvm#167098)
…tk.S"

This reverts commit 1f9eff1.

This is done in preparation for reverting parts of 885d7b7.
…nctions"

This reverts parts of commit 885d7b7,
and adds verbose comments explaining all the variants of this
function, for clarity for future readers.

It turns out that those functions actually weren't misnamed or
unused after all: Apparently Clang doesn't match GCC when it comes
to what stack probe function is referenced on i386 mingw. GCC < 4.6
references a symbol named "___chkstk", with three leading underscores,
and GCC >= 4.6 references "___chkstk_ms".

Restore these functions so that object files built with GCC can be linked
against compiler-rt.
…vm#165467)

The C locale is defined by the C standard, so we know exactly which
characters classify as (x)digits. Instead of going through the locale base
API, we can simply implement the classification functions ourselves, and
probably improve codegen significantly that way as well.
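A minimal sketch of the idea, using a hypothetical helper rather than the actual libc++ code: the C locale's hex digits are fixed by the standard, so the check reduces to plain range comparisons.

    // Hypothetical helper, not the libc++ implementation: the C locale's xdigits
    // are fixed by the C standard, so a range comparison is enough.
    constexpr bool is_c_locale_xdigit(char c) {
      return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }
    static_assert(is_c_locale_xdigit('7') && is_c_locale_xdigit('B') && !is_c_locale_xdigit('g'));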
…converting constructor (llvm#165619)

This also backports LWG2415 as a drive-by.
)

On Windows 8 and above, the WaitOnAddress, WakeByAddressSingle and
WakeByAddressAll functions allow an efficient implementation of the C++20
wait and notify features of std::atomic_flag. These Windows functions
have never been used in libc++, leading to very poor performance of these
features on Windows, where they are implemented using a spin loop with
backoff rather than any OS thread signalling whatsoever. This change uses
these OS functions where available, falling back to the original
implementation on Windows versions prior to 8.
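A minimal sketch of the primitives this builds on, written as hypothetical standalone helpers rather than the libc++ source (Windows 8+, link against Synchronization.lib):

    #include <windows.h>

    // Block until *flag no longer holds old_value; WaitOnAddress may also wake
    // spuriously, so the value is rechecked in a loop.
    void wait_until_changed(volatile LONG* flag, LONG old_value) {
      while (*flag == old_value)
        WaitOnAddress((volatile VOID*)flag, &old_value, sizeof(LONG), INFINITE);
    }

    // Publish a new value and wake one thread blocked on the address.
    void signal_one(volatile LONG* flag) {
      InterlockedIncrement(flag);
      WakeByAddressSingle((PVOID)flag);
    }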

Relevant API docs from Microsoft:

https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-waitonaddress

https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-wakebyaddresssingle

https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-wakebyaddressall

Fixes llvm#127221
…162800)

This should improve the time it takes to run the test suite a bit. Right
now there are only a handful of headers in the modulemap because we're
missing a lot of includes in the tests. New headers should be added
there from the start, and we should fill up the modulemap over time
until it contains all the test support headers.
These headers are incredibly simple and closely related, so this merges
them into a single one.
…vm#167674)

This is an NFC for now, as the SME checks for macOS platforms are not
implemented, so zaDisable() is a no-op, but both paths for resuming from
an exception should disable ZA.

This is a fixup for a recent change in llvm#165066.
…vm#167839)

We need to get the element type size at bytecode generation time to
check. We also need to diagnose this in the LHS == RHS case.
…lvm#167712)

This fixes an assert when compiling llvm-test-suite with -march=rva23u64
-O3 that started appearing sometime this week.

We get "Cannot overlap two segments with differing ValID's" because we
try to coalesce these two vsetvlis:

    %x:gprnox0 = COPY $x8
    dead $x0 = PseudoVSETIVLI 1, 208, implicit-def $vl, implicit-def $vtype
    %y:gprnox0 = COPY %x
    %v:vr = COPY $v8, implicit $vtype
    %x = PseudoVSETVLI %x, 208, implicit-def $vl, implicit-def $vtype

    -->

    %x:gprnox0 = COPY $x8
    %x = PseudoVSETVLI %x, 208, implicit-def $vl, implicit-def $vtype
    %y:gprnox0 = COPY %x
    %v:vr = COPY $v8, implicit $vtype

However, doing so would cause us to extend the segment of the new value
of %x up past the first segment, which would overlap.

This fixes it by checking that it's safe to extend the segment, simply by
making sure the interval isn't live at the first vsetvli.
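A rough sketch of that check, as an assumption about the shape of the fix rather than the exact patch, using the LiveIntervals API:

    #include "llvm/CodeGen/LiveIntervals.h"
    #include "llvm/CodeGen/MachineInstr.h"
    using namespace llvm;

    // Only coalesce if the AVL register's interval is not live at the first
    // vsetvli, so extending its segment cannot overlap an older value.
    static bool isSafeToCoalesceAVL(const MachineInstr &FirstVSETVLI,
                                    Register AVLReg, const LiveIntervals &LIS) {
      SlotIndex Idx = LIS.getInstructionIndex(FirstVSETVLI);
      return !LIS.getInterval(AVLReg).liveAt(Idx);
    }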

This unfortunately causes a regression in the existing
coalesce_vl_avl_same_reg test because even though we could coalesce the
vsetvlis there, we now bail. I couldn't think of an easy way to handle
this safely, but I don't think this is an important case to handle:
After testing this patch on SPEC CPU 2017 there are no codegen changes.
…ize (llvm#165924)

yaml2obj would crash when processing Mach-O load commands with a cmdsize
smaller than the actual structure size, e.g. LC_SEGMENT_64 with
cmdsize=56 instead of 72. The crash occurred due to integer underflow
when calculating padding: cmdsize - BytesWritten wraps to a large value
when the difference is negative, causing a massive allocation attempt.
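A minimal sketch of the underflow and a guarded fix, using variable names taken from the description rather than the actual yaml2obj code:

    #include <cstddef>
    #include <cstdint>

    size_t padding_bytes(uint32_t cmdsize, uint32_t BytesWritten) {
      // Buggy version: when cmdsize < BytesWritten the unsigned subtraction
      // wraps to a huge value and triggers a massive allocation.
      //   return cmdsize - BytesWritten;
      // Guarded version: treat an undersized cmdsize as "no padding"
      // (or report an error to the user instead).
      return cmdsize >= BytesWritten ? cmdsize - BytesWritten : 0;
    }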
…165598)

MSVC supports an extension that allows deleting an array of objects
through a pointer whose static type doesn't match its dynamic type. This
is done by generating special destructors, called vector deleting
destructors. MSVC's virtual tables always contain a pointer to the vector
deleting destructor for classes with virtual destructors, so not having
this extension implemented causes clang to generate code that is not
compatible with the code generated by MSVC, because clang always puts a
pointer to a scalar deleting destructor into the vtable. As a bonus,
deleting an array of polymorphic objects will work just like it does with
MSVC: no memory leaks, and the correct destructors are called.
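A short illustration of the pattern the extension covers; this is undefined behavior in standard C++, but well-defined under the MSVC ABI thanks to the vector deleting destructor stored in the vtable:

    struct Base { virtual ~Base() {} };
    struct Derived : Base { int payload = 0; };

    int main() {
      Base *p = new Derived[4]; // static type Base*, dynamic type Derived[4]
      delete[] p;               // the vector deleting destructor runs ~Derived
                                // on every element and frees the whole array
    }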

This patch will cause clang to emit code that is compatible with code
produced by MSVC but not with code produced by older versions of clang,
so the new behavior can be disabled by passing -fclang-abi-compat=21 (or
lower).

This is yet another attempt to land vector deleting destructors support
originally implemented by
llvm#133451.

This PR contains fixes for issues reported in the original PR as well as
fixes for issues related to operator delete[] search reported in several
issues like

llvm#133950 (comment)
llvm#134265

Fixes llvm#19772
…t` (llvm#167848)

Reland pass and fix linker errors.

---------

Co-authored-by: Maksim Levental <maksim.levental@gmail.com>
This patch adds a TMA intrinsic for global to shared::cta copy, which
was introduced with ptx86. It also removes the NoCapture<> annotation
from the pointer arguments to these intrinsics, since the copy operations
are asynchronous in nature.

The lit tests are verified with ptxas from cuda-12.8.

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
Use it in `printVRegOrUnit()`, `getPressureSets()`/`PSetIterator`,
and in functions/classes dealing with register pressure.

Static type checking revealed several bugs, mainly in MachinePipeliner.
I'm not very familiar with this pass, so I left a bunch of FIXMEs.

There is one bug in `findUseBetween()` in RegisterPressure.cpp, also
annotated with a FIXME.
This patch fixes the latency of the SVE FADDP instruction for the
Neoverse-N3 SWOG. The latency of the floating-point arithmetic, min/max
pairwise SVE FADDP should be 3, as per the N3 SWOG.
… a usubo.0)" (llvm#167854)

Reverts llvm#161651 due to downstream reports of bad codegen
…167856)

We only seem to use the SVE fdot for fixed-length vector types when they
are larger than 128 bits, whereas we can also use it for 128-bit vectors
if SVE2p1/SME2 is available.
As in the title. See here for more context:
https://discourse.llvm.org/t/allowfpopfusion-vs-sdnodeflags-hasallowcontract/80909

Also add a warning in llc when the global contract flag is encountered on x86.
Remove the global contract flag from the last x86 test.
…ementwise_fma and support constexpr (llvm#154731)

Now that llvm#152455 is done, we can make all the scalar FMA intrinsics
wrap __builtin_elementwise_fma, which also allows constexpr evaluation.

The main difference is that the FMA4 intrinsics guarantee that the upper
elements are zero, while FMA3 passes through the destination register
elements like older scalar instructions.

Fixes llvm#154555
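A hedged sketch of the effect, assuming the builtin is constexpr-evaluable as described; this mirrors the idea rather than reproducing the actual intrinsic headers:

    // Hypothetical wrapper, not the real intrinsic definition.
    constexpr float fma_scalar(float a, float b, float c) {
      return __builtin_elementwise_fma(a, b, c);
    }
    static_assert(fma_scalar(2.0f, 3.0f, 1.0f) == 7.0f, "folds at compile time");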
…ars are mergeable (llvm#167667)

See if each pair of scalar operands of a build vector can be freely
merged together, typically when they've been split for some reason by
legalization.

If so, we can create a new build vector node with double the scalar width
but half the element count, reducing codegen complexity and potentially
allowing further optimization.
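A conceptual illustration only, not the DAG code: two 32-bit scalars that legalization split from one 64-bit value can be packed back into a single wide element, halving the element count of the build vector.

    #include <cstdint>

    // Rebuild one wide element from the two narrow halves produced by splitting.
    uint64_t merge_halves(uint32_t lo, uint32_t hi) {
      return (static_cast<uint64_t>(hi) << 32) | lo;
    }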

I did look at performing this generically in DAGCombine, but we don't
have as much control over when a legal build vector can be folded -
another generic fold would be to handle this on insert_vector_elt pairs,
but again legality checks could be limiting.

Fixes llvm#167498
lukel97 and others added 3 commits November 13, 2025 19:53
…lvm#167826)

This patch replaces the generic `LLVM_Type` with the specific `I32` type in
NVVM operations:

`NVVM_SyncWarpOp`: change the mask parameter from `LLVM_Type` to `I32`.
`NVVM_CpAsyncOp`: change the cpSize parameter from `Optional<LLVM_Type>` to
`Optional<I32>`.

Signed-off-by: Dharuni R Acharya <dharunira@nvidia.com>
@z1-cciauto z1-cciauto requested a review from a team November 13, 2025 12:07
@z1-cciauto z1-cciauto merged commit da9a9ea into amd-staging Nov 13, 2025
14 checks passed
@z1-cciauto z1-cciauto deleted the upstream_merge_202511130707 branch November 13, 2025 14:49