forked from llvm/llvm-project
merge main into amd-staging #581
Merged
…67776) Supporting this in GISel requires multiple changes to IRTranslator to support aggregate returns containing scalable vectors and non-scalable types. Falling back is the quickest way to fix the crash. Fixes llvm#167618
…MergeInputChains. NFC (llvm#167807)
…ing pass: `arith-to-apfloat`" (llvm#167834) Reverts llvm#167608 Broken builder https://lab.llvm.org/buildbot/#/builders/52/builds/12781
This was preventing check-compiler-rt from actually running when we touched a project that was supposed to cause compiler-rt to be tested.
Fix documentation in `abseil`, `android`, `altera`, `boost` and `bugprone`. This is part of the codebase cleanup described in [llvm#167098](llvm#167098)
…nctions" This reverts parts of commit 885d7b7, and adds verbose comments explaining all the variants of this function, for clarity for future readers. It turns out that those functions actually weren't misnamed or unused after all: Apparently Clang doesn't match GCC when it comes to what stack probe function is referenced on i386 mingw. GCC < 4.6 references a symbol named "___chkstk", with three leading underscores, and GCC >= 4.6 references "___chkstk_ms". Restore these functions, to allow linking object files built with GCC with compiler-rt.
…vm#165467) The C locale is defined by the C standard, so we know exactly which characters classify as (x)digits. Instead of going through the locale base API, we can simply implement the classification functions ourselves, which probably improves codegen significantly as well.
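As a rough illustration of the idea in this commit message, the sketch below classifies digits and hexadecimal digits with plain range checks. The names `is_c_digit`, `is_c_xdigit` and `c_xdigit_value` are hypothetical and are not the libc++ functions; the letter ranges assume an ASCII-compatible execution character set.

```cpp
#include <cassert>

// Hypothetical helpers (not the libc++ implementation): classify characters
// as the "C" locale would, without consulting any locale API.
constexpr bool is_c_digit(char c) { return c >= '0' && c <= '9'; }

// Assumes 'a'..'f' and 'A'..'F' are contiguous, as they are in ASCII.
constexpr bool is_c_xdigit(char c) {
  return is_c_digit(c) || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
}

// Numeric value of a hexadecimal digit; the caller must pass a valid xdigit.
inline int c_xdigit_value(char c) {
  assert(is_c_xdigit(c) && "caller must pass a hexadecimal digit");
  if (is_c_digit(c))
    return c - '0';
  if (c >= 'a')
    return c - 'a' + 10;
  return c - 'A' + 10;
}

static_assert(is_c_digit('7'), "'7' is a decimal digit");
static_assert(!is_c_digit('a'), "'a' is not a decimal digit");
static_assert(is_c_xdigit('F'), "'F' is a hexadecimal digit");
static_assert(!is_c_xdigit('g'), "'g' is not a hexadecimal digit");
```

Because the checks are simple comparisons, a compiler can inline them and evaluate them at compile time, which is presumably the kind of codegen improvement the message has in mind.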
…converting constructor (llvm#165619) This also backports LWG2415 as a drive-by.
) On Windows 8 and above, the WaitOnAddress, WakeByAddressSingle and WakeByAddressAll functions allow efficient implementation of the C++20 wait and notify features of std::atomic_flag. These Windows functions have never been made use of in libc++, leading to very poor performance of these features on Windows platforms, as they are implemented using a spin loop with backoff, rather than using any OS thread signalling whatsoever. This change implements the use of these OS functions where available, falling back to the original implementation on Windows versions prior to 8. Relevant API docs from Microsoft:
https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-waitonaddress
https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-wakebyaddresssingle
https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-wakebyaddressall
Fixes llvm#127221
…162800) This should improve the time it takes to run the test suite a bit. Right now there are only a handful of headers in the modulemap because we're missing a lot of includes in the tests. New headers should be added there from the start, and we should fill up the modulemap over time until it contains all the test support headers.
These headers are incredibly simple and closely related, so this merges them into a single one.
…vm#167674) This is an NFC for now, as the SME checks for macOS platforms are not implemented, so zaDisable() is a no-op, but both paths for resuming from an exception should disable ZA. This is a fixup for a recent change in llvm#165066.
…vm#167839) We need to get the element type size at bytecode generation time to check. We also need to diagnose this in the LHS == RHS case.
…lvm#167712) This fixes an assert when compiling llvm-test-suite with -march=rva23u64 -O3 that started appearing sometime this week. We get "Cannot overlap two segments with differing ValID's" because we try to coalesce these two vsetvlis:
    %x:gprnox0 = COPY $x8
    dead $x0 = PseudoVSETIVLI 1, 208, implicit-def $vl, implicit-def $vtype
    %y:gprnox0 = COPY %x
    %v:vr = COPY $v8, implicit $vtype
    %x = PseudoVSETVLI %x, 208, implicit-def $vl, implicit-def $vtype
    -->
    %x:gprnox0 = COPY $x8
    %x = PseudoVSETVLI %x, 208, implicit-def $vl, implicit-def $vtype
    %y:gprnox0 = COPY %x
    %v:vr = COPY $v8, implicit $vtype
However, doing so would extend the segment of the new value of %x up past the first segment, which overlaps. This fixes it by checking that it's safe to extend the segment, simply by making sure the interval isn't live at the first vsetvli. This unfortunately causes a regression in the existing coalesce_vl_avl_same_reg test because, even though we could coalesce the vsetvlis there, we now bail. I couldn't think of an easy way to handle this safely, but I don't think this is an important case to handle: after testing this patch on SPEC CPU 2017 there are no codegen changes.
…ize (llvm#165924) yaml2obj would crash when processing Mach-O load commands with a cmdsize smaller than the actual structure size, e.g. LC_SEGMENT_64 with cmdsize=56 instead of 72. The crash occurred due to integer underflow when calculating padding: cmdsize - BytesWritten wraps to a large value when the result would be negative, causing a massive allocation attempt.
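The wraparound described here can be reproduced in a few lines. The following is a minimal sketch, not the yaml2obj code: `naivePadding` and `paddingBytes` are hypothetical names, and clamping to zero is only one possible way to guard the subtraction (the commit message does not say which approach the fix takes).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reproduction of the bug: unsigned subtraction wraps around
// instead of going negative when bytesWritten exceeds cmdsize.
inline std::uint32_t naivePadding(std::uint32_t cmdsize,
                                  std::uint32_t bytesWritten) {
  return cmdsize - bytesWritten; // 56 - 72 wraps to 4294967280
}

// Hypothetical guarded version: number of zero bytes needed so the command
// occupies cmdsize bytes in total, or 0 if more was already written.
inline std::uint64_t paddingBytes(std::uint32_t cmdsize,
                                  std::uint64_t bytesWritten) {
  if (bytesWritten >= cmdsize)
    return 0; // cmdsize is too small; emit no padding rather than wrapping
  std::uint64_t pad = cmdsize - bytesWritten;
  assert(bytesWritten + pad == cmdsize && "padding must land on cmdsize");
  return pad;
}
```

With the LC_SEGMENT_64 figures from the message, `naivePadding(56, 72)` yields 4294967280, which is the kind of value that would drive a massive allocation, while `paddingBytes(56, 72)` yields 0.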
…165598) MSVC supports an extension that allows deleting an array of objects via a pointer whose static type doesn't match its dynamic type. This is done by generating special destructors: vector deleting destructors. MSVC's virtual tables always contain a pointer to the vector deleting destructor for classes with virtual destructors, so not having this extension implemented causes clang to generate code that is not compatible with the code generated by MSVC, because clang always puts a pointer to a scalar deleting destructor in the vtable. As a bonus, the deletion of an array of polymorphic objects will work just like it does with MSVC: no memory leaks, and the correct destructors are called. This patch will cause clang to emit code that is compatible with code produced by MSVC but not compatible with code produced by older versions of clang, so the new behavior can be disabled by passing -fclang-abi-compat=21 (or lower). This is yet another attempt to land vector deleting destructors support originally implemented by llvm#133451. This PR contains fixes for issues reported in the original PR as well as fixes for issues related to operator delete[] search reported in several issues like llvm#133950 (comment) llvm#134265 Fixes llvm#19772
…t` (llvm#167848) Reland pass and fix linker errors. --------- Co-authored-by: Maksim Levental <maksim.levental@gmail.com>
Fix build after llvm#167848.
This patch adds a TMA intrinsic for Global to shared::cta copy, which was introduced with ptx86. Also remove the NoCapture<> annotation from the pointer arguments to these intrinsics, since the copy operations are asynchronous in nature. lit tests are verified with a ptxas from cuda-12.8. Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
Use it in `printVRegOrUnit()`, `getPressureSets()`/`PSetIterator`, and in functions/classes dealing with register pressure. Static type checking revealed several bugs, mainly in MachinePipeliner. I'm not very familiar with this pass, so I left a bunch of FIXMEs. There is one bug in `findUseBetween()` in RegisterPressure.cpp, also annotated with a FIXME.
This patch fixes the latency of the SVE FADDP instruction for the Neoverse-N3 SWOG. The latency of the floating-point arith, min/max pairwise SVE FADDP should be 3, as per the N3 SWOG.
… a usubo.0)" (llvm#167854) Reverts llvm#161651 due to downstream bad codegen reports
…167856) We only seem to use the SVE fdot for fixed-length vector types when they are larger than 128 bits, whereas we can also use it for 128-bit vectors if SVE2p1/SME2 is available.
As in title. See here for more context: https://discourse.llvm.org/t/allowfpopfusion-vs-sdnodeflags-hasallowcontract/80909 Also add a warning in llc when global contract flag is encountered on x86. Remove global contract from last x86 test
…ementwise_fma and support constexpr (llvm#154731) Now that llvm#152455 is done, we can make all the scalar fma intrinsics wrap __builtin_elementwise_fma, which also allows constexpr. The main difference is that FMA4 intrinsics guarantee that the upper elements are zero, while FMA3 passes through the destination register elements, like older scalar instructions. Fixes llvm#154555
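The FMA3/FMA4 difference mentioned above can be modelled on plain four-element arrays. This is only a sketch: `fma3_scalar` and `fma4_scalar` are hypothetical names rather than the real intrinsics, and taking the pass-through lanes from the first operand is an assumption modelled on how `_mm_fmadd_ss` is documented.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec4 = std::array<float, 4>;

// FMA3-style scalar op (hypothetical model): lane 0 is a0 * b0 + c0 and
// lanes 1..3 are passed through from the first operand.
inline Vec4 fma3_scalar(const Vec4 &a, const Vec4 &b, const Vec4 &c) {
  Vec4 r = a;
  r[0] = std::fma(a[0], b[0], c[0]);
  return r;
}

// FMA4-style scalar op (hypothetical model): lane 0 is a0 * b0 + c0 and
// lanes 1..3 are guaranteed to be zero.
inline Vec4 fma4_scalar(const Vec4 &a, const Vec4 &b, const Vec4 &c) {
  Vec4 r{}; // value-initialised, so every lane starts at 0.0f
  r[0] = std::fma(a[0], b[0], c[0]);
  return r;
}

// Small self-check showing the two behaviours side by side.
inline void checkUpperLanes() {
  const Vec4 a = {2.0f, 10.0f, 20.0f, 30.0f};
  const Vec4 b = {3.0f, 0.0f, 0.0f, 0.0f};
  const Vec4 c = {1.0f, 0.0f, 0.0f, 0.0f};
  assert(fma3_scalar(a, b, c)[0] == 7.0f && fma4_scalar(a, b, c)[0] == 7.0f);
  assert(fma3_scalar(a, b, c)[3] == 30.0f); // passed through
  assert(fma4_scalar(a, b, c)[3] == 0.0f);  // zeroed
}
```

Both variants agree on lane 0; only the upper lanes differ, which is the distinction a shared wrapper around one builtin has to preserve.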
…ars are mergeable (llvm#167667) See if each pair of scalar operands of a build vector can be freely merged together - typically if they've been split for some reason by legalization. If so, we can create a new build vector node with double the scalar size but half the element count, reducing codegen complexity and potentially allowing further optimization. I did look at performing this generically in DAGCombine, but we don't have as much control over when a legal build vector can be folded - another generic fold would be to handle this on insert_vector_elt pairs, but again legality checks could be limiting. Fixes llvm#167498
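At the level of values rather than SelectionDAG nodes, "double the scalar size, half the element count" amounts to the following. This is a hedged sketch only: `mergeScalarPairs` is a hypothetical name, it assumes each adjacent pair holds the low half followed by the high half of one 64-bit value, and it models none of the fold's legality or mergeability checks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical value-level model: combine adjacent 32-bit elements
// (low half first, then high half) into 64-bit elements.
inline std::vector<std::uint64_t>
mergeScalarPairs(const std::vector<std::uint32_t> &elts) {
  assert(elts.size() % 2 == 0 && "need an even number of scalars to pair up");
  std::vector<std::uint64_t> merged;
  merged.reserve(elts.size() / 2);
  for (std::size_t i = 0; i < elts.size(); i += 2) {
    const std::uint64_t lo = elts[i];
    const std::uint64_t hi = elts[i + 1];
    merged.push_back(lo | (hi << 32)); // widen before shifting
  }
  return merged;
}
```

A four-element vector of 32-bit scalars becomes a two-element vector of 64-bit scalars carrying the same bits, so later stages see half as many operands.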
RISC-V test coverage for llvm#167857
…lvm#167826) This patch replaces generic `LLVM_Type` with specific `I32` type in NVVM operations. `NVVM_SyncWarpOp`: Change mask parameter from `LLVM_Type` to `I32`. `NVVM_CpAsyncOp`: Change cpSize parameter from `Optional<LLVM_Type>` to `Optional<I32>`. Signed-off-by: Dharuni R Acharya <dharunira@nvidia.com>
Collaborator
Author
ronlieb
approved these changes
Nov 13, 2025
No description provided.