forked from llvm/llvm-project
merge main into amd-staging #607
Merged
Identified with llvm-use-ranges.
This patch adds computeNumBuckets, a helper function to compute the number of buckets. This is part of the effort outlined in llvm#168255. This makes it easier to move the core logic of grow() to DenseMapBase::grow().
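As a rough illustration of what a bucket-count helper of this kind might compute, here is a minimal sketch. The floor of 64 buckets and the power-of-two rounding are assumptions made for this example, and `roundUpToPowerOf2` is a hypothetical helper; the actual signature and policy in the patch may differ.

```cpp
#include <cassert>
#include <cstdint>

// Assumed minimum table size for this sketch; not taken from the patch.
const unsigned kMinBuckets = 64;

// Hypothetical helper: smallest power of two that is >= Value (Value >= 1).
inline std::uint64_t roundUpToPowerOf2(std::uint64_t Value) {
  std::uint64_t Result = 1;
  while (Result < Value)
    Result <<= 1;
  return Result;
}

// Sketch of a bucket-count computation: round the request up to a power
// of two and never go below the assumed minimum.
inline unsigned computeNumBuckets(unsigned AtLeast) {
  assert(AtLeast <= (1u << 31) && "result must fit in unsigned");
  std::uint64_t Rounded = roundUpToPowerOf2(AtLeast == 0 ? 1 : AtLeast);
  return Rounded < kMinBuckets ? kMinBuckets : static_cast<unsigned>(Rounded);
}
```

Keeping this arithmetic in one free-standing function means a caller such as a `grow()` implementation only has to ask for a bucket count, which is the kind of separation the commit message describes.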
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [aminya/setup-cpp](https://redirect.github.com/aminya/setup-cpp) | action | patch | `v1.7.1` -> `v1.7.2` |
| ghcr.io/llvm/ci-ubuntu-24.04-abi-tests | container | digest | `01e66b0` -> `f80125c` |
| [github/codeql-action](https://redirect.github.com/github/codeql-action) | action | patch | `v4.31.2` -> `v4.31.3` |
| llvm/actions | action | digest | `42d8057` -> `5dd9550` |
…ls (llvm#165692) This PR introduces the `amdgpu-lower-exec-sync` pass, which specifically lowers named-barrier LDS globals introduced by llvm#114550. Changes include:
- Moving the logic of lowering named-barrier LDS globals from the `amdgpu-lower-module-lds` pass to this new pass.
- Adding the pass to the pipeline and removing the existing lowering logic for named-barrier LDS in `amdgpu-lower-module-lds`.

See llvm#161827 for discussion on this topic.
This patch teaches DenseMap constructors to delegate to other DenseMap constructors where we can. The intent is for these constructors to build on top of a higher-level concept like the default-constructed instance instead of calling init on our own. This is part of the effort outlined in llvm#168255.
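The delegation idea can be sketched with a toy container. `TinyMap` below is a hypothetical class invented for this example and is not DenseMap; it only shows each constructor building on a simpler one instead of repeating initialization.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <utility>
#include <vector>

// Toy container (hypothetical, not DenseMap) illustrating constructor
// delegation: each constructor builds on a simpler one.
class TinyMap {
public:
  // The default-constructed instance is the only place that defines the
  // empty state.
  TinyMap() = default;

  // Delegate to the default constructor, then layer the reservation on top.
  explicit TinyMap(std::size_t NumElementsToReserve) : TinyMap() {
    Entries.reserve(NumElementsToReserve);
  }

  // Delegate to the reserving constructor, then insert the elements.
  TinyMap(std::initializer_list<std::pair<int, int>> Init)
      : TinyMap(Init.size()) {
    for (const auto &KV : Init)
      insert(KV.first, KV.second);
  }

  // Insert keeps the first value seen for a key.
  void insert(int Key, int Value) {
    if (lookup(Key))
      return;
    Entries.emplace_back(Key, Value);
  }

  // Returns a pointer to the value, or nullptr if the key is absent.
  const int *lookup(int Key) const {
    for (const auto &KV : Entries)
      if (KV.first == Key)
        return &KV.second;
    return nullptr;
  }

  // Like lookup, but the key is required to be present.
  int at(int Key) const {
    const int *Value = lookup(Key);
    assert(Value && "key must be present");
    return *Value;
  }

  std::size_t size() const { return Entries.size(); }

private:
  std::vector<std::pair<int, int>> Entries;
};
```

With this layering, a change to how the empty state is set up only touches the default constructor, and the other constructors pick it up automatically.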
…min (llvm#167541) Following llvm#112393, this aims to promote vp intrinsics for zvfbfmin without zvfbfa
…68319) The change made in llvm#162433 exposed a weakness in this test that showed different results on different archs that were not caught on the CI bots. This expands the tests to cover more archs, and out of necessity moves the os_log test into a separate test file.
-- This commit is the second in the series adding matchers for linalg.*conv*/*pool*. Refer: llvm#163724
-- In this commit all variants of Conv1D convolution ops have been added.
-- To complete the specific infrastructure required for ops that don't need dilation/stride information during their creation, this commit also includes a basic Conv2D and Conv3D op as part of the lit test.

Signed-off-by: Abhishek Varma <abhvarma@amd.com>
The pointer needs to point to a record. Fixes llvm#166371
…-specific constant offset. (llvm#165591) In the Dhrystone benchmark, I found that some adjacent globals are not merged, whereas GCC's anchor optimization does handle them. Using global-merge-max-offset to set the maximum offset yields similar results (still slightly different, but at least the offset can be controlled).
…lvm#168323) The "MachO_ptrauth_noolloc_sections.yaml" testcase had a typo in the name, and didn't explicitly specify its endianness. This was causing it to fail when enabled on big endian platforms. See conversation in llvm#167902
This patch adds support for shared::cta as destination space in the TMA non-tensor copy Op (from global to shared::cta).
* Appropriate verifier checks are added.
* Unit tests are added to verify the lowering.

The related intrinsic changes were merged through PR llvm#167508.

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
…7866) The `DebugObjectManagerPlugin` implements debugger support for ELF platforms with the GDB JIT Interface. It emits a separate debug object allocation in addition to the LinkGraph's own allocation. This used to happen in the plugin's `notifyEmitted()` callback, i.e. after the LinkGraph's allocation was finalized. In the meantime, it had to block finalization of the corresponding materialization unit to make sure that the debugger can register the object before the code runs. This patch switches the plugin to use an allocation action instead. We can remove the `notifyEmitted()` hook and implement all steps as JITLink passes.
- Introduce the -aarch64-force-unroll-threshold option; when a loop’s cost is below this value we set UP.Force = true (default 0 keeps current behaviour)
- Add an AArch64 loop-unroll regression test that runs once at the default threshold and once with the flag raised, confirming forced unrolling
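The threshold check described in the first bullet can be sketched as follows. `UnrollPrefs` and `applyForceUnrollThreshold` are hypothetical names for this example; only the `Force` field and the "cost below threshold" rule come from the description.

```cpp
#include <cassert>

// Hypothetical stand-in for the unrolling preferences; only the Force
// field mentioned in the description is modelled.
struct UnrollPrefs {
  bool Force = false;
};

// ForceUnrollThreshold models the value of -aarch64-force-unroll-threshold.
// With the default of 0 no unsigned cost is below the threshold, so the
// preferences are left untouched.
inline void applyForceUnrollThreshold(unsigned LoopCost,
                                      unsigned ForceUnrollThreshold,
                                      UnrollPrefs &UP) {
  if (LoopCost < ForceUnrollThreshold)
    UP.Force = true;
}
```

Using a strict less-than comparison is what makes the default value of 0 a no-op, which matches the "keeps current behaviour" note.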
…rted with +sve-b16b16 (llvm#167717) The resulting costs are the same as the standard SVE costs for `half` types.
…lvm#168052) Merge the following classes into `SIGfx6CacheControl`:
- SIGfx7CacheControl
- SIGfx90ACacheControl
- SIGfx940CacheControl

They were all very similar and had a lot of duplicated boilerplate just to implement one or two codegen differences. GFX90A/GFX940 have a bit more differences, but they're still manageable under one class because the general behavior is the same.

This removes 500 lines of code and puts everything into a single place which I think makes it a lot easier to maintain, at the cost of a slight increase in complexity for some functions.

There is still a lot of room for improvement but I think this patch is already big enough as is and I don't want to bundle too much into one review.
…167095)
- This patch detects cycles formed by phis and bails out if one is found.
- This prevents violating DAG restrictions.

Pipelining is aborted in the case below:
%1 = phi i32 [ %a, %entry ], [ %3, %loop ]
%2 = phi i32 [ %a, %entry ], [ %1, %loop ]
%3 = phi i32 [ %b, %entry ], [ %2, %loop ]
---------
Co-authored-by: Ryotaro Kasuga <kasuga.ryotaro@fujitsu.com>
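One way to picture the check is a depth-first search over phi-to-phi edges. The sketch below is not the patch's implementation: `PhiGraph`, `Mark`, and both functions are hypothetical, and the graph only models which phi reads which other phi along the loop back edge.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Each index is a phi; Edges[I] lists the phis that phi I reads along
// the loop back edge. Non-phi operands are simply not modelled.
using PhiGraph = std::vector<std::vector<std::size_t>>;

enum class Mark { Unvisited, InProgress, Done };

// Depth-first search: reaching a node that is still in progress means
// the current path has looped back onto itself.
static bool hasCycleFrom(const PhiGraph &G, std::size_t Node,
                         std::vector<Mark> &Marks) {
  if (Marks[Node] == Mark::InProgress)
    return true;
  if (Marks[Node] == Mark::Done)
    return false;
  Marks[Node] = Mark::InProgress;
  for (std::size_t Next : G[Node]) {
    assert(Next < G.size() && "edge must point at a known phi");
    if (hasCycleFrom(G, Next, Marks))
      return true;
  }
  Marks[Node] = Mark::Done;
  return false;
}

// Returns true if any chain of phi-to-phi reads forms a cycle.
bool hasPhiCycle(const PhiGraph &G) {
  std::vector<Mark> Marks(G.size(), Mark::Unvisited);
  for (std::size_t I = 0; I < G.size(); ++I)
    if (hasCycleFrom(G, I, Marks))
      return true;
  return false;
}
```

For the three phis in the example, numbering %1, %2, %3 as 0, 1, 2 gives the edges 0 -> 2, 1 -> 0, 2 -> 1, which this search reports as a cycle.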
Think it's just a path slash difference. Fixes llvm#167997.
When both `MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS` and MLIR multithreading are enabled, `topLevelFingerPrint` is empty but its value is accessed. This adds a `has_value()` check before dereferencing the optional.
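The guard has the following general shape. `fingerprintUnchanged` is a hypothetical function for this example, and treating a missing fingerprint as "nothing to compare" is an assumed policy rather than something stated in the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical check: compares a previously recorded fingerprint with the
// current one, but only if a fingerprint was recorded at all.
bool fingerprintUnchanged(
    const std::optional<std::uint64_t> &TopLevelFingerPrint,
    std::uint64_t Current) {
  // Guard first: dereferencing an empty optional is undefined behaviour.
  if (!TopLevelFingerPrint.has_value())
    return true; // Assumed policy: nothing recorded, nothing to compare.
  return *TopLevelFingerPrint == Current;
}
```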
…165840) Currently LLDB's `ParseRustVariantPart` generates the following `CXXRecordDecl` for a Rust enum:

```rust
enum AA {
  A(u8)
}
```

```
CXXRecordDecl 0x5555568d5970 <<invalid sloc>> <invalid sloc> struct AA
|-CXXRecordDecl 0x5555568d5ab0 <<invalid sloc>> <invalid sloc> union test_issue::AA$Inner definition
| |-CXXRecordDecl 0x5555568d5d18 <<invalid sloc>> <invalid sloc> struct A$Variant definition
| | |-DefinitionData pass_in_registers aggregate standard_layout trivially_copyable trivial
| | | `-Destructor simple irrelevant trivial needs_implicit
| | `-FieldDecl 0x555555a77880 <<invalid sloc>> <invalid sloc> value 'test_issue::AA::A'
| `-FieldDecl 0x555555a778f0 <<invalid sloc>> <invalid sloc> $variant$ 'test_issue::AA::test_issue::AA$Inner::A$Variant'
|-CXXRecordDecl 0x5555568d5c48 <<invalid sloc>> <invalid sloc> struct A definition
| `-FieldDecl 0x555555a777e0 <<invalid sloc>> <invalid sloc> __0 'unsigned char'
`-FieldDecl 0x555555a77960 <<invalid sloc>> <invalid sloc> $variants$ 'test_issue::AA::test_issue::AA$Inner'
```

When the Rust enum type name is the same as its variant name, however, the generated `CXXRecordDecl` becomes the following. There is a circular reference between `struct A$Variant` and `struct A`, causing llvm#163048.

```rust
enum A {
  A(u8)
}
```

```
CXXRecordDecl 0x5555568d5760 <<invalid sloc>> <invalid sloc> struct A
|-CXXRecordDecl 0x5555568d58a0 <<invalid sloc>> <invalid sloc> union test_issue::A$Inner definition
| |-CXXRecordDecl 0x5555568d5a38 <<invalid sloc>> <invalid sloc> struct A$Variant definition
| | `-FieldDecl 0x5555568d5b70 <<invalid sloc>> <invalid sloc> value 'test_issue::A' <---- bug here
| `-FieldDecl 0x5555568d5be0 <<invalid sloc>> <invalid sloc> $variant$ 'test_issue::A::test_issue::A$Inner::A$Variant'
`-FieldDecl 0x5555568d5c50 <<invalid sloc>> <invalid sloc> $variants$ 'test_issue::A::test_issue::A$Inner'
```

The problem was caused by `GetUniqueTypeNameAndDeclaration` not returning the correct qualified name for the DWARF DIE `test_issue::A::A`; instead, it returned `A`. This caused `ParseStructureLikeDIE` to find the wrong type `test_issue::A` and return early.

The failure in `GetUniqueTypeNameAndDeclaration` appears to stem from a language check that returns early unless the language is C++. I changed it so Rust follows the C++ path rather than returning. I'm not entirely sure this is the right approach: Rust's qualified name rules look similar, but not identical. Alternatively, we could add a Rust-specific implementation that forms qualified names according to Rust's rules.
…lvm#167223) Fixes llvm#165713 This patch handles out-of-bound vector elements and truncates extra bits.
…lvm#167944) We generate an ADDLV node that incorporates a vecreduce(zext) from elements of half the size. This means that we need the input type to be at least twice the size of the input. I updated some variable names whilst I was here. Fixes llvm#167935
This prevents a machine verifier error, where it "Expected implicit register after groups". Fixes llvm#158661
…#163562) Symbols used for dynamic extent information of memory regions are now kept as live as long as the memory region exists.
For a scalar only VPlan with tail folding, if it has a phi live out then
legalizeAndOptimizeInductions will scalarize the widened canonical IV
feeding into the header mask:
<x1> vector loop: {
vector.body:
EMIT vp<%4> = CANONICAL-INDUCTION ir<0>, vp<%index.next>
vp<%5> = SCALAR-STEPS vp<%4>, ir<1>, vp<%0>
EMIT vp<%6> = icmp ule vp<%5>, vp<%3>
EMIT vp<%index.next> = add nuw vp<%4>, vp<%1>
EMIT branch-on-count vp<%index.next>, vp<%2>
No successors
}
Successor(s): middle.block
middle.block:
EMIT vp<%8> = last-active-lane vp<%6>
EMIT vp<%9> = extract-lane vp<%8>, vp<%5>
Successor(s): ir-bb<exit>
The verifier complains about this but this should still generate the
correct last active lane, so this fixes the assert by handling this case
in isHeaderMask. There is a similar pattern already there for
ActiveLaneMask, which also expects a VPScalarIVSteps recipe.
Fixes llvm#167813
…m#165426) This is in preparation for a patch that will only fold offsets into flat instructions if their addition is inbounds. Marking the GEPs inbounds here means that their output won't change with the later patch. Basically a retry of the very similar PR llvm#131994, as part of an updated stack of PRs. For SWDEV-516125.
We already ensure that code for different architectures is always placed in different pages in `assignAddresses`. We represent those ranges using their first and last chunks. However, the RVAs of those chunks may not be page-aligned, for example, due to extra padding for entry-thunk offsets. Align the chunk RVAs to the page boundary so that the emitted ranges correctly include the entire region. This change affects an existing test that checks corner cases triggered by merging a data section into a code section. We may now include such data in the code range. This differs from MSVC’s behavior, but it should not cause practical issues, and the new behavior is arguably more correct. Fixes llvm#168119.
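The alignment step can be sketched as plain address arithmetic. The 4 KiB page size, the `RvaRange` struct, and the choice to round the start down and the end up are assumptions for this example; the patch itself may express this differently.

```cpp
#include <cassert>
#include <cstdint>

// Assumed page size for this sketch (4 KiB).
const std::uint32_t kPageSize = 0x1000;

// Round an RVA down to the start of its page.
inline std::uint32_t alignDownToPage(std::uint32_t Rva) {
  return Rva & ~(kPageSize - 1);
}

// Round an RVA up to the next page boundary (no-op if already aligned).
inline std::uint32_t alignUpToPage(std::uint32_t Rva) {
  assert(Rva <= UINT32_MAX - (kPageSize - 1) && "rounding must not overflow");
  return alignDownToPage(Rva + kPageSize - 1);
}

// Hypothetical range type: [Begin, End) in RVA space.
struct RvaRange {
  std::uint32_t Begin;
  std::uint32_t End;
};

// Widen a range given by the first chunk's RVA and the last chunk's end
// RVA so that it covers whole pages.
inline RvaRange pageAlignedRange(std::uint32_t FirstChunkRva,
                                 std::uint32_t LastChunkEndRva) {
  return RvaRange{alignDownToPage(FirstChunkRva),
                  alignUpToPage(LastChunkEndRva)};
}
```

For instance, a first chunk at 0x1010 (offset by padding) and a last chunk ending at 0x2ff0 would yield the range 0x1000 to 0x3000 under these assumptions.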
…dAssemblyInstEmulation (llvm#168340)
…template arguments (llvm#167341) Although very unusual, the SVal of the argument is not checked for UnknownVal, so we may get a null pointer dereference. In addition, the template arguments of the variant are retrieved incorrectly when type aliases are involved, causing crashes and FPs/FNs.
…lvm#166244) Add a new printRecipe which handles printing the recipe without common info like debug info or metadata. Prepares to print them once, in ::print(), after/in combination with llvm#165825. PR: llvm#166244
…rogram (llvm#167758) This fixes the issue reported in llvm#166855 (comment) that had been revealed after llvm#166855 was merged. `CodeGenFunction::GenerateVarArgsThunk` creates thunks for vararg functions by cloning and modifying them. It is different from `CodeGenFunction::generateThunk`, which is used for Itanium ABI. According to https://reviews.llvm.org/D39396, `CodeGenFunction::GenerateVarArgsThunk` may be called before metadata nodes are resolved. So, it tries to avoid remapping DISubprogram and all metadata nodes it references inside `CloneFunction()` by manually cloning DISubprogram. If optimization level is not OptNone, DILocalVariables for a function are saved in DISubprogram's retainedNodes field. When `CodeGenFunction::GenerateVarArgsThunk` clones such DISubprogram without remapping, it produces a subprogram with incorrectly-scoped retained nodes. It triggers Verifier checks added in llvm#166855. To solve that, retained nodes list of a cloned DISubprogram is cleared.
ronlieb
approved these changes
Nov 17, 2025
No description provided.