Skip to content

Conversation

@z1-cciauto
Copy link
Collaborator

No description provided.

vhscampos and others added 20 commits November 5, 2025 11:14
…upport (llvm#166370)

The `FEnvSafeTest.cpp` test fails on AArch64 soft nofp configurations
because LLVM libc does not provide a floating-point environment in these
configurations.

This patch adds another preprocessor guard on `__ARM_FP` to disable the
test on those.
…llvm#166174)

Dummy variables have an entry in `Program::Globals`, but they are not
added to `GlobalIndices`. When registering redeclarations, we used to
only patch up the global indices, but that left the dummy variables
alone. Update the dummy variables of all redeclarations as well.


Fixes llvm#165952
…m#166266)

explain more about use-after-free in llvm-twine-local
add note about manually adjusting code after applying fix-it.
fixed: llvm#154810
…der (llvm#166292)

By default, the dialect conversion driver processes operations in
pre-order: the initial worklist is populated pre-order. (New/modified
operations are immediately legalized recursively.)

This commit adds a new API for selective post-order legalization.
Patterns can request an operation / region legalization via
`ConversionPatternRewriter::legalize`. They can call these helper
functions on nested regions before rewriting the operation itself.

Note: In rollback mode, a failed recursive legalization typically leads
to a conversion failure. Since recursive legalization is performed by
separate pattern applications, there is no way for the original pattern
to recover from such a failure.
…rn` (llvm#166513)

Fixes a bug in `VectorConvertToLLVMPattern`, which converted operations
with unsupported FP types. E.g., `arith.addf ... : f4E2M1FN` was lowered
to `llvm.fadd ... : i4`, which does not verify. There are a few more
patterns that have the same bug. Those will be fixed in follow-up PRs.

This commit is in preparation of adding an `APFloat`-based lowering for
`arith` operations with unsupported floating-point types.
…s.py. (llvm#164965)

This change enables update_llc_test_checks.py to automatically generate
MIR checks for RUN lines that use `-stop-before` or `-stop-after` flags
allowing tests to verify intermediate compilation stages (e.g., after
instruction selection but before peephole optimizations) alongside the
final assembly output. If `-debug-only` flag is present in the run line it's
considered as the main point of interest for testing and stop flags above
are ignored (that is no MIR checks are generated).

This resulted from the scenario, when I needed to test two instruction
matching patterns where the later pattern in the peepholer reverts the
earlier pattern in the instruction selector and distinguish it from the
case when the earlier pattern didn't worked at all.

Initially created by Claude Sonnet 4.5 it was improved later to handle
conflicts in MIR <-> ASM prefixes and formatting.
…157818)

This patch introduces the LASX and LSX conversion intrinsics:

- <8 x float> @llvm.loongarch.lasx.cast.128.s(<4 x float>)
- <4 x double> @llvm.loongarch.lasx.cast.128.d(<2 x double>)
- <4 x i64> @llvm.loongarch.lasx.cast.128(<2 x i64>)
- <8 x float> @llvm.loongarch.lasx.concat.128.s(<4 x float>, <4 x
float>)
- <4 x double> @llvm.loongarch.lasx.concat.128.d(<2 x double>, <2 x
double>)
- <4 x i64> @llvm.loongarch.lasx.concat.128(<2 x i64>, <2 x i64>)
- <4 x float> @llvm.loongarch.lasx.extract.128.lo.s(<8 x float>)
- <2 x double> @llvm.loongarch.lasx.extract.128.lo.d(<4 x double>)
- <2 x i64> @llvm.loongarch.lasx.extract.128.lo(<4 x i64>)
- <4 x float> @llvm.loongarch.lasx.extract.128.hi.s(<8 x float>)
- <2 x double> @llvm.loongarch.lasx.extract.128.hi.d(<4 x double>)
- <2 x i64> @llvm.loongarch.lasx.extract.128.hi(<4 x i64>)
- <8 x float> @llvm.loongarch.lasx.insert.128.lo.s(<8 x float>, <4 x
float>)
- <4 x double> @llvm.loongarch.lasx.insert.128.lo.d(<4 x double>, <2 x
double>)
- <4 x i64> @llvm.loongarch.lasx.insert.128.lo(<4 x i64>, <2 x i64>)
- <8 x float> @llvm.loongarch.lasx.insert.128.hi.s(<8 x float>, <4 x
float>)
- <4 x double> @llvm.loongarch.lasx.insert.128.hi.d(<4 x double>, <2 x
double>)
- <4 x i64> @llvm.loongarch.lasx.insert.128.hi(<4 x i64>, <2 x i64>)
…ster (llvm#164479)

Technically, it is possible that the a callee-saved register is saved in
different locations. CFIInstrInserter should handle this, but currently
it does not.
Use cannotHoistOrSinkRecipe to forbid sinking allocas.
When ISL encounters an internal error, it sets the error flag, but it is
not isl_error_quota that was already checked. Check for general errors
and abort the schedule optimization if that happens, instead of
continuing on the good path.

The error occured when compiling llvm-test-suite's
MultiSource/Applications/JM/lencod/leaky_bucket.c with Polly enabled.
Not adding a test case because it depends on ISL internals. We do not
want to a test case to depend on which version of ISL is used.
Currently, the ARM backend incorrectly parses every `arm` prefixed arch
to be non-thumb, but `armv6m` is THUMB and doesnt have ARM ops causing
the test to fail when compiling to assembly and not LLVM IR: `error:
Function 'foo' uses ARM instructions, but the target does not support
ARM mode execution.` This only happens when invoking cc1 directly and
not the Clang driver.

As a quick triage, this patch changes the tests to use `thumb`.

Uncovered by llvm#151404
…lvm#160536)

As discussed in llvm#153402, we have inefficiences in handling constant pool
access that are difficult to address. Using an IR pass to promote double
constants to a global allows a higher degree of control of code
generation for these accesses, resulting in improved performance on
benchmarks that might otherwise have high register pressure due to
accessing constant pool values separately rather than via a common base.

Directly promoting double constants to separate global values and
relying on the global merger to do a sensible thing would be one
potential avenue to explore, but it is _not_ done in this version of the
patch because:
* The global merger pass needs fixes. For instance it claims to be a
function pass, yet all of the work is done in initialisation. This means
that attempts by backends to schedule it after a given module pass don't
actually work as expected.
* The heuristics used can impact codegen unexpectedly, so I worry that
tweaking it to get the behaviour desired for promoted constants may lead
to other issues. This may be completely tractable though.

Now that llvm#159352 has landed, the impact on terms if dynamically executed
instructions is slightly smaller (as we are starting from a better
baseline), but still worthwhile in lbm and nab from SPEC. Results below
are for rva22u64:

```
Benchmark                  Baseline         This PR   Diff (%)
============================================================
============================================================
500.perlbench_r         180668945687    180666122417     -0.00%
502.gcc_r               221274522161    221277565086      0.00%
505.mcf_r               134656204033    134656204066      0.00%
508.namd_r              217646645332    216699783858     -0.44%
510.parest_r            291731988950    291916190776      0.06%
511.povray_r             30983594866     31107718817      0.40%
519.lbm_r                91217999812     87405361395     -4.18%
520.omnetpp_r           137699867177    137674535853     -0.02%
523.xalancbmk_r         284730719514    284734023366      0.00%
525.x264_r              379107521547    379100250568     -0.00%
526.blender_r           659391437610    659447919505      0.01%
531.deepsjeng_r         350038121654    350038121656      0.00%
538.imagick_r           238568674979    238560772162     -0.00%
541.leela_r             405660852855    405654701346     -0.00%
544.nab_r               398215801848    391352111262     -1.72%
557.xz_r                129832192047    129832192055      0.00%

```

---
Notes for reviewers:
* As discussed at the sync-up meeting, the suggestion is to try to land
an incremental improvement to the status quo even if there is more work
to be done around the general issue of constant pool handling. We can
discuss here if that is actually the best next step or not, but I just
wanted to clarify that's why this is being posted with a somewhat narrow
scope.
* I've disabled transformations both for RV32 and on systems without D
as both cases saw some regressions.
@z1-cciauto z1-cciauto requested a review from a team November 5, 2025 14:40
@z1-cciauto
Copy link
Collaborator Author

@kzhuravl kzhuravl closed this Nov 5, 2025
@kzhuravl kzhuravl deleted the upstream_merge_202511050940 branch November 5, 2025 14:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.