-
Notifications
You must be signed in to change notification settings - Fork 6.1k
8351623: VectorAPI: Add SVE implementation of subword gather load operation #26236
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
👋 Welcome back xgong! A progress list of the required criteria for merging this PR into |
❗ This change is not yet ready to be integrated. |
@XiaohongGong The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
Hi @Bhavana-Kilambi, @fg1417, could you please help take a look at this PR? BTW, since the vector register size of my SVE machine is 128-bit, could you please help test the correctness on a SVE machine with larger vector size (e.g. 512-bit vector size)? Thanks a lot in advance! |
Hi @XiaohongGong , thank you for doing this. As for testing, we can currently only test on 256-bit SVE machines (we no longer have any 512bit machines). We will get back to you with the results soon. |
Testing on 256-bit SVE machines are fine to me. Thanks so much for your help! |
ins_pipe(pipe_slow); | ||
%} | ||
|
||
instruct vmaskwiden_hi_sve(pReg dst, pReg src) %{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can both the hi and lo widen rules be combined into a single one as the arguments are the same? or would it make it less understandable?
@@ -348,6 +347,12 @@ source %{ | |||
return false; | |||
} | |||
|
|||
// SVE requires vector indices for gather-load/scatter-store operations | |||
// on all data types. | |||
bool Matcher::gather_scatter_needs_vector_index(BasicType bt) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's already a function that tests for UseSVE > 0
here -
static bool supports_scalable_vector() { |
Can it be reused?
match(Set dst (VectorSlice (Binary src1 src2) index)); | ||
format %{ "vslice_neon $dst, $src1, $src2, $index" %} | ||
ins_encode %{ | ||
uint length_in_bytes = Matcher::vector_length_in_bytes(this); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: indentation. two spaces..
match(Set dst_src1 (VectorSlice (Binary dst_src1 src2) index)); | ||
format %{ "vslice_sve $dst_src1, $dst_src1, $src2, $index" %} | ||
ins_encode %{ | ||
assert(UseSVE > 0, "must be sve"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: indentation. two spaces..
// ---------------------------- Vector Slice ------------------------ | ||
|
||
instruct vslice_neon(vReg dst, vReg src1, vReg src2, immI index) %{ | ||
predicate(VM_Version::use_neon_for_vector(Matcher::vector_length_in_bytes(n))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: indentation. I think there're 3 spaces here.. Same with the SVE version below.
This is a follow-up patch of [1], which aims at implementing the subword gather load APIs for AArch64 SVE platform.
Background
Vector gather load APIs load values from memory addresses calculated by adding a base pointer to integer indices. SVE provides native gather load instructions for
byte
/short
types usingint
vectors for indices. The vector size for a gather-load instruction is determined by the index vector (i.e.int
elements). Hence, the total size is32 * elem_num
bits, whereelem_num
is the number of loaded elements in the vector register.Implementation
Challenges
Due to size differences between
int
indices (32-bit) andbyte
/short
data (8/16-bit), operations must be split across multiple vector registers based on the target SVE vector register size constraints.For a 512-bit SVE machine, loading a
byte
vector with different vector species require different approaches:Use
ByteVector.SPECIES_512
as an example:64 * 32
bits, which is 4 times of the SVE vector register size.Solution
The implementation simplifies backend complexity by defining each gather load IR to handle one vector gather-load operation, with multiple IRs generated in the compiler mid-end.
Here is the main changes:
gather_scatter_needs_vector_index()
matcher.VectorSliceNode
for result merging.VectorMaskWidenNode
for mask spliting and type conversion for masked gather-load.Testing:
Performance:
The performance of corresponding JMH benchmarks improve 3-11x on an NVIDIA GRACE CPU, which is a 128-bit SVE2 architecture. Following is the performance data:
[1] https://bugs.openjdk.org/browse/JDK-8355563
[2] https://developer.arm.com/documentation/ddi0602/2024-12/SVE-Instructions/LD1B--scalar-plus-vector-Gather-load-unsigned-bytes-to-vector--vector-index--?lang=en
[3] https://developer.arm.com/documentation/ddi0602/2024-12/SVE-Instructions/LD1H--scalar-plus-vector---Gather-load-unsigned-halfwords-to-vector--vector-index--?lang=en
Progress
Issue
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/26236/head:pull/26236
$ git checkout pull/26236
Update a local copy of the PR:
$ git checkout pull/26236
$ git pull https://git.openjdk.org/jdk.git pull/26236/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 26236
View PR using the GUI difftool:
$ git pr show -t 26236
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/26236.diff
Using Webrev
Link to Webrev Comment