Implement S/C/D/Z AXPBY #1048
There are some changes related to TFSM (both in SRC and LAPACKE). I do not think this should be part of this commit.
Please rebase; it looks like your fork is about two weeks out of date.
7606aa1 to 4b9693f
Rebased to current master's state.
Thank you for taking on the task of adding another longstanding BLAS extension to reference-BLAS. I really appreciate that.
*>
*> \verbatim
*>
*> CAXPBY constant times a vector plus constanttimes a vector.
nit: constanttimes -> constant times
*>
*> \verbatim
*>
*> jack dongarra, linpack, 3/11/78.
Copy & paste?
*>
*  =====================================================================
      SUBROUTINE CAXPBY(N,CA,CX,INCX,CB,CY,INCY)
*
Add "implicit none"?
*
*        clean-up loop
*
         M = MOD(N,4)
Manual unrolling feels a tad out of time. I would just do the scalar loop. Note that [c,z]axpby do not have the unrolled loop.
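To illustrate the plain scalar loop suggested above, here is a minimal sketch in C rather than the PR's Fortran. The function name `axpby_sketch` is hypothetical and not part of the PR. It assumes the usual reference-BLAS stride convention, in which a negative increment starts from the far end of the vector, and it handles only the single-precision real case.

```c
#include <stddef.h>

/* Hypothetical illustration, not the PR's code:
 * computes y := alpha*x + beta*y with a plain, non-unrolled loop. */
void axpby_sketch(int n, float alpha, const float *x, int incx,
                  float beta, float *y, int incy)
{
    if (n <= 0) {
        return; /* nothing to do for empty vectors */
    }

    /* Assumed BLAS convention: with a negative stride, start at the
     * last logical element so the walk still stays inside the array. */
    ptrdiff_t ix = (incx < 0) ? (ptrdiff_t)(1 - n) * incx : 0;
    ptrdiff_t iy = (incy < 0) ? (ptrdiff_t)(1 - n) * incy : 0;

    for (int i = 0; i < n; ++i) {
        /* Each element of y is read once and written once. */
        y[iy] = alpha * x[ix] + beta * y[iy];
        ix += incx;
        iy += incy;
    }
}
```

With such a routine, a CG-style direction update could be written as a single call like `axpby_sketch(n, 1.0f, r, 1, beta, p, 1)` instead of a `scal` pass followed by an `axpy` pass.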
Description
This PR implements the AXPBY operation $y \leftarrow \alpha x + \beta y$, which extends the `axpy` operation by a second scaling factor, just like in `gemm` or `gemv`. This is required to reduce memory transfers in algorithms like the CG algorithm, where one step is $p_{k+1} = r_{k+1} + \beta_k p_k$.

Until now, this had to be implemented as one `scal` and one `axpy` step. The `axpby` routine allows `p_{k+1}` to be read from and written to memory only once. The subroutine can be used in other iterative algorithms, like BiCGStab, as well.

The routine already exists, for example, in
Checklist