-
Notifications
You must be signed in to change notification settings - Fork 96
Add optional for LAPACK/vendor library bindings for dense linear algebra #1814
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
b2b55a4
to
a8b74c8
Compare
} else { | ||
fp_buffer_num_elems = workspace->get_size() / sizeof(ValueType); | ||
} | ||
array<int> dev_info(exec, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
handle the dev_info result in the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
originally I am thinking like copy the value to host and do the same thing like lapack by checking the value.
yes, yours should be good. not sure how the others think about the another catch in the call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you also need to throw the error again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do I need to throw the error again, or just add a std::exit
to the catch block? I'm already printing the message of the exception (because I wanted that to print before devInfo
, originally). I could also use devInfo
as the error code for the exit? Or would it be better to go with something generic like -1
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do not print the error message / exit from the library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried another version in 87e0d92. I'm still not thrilled with it, but I'm not sure how else to print the devInfo
value from cuSOLVER both when it gives an error (e.g., reporting which parameter is invalid) and when it doesn't (failure of routine).
(Edit: after a rebase, this commit is now 83953e6)
reference/base/lapack_bindings.hpp
Outdated
|
||
|
||
extern "C" { | ||
void ssygvd(const int* itype, const char* jobz, const char* uplo, const int* n, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can use int64 bit version for the reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But do we want to? At the moment, we wouldn't have a use case for it (sizes we would use wouldn't need int64), so it'd be simpler to just restrict to 32-bit int LAPACK.
reference/solver/lobpcg_kernels.cpp
Outdated
&itype, &job, &uplo, &n, a->get_values(), &lda, b->get_values(), | ||
&ldb, e_vals->get_data(), (ValueType*)workspace->get_data(), | ||
&fp_buffer_num_elems, tmp_iwork.get_data(), &int_buffer_num_elems); | ||
if (alloc == workspace_mode::allocate) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we only need to check whether the workspace is enough for this operation. If not, we reallocate enough workspace for that. Currently, I think we do not handle the case that user does not let us allocate anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, at the moment I only added a test for the case where we need to allocate. But in LOBPCG, we would make several calls of the same size, so we can check the workspace size once, allocate it, and re-use it without needing to call the query routine every time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should add, this applies to the cuSOLVER/rocSOLVER case specifically (that we can avoid calling the query routine sometimes).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int n = static_cast<int>(a->get_size()[1]); // column-major | ||
int lda = static_cast<int>(a->get_stride()); | ||
int ldb = static_cast<int>(b->get_stride()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
before casting to int, it needs to check whether it exceeds the int range and throw an exception if need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering about the best "Ginkgo way" to do this. Using OverflowError
? Something like this?
constexpr auto max = std::numeric_limits<int>::max();
if (a->get_size()[1] > max) {
throw OverflowError(__FILE__, __LINE__,
name_demangling::get_type_name(typeid(int)));
}
if (a->get_stride() > max) {
throw OverflowError(__FILE__, __LINE__,
name_demangling::get_type_name(typeid(int)));
}
if (b->get_stride() > max) {
throw OverflowError(__FILE__, __LINE__,
name_demangling::get_type_name(typeid(int)));
}
int n = static_cast<int>(a->get_size()[1]); // column-major
int lda = static_cast<int>(a->get_stride());
int ldb = static_cast<int>(b->get_stride());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
namespace gko { | ||
namespace kernels { | ||
namespace GKO_DEVICE_NAMESPACE { | ||
namespace lobpcg { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it can be another namespace not in lobpcg
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thought was that it would be like the use of cuSPARSE, where the bindings are general, but then specific kernels will use the bindings with particular parameters set. For LOBPCG, we want to do a generalized eigenvalue problem of type 1, and we want the eigenvectors (based on the MAGMA implementation). Maybe later, a different solver will want to use the same bindings in a different way, e.g. maybe they want just the eigenvalues. So then that kernel would go in whatever namespace for its parent object.
Do you already know of another case that will use the same parameters? Then would we go ahead and create some sort of common eigensolver kernels namespace?
std::cout << e.what() << std::endl; | ||
int32 host_info = exec->copy_val_to_host(dev_info.get_data()); | ||
std::cout << "devInfo was " << host_info << std::endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible that routine does not have an error but dev_info is still in unusable stage?
like compute some eigenproblem with nullspace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CUSOLVER_STATUS_INVALID_VALUE
give the error when dev_info < 0
but I am not sure whether it also throw the error when dev_info > 0
.
if not, we need to copy the value anyway and throw LAPACK error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I was assuming it would return something other than success
if it encountered anything that would make devInfo
be !=0
, but you're right, that's not the case for devInfo > 0
(unfortunately). I tested it just to be sure.
I tried updating it again in 87e0d92, but still throwing a CusolverError...
(Edit: After a rebase, this commit is now 83953e6)
Also, it has been further updated with the addition of hipSOLVER in 5e6dc8e.
|
||
namespace gko { | ||
namespace kernels { | ||
namespace lobpcg { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can put this into lapack namespace not lobpcg.
The lobpcg part will be different file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, now I'm confused. I notice the core/solver/*kernels.hpp
and core/multigrid/*kernels.hpp
files all seem to have "their" namespace (e.g., gko
, kernels
, gmres
for the GMRES kernels), yet the files in factorization
and preconditioner
just have gko
and kernels
, for example. What's the difference?
core/eigensolver/lobpcg_kernels.hpp
Outdated
/** | ||
* Set the workspace mode for LAPACK/cuSOLVER/rocSOLVER/MKL calls | ||
*/ | ||
enum class workspace_mode { | ||
/** | ||
* Query workspace size and allocate required memory | ||
*/ | ||
allocate, | ||
/** | ||
* Use workspace that has already been allocated | ||
*/ | ||
use_preallocated | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct me if wrong, I think the dense eigensolver workspace calculation should be some formula related to the size. Thus, the time of calculation of buffersize should be nothing. so we do not need to rely on user_preallocated
means the buffersize enough already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's definitely true for LAPACK. I guess we can't be 100% sure that's what cuSOLVER does, but I'd imagine so. I updated it like you suggested, to always call the buffersize routine, and then just check the current size of the provided workspace.
const double tol = | ||
100 * std::numeric_limits<gko::remove_complex<T>>::epsilon(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have r::value for the purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using Mtx = typename TestFixture::Mtx; | ||
using Ary_r = typename TestFixture::Ary_r; | ||
using Ary = typename TestFixture::Ary; | ||
using T = typename TestFixture::value_type; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using T = typename TestFixture::value_type; | |
using value_type = typename TestFixture::value_type; |
nit for keeping the same logic from above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Fixed now
} | ||
|
||
|
||
} // namespace |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we do not need to wrap the namespace in test now. apologize for the old code containing it, but new code will emit the anonymous namespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the anonymous namespace from both tests.
test/eigensolver/lobpcg_kernels.cpp
Outdated
Ary_r d_small_e_vals; | ||
}; | ||
|
||
TYPED_TEST_SUITE(Lobpcg, gko::test::ValueTypes, TypenameNameGenerator); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TYPED_TEST_SUITE(Lobpcg, gko::test::ValueTypes, TypenameNameGenerator); | |
TYPED_TEST_SUITE(Lobpcg, gko::test::ValueTypesBase, TypenameNameGenerator); |
we have the base such that you do not support on that.
same for implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thanks. That makes it a lot simpler! I changed both tests to use ValueTypesBase
.
{0.3, -0.8, 2.3, -0.2}, | ||
{0.01, 0.5, -0.2, 1.9}}, | ||
ref); | ||
small_a_cmplx = gko::initialize<Mtx_c>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggest to split the complex type to different one.
We have ComplexValueTypesBase (if no half) for that.
because the interface between complex and real are the same, we test the small_a_r, small_b_r with all value types.
and small_a_cmplx, small_b_complx with ComplexValueType
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by "different one"? I don't think I understand.
…ser in the case of vendor routine failure
…verError::get_error when building without LAPACK
… of returned status
5217d60
to
5e6dc8e
Compare
Error: The following files need to be formatted:
You can find a formatting patch under Artifacts here or run |
Starting this PR for discussion about a proposed interface to LAPACK/cuSOLVER/rocSOLVER in this branch.
The first use case is kernels for a future LOBPCG eigenvalue solver, which is why I started by adding a kernel to a new
lobpcg
namespace. The symmetric generalized eigenvalue problem ([he/sy]gvd) is the first addition, with Reference and Cuda support as of opening this branch, but I think we will need at least two other routines for LOBPCG. Once everything is settled with the interface, rocSOLVER support should be simple to add. Dpcpp support should be able to be added through expanding the existing oneMKL bindings.Right now, only LAPACK with 32-bit integers will work. We can enforce this during CMake configuration, but it would require increasing the minimum CMake requirement quite a bit (to 3.22).