Skip to content

Add optional for LAPACK/vendor library bindings for dense linear algebra #1814

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 15 commits into
base: develop
Choose a base branch
from

Conversation

nbeams
Copy link
Collaborator

@nbeams nbeams commented Mar 21, 2025

Starting this PR for discussion about a proposed interface to LAPACK/cuSOLVER/rocSOLVER in this branch.

The first use case is kernels for a future LOBPCG eigenvalue solver, which is why I started by adding a kernel to a newlobpcg namespace. The symmetric generalized eigenvalue problem ([he/sy]gvd) is the first addition, with Reference and Cuda support as of opening this branch, but I think we will need at least two other routines for LOBPCG. Once everything is settled with the interface, rocSOLVER support should be simple to add. Dpcpp support should be able to be added through expanding the existing oneMKL bindings.

Right now, only LAPACK with 32-bit integers will work. We can enforce this during CMake configuration, but it would require increasing the minimum CMake requirement quite a bit (to 3.22).

@nbeams nbeams added 1:ST:WIP This PR is a work in progress. Not ready for review. is:experimental This is an experimental feature/PR/issue/module. labels Mar 21, 2025
@nbeams nbeams requested review from a team March 21, 2025 18:08
@ginkgo-bot ginkgo-bot added reg:build This is related to the build system. reg:testing This is related to testing. mod:core This is related to the core module. mod:cuda This is related to the CUDA module. mod:reference This is related to the reference module. type:solver This is related to the solvers mod:hip This is related to the HIP module. labels Mar 21, 2025
@nbeams nbeams force-pushed the lapack-eig-bindings branch 2 times, most recently from b2b55a4 to a8b74c8 Compare March 21, 2025 19:46
} else {
fp_buffer_num_elems = workspace->get_size() / sizeof(ValueType);
}
array<int> dev_info(exec, 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

handle the dev_info result in the end.

Copy link
Collaborator Author

@nbeams nbeams Mar 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added something in 41b80cc, is this like what you were thinking?
(Edit: after a rebase, this commit is now c881ac1)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

originally I am thinking like copy the value to host and do the same thing like lapack by checking the value.
yes, yours should be good. not sure how the others think about the another catch in the call.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you also need to throw the error again.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I need to throw the error again, or just add a std::exit to the catch block? I'm already printing the message of the exception (because I wanted that to print before devInfo, originally). I could also use devInfo as the error code for the exit? Or would it be better to go with something generic like -1?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not print the error message / exit from the library.

Copy link
Collaborator Author

@nbeams nbeams Apr 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried another version in 87e0d92. I'm still not thrilled with it, but I'm not sure how else to print the devInfo value from cuSOLVER both when it gives an error (e.g., reporting which parameter is invalid) and when it doesn't (failure of routine).
(Edit: after a rebase, this commit is now 83953e6)



extern "C" {
void ssygvd(const int* itype, const char* jobz, const char* uplo, const int* n,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can use int64 bit version for the reference.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But do we want to? At the moment, we wouldn't have a use case for it (sizes we would use wouldn't need int64), so it'd be simpler to just restrict to 32-bit int LAPACK.

&itype, &job, &uplo, &n, a->get_values(), &lda, b->get_values(),
&ldb, e_vals->get_data(), (ValueType*)workspace->get_data(),
&fp_buffer_num_elems, tmp_iwork.get_data(), &int_buffer_num_elems);
if (alloc == workspace_mode::allocate) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we only need to check whether the workspace is enough for this operation. If not, we reallocate enough workspace for that. Currently, I think we do not handle the case that user does not let us allocate anything.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, at the moment I only added a test for the case where we need to allocate. But in LOBPCG, we would make several calls of the same size, so we can check the workspace size once, allocate it, and re-use it without needing to call the query routine every time.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should add, this applies to the cuSOLVER/rocSOLVER case specifically (that we can avoid calling the query routine sometimes).

Copy link
Collaborator Author

@nbeams nbeams Apr 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The workspace_mode has been removed in 24e006d
(Edit: After a rebase, this commit is now f860ae4)

Comment on lines 33 to 35
int n = static_cast<int>(a->get_size()[1]); // column-major
int lda = static_cast<int>(a->get_stride());
int ldb = static_cast<int>(b->get_stride());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

before casting to int, it needs to check whether it exceeds the int range and throw an exception if need.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering about the best "Ginkgo way" to do this. Using OverflowError? Something like this?

    constexpr auto max = std::numeric_limits<int>::max();
    if (a->get_size()[1] > max) {
        throw OverflowError(__FILE__, __LINE__,
                            name_demangling::get_type_name(typeid(int)));
    }
    if (a->get_stride() > max) {
        throw OverflowError(__FILE__, __LINE__,
                            name_demangling::get_type_name(typeid(int)));
    }
    if (b->get_stride() > max) {
        throw OverflowError(__FILE__, __LINE__,
                            name_demangling::get_type_name(typeid(int)));
    }
    int n = static_cast<int>(a->get_size()[1]);  // column-major
    int lda = static_cast<int>(a->get_stride());
    int ldb = static_cast<int>(b->get_stride());

Copy link
Collaborator Author

@nbeams nbeams Mar 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added something like this in 41b80cc, but let me know if a different approach is preferred.
(Edit: After a rebase, this commit is now c881ac1)

namespace gko {
namespace kernels {
namespace GKO_DEVICE_NAMESPACE {
namespace lobpcg {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it can be another namespace not in lobpcg

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thought was that it would be like the use of cuSPARSE, where the bindings are general, but then specific kernels will use the bindings with particular parameters set. For LOBPCG, we want to do a generalized eigenvalue problem of type 1, and we want the eigenvectors (based on the MAGMA implementation). Maybe later, a different solver will want to use the same bindings in a different way, e.g. maybe they want just the eigenvalues. So then that kernel would go in whatever namespace for its parent object.

Do you already know of another case that will use the same parameters? Then would we go ahead and create some sort of common eigensolver kernels namespace?

Comment on lines +71 to +204
std::cout << e.what() << std::endl;
int32 host_info = exec->copy_val_to_host(dev_info.get_data());
std::cout << "devInfo was " << host_info << std::endl;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that routine does not have an error but dev_info is still in unusable stage?
like compute some eigenproblem with nullspace.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CUSOLVER_STATUS_INVALID_VALUE give the error when dev_info < 0
but I am not sure whether it also throw the error when dev_info > 0.
if not, we need to copy the value anyway and throw LAPACK error

Copy link
Collaborator Author

@nbeams nbeams Apr 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was assuming it would return something other than success if it encountered anything that would make devInfo be !=0, but you're right, that's not the case for devInfo > 0 (unfortunately). I tested it just to be sure.
I tried updating it again in 87e0d92, but still throwing a CusolverError...
(Edit: After a rebase, this commit is now 83953e6)
Also, it has been further updated with the addition of hipSOLVER in 5e6dc8e.


namespace gko {
namespace kernels {
namespace lobpcg {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can put this into lapack namespace not lobpcg.
The lobpcg part will be different file

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, now I'm confused. I notice the core/solver/*kernels.hpp and core/multigrid/*kernels.hpp files all seem to have "their" namespace (e.g., gko, kernels, gmres for the GMRES kernels), yet the files in factorization and preconditioner just have gko and kernels, for example. What's the difference?

Comment on lines 21 to 33
/**
* Set the workspace mode for LAPACK/cuSOLVER/rocSOLVER/MKL calls
*/
enum class workspace_mode {
/**
* Query workspace size and allocate required memory
*/
allocate,
/**
* Use workspace that has already been allocated
*/
use_preallocated
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct me if wrong, I think the dense eigensolver workspace calculation should be some formula related to the size. Thus, the time of calculation of buffersize should be nothing. so we do not need to rely on user_preallocated means the buffersize enough already.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's definitely true for LAPACK. I guess we can't be 100% sure that's what cuSOLVER does, but I'd imagine so. I updated it like you suggested, to always call the buffersize routine, and then just check the current size of the provided workspace.

Comment on lines 164 to 165
const double tol =
100 * std::numeric_limits<gko::remove_complex<T>>::epsilon();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have r::value for the purpose.

Copy link
Collaborator Author

@nbeams nbeams Apr 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out -- I hadn't even noticed it (obviously, or I would've used it :) ). I changed it in 0cbb2e9
(Edit: After a rebase, this commit is now 94db682)

using Mtx = typename TestFixture::Mtx;
using Ary_r = typename TestFixture::Ary_r;
using Ary = typename TestFixture::Ary;
using T = typename TestFixture::value_type;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
using T = typename TestFixture::value_type;
using value_type = typename TestFixture::value_type;

nit for keeping the same logic from above

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Fixed now

}


} // namespace
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we do not need to wrap the namespace in test now. apologize for the old code containing it, but new code will emit the anonymous namespace.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the anonymous namespace from both tests.

Ary_r d_small_e_vals;
};

TYPED_TEST_SUITE(Lobpcg, gko::test::ValueTypes, TypenameNameGenerator);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
TYPED_TEST_SUITE(Lobpcg, gko::test::ValueTypes, TypenameNameGenerator);
TYPED_TEST_SUITE(Lobpcg, gko::test::ValueTypesBase, TypenameNameGenerator);

we have the base such that you do not support on that.
same for implementation

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks. That makes it a lot simpler! I changed both tests to use ValueTypesBase.

{0.3, -0.8, 2.3, -0.2},
{0.01, 0.5, -0.2, 1.9}},
ref);
small_a_cmplx = gko::initialize<Mtx_c>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest to split the complex type to different one.
We have ComplexValueTypesBase (if no half) for that.
because the interface between complex and real are the same, we test the small_a_r, small_b_r with all value types.
and small_a_cmplx, small_b_complx with ComplexValueType

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by "different one"? I don't think I understand.

@nbeams
Copy link
Collaborator Author

nbeams commented Apr 28, 2025

(Note: 5217d60 is actually part of 3c600f1, but I missed checking in the new file. Can be cleaned up when doing a future rebase 🙂 )
Edit: This cleanup has been done, and the new commit is afd6e00.

@nbeams nbeams force-pushed the lapack-eig-bindings branch from 5217d60 to 5e6dc8e Compare May 6, 2025 20:22
@ginkgo-bot
Copy link
Member

Error: The following files need to be formatted:

accessor/cuda_helper.hpp
accessor/hip_helper.hpp
accessor/reduced_row_major_reference.hpp
accessor/reference_helper.hpp
accessor/sycl_helper.hpp
common/cuda_hip/base/math.hpp
common/cuda_hip/base/types.hpp
common/cuda_hip/components/atomic.hpp
common/cuda_hip/factorization/par_ic_kernels.cpp
common/cuda_hip/factorization/par_ict_kernels.cpp
common/cuda_hip/factorization/par_ilu_kernels.cpp
common/cuda_hip/factorization/par_ilut_select_common.cpp
common/cuda_hip/factorization/par_ilut_select_kernels.cpp
common/cuda_hip/factorization/par_ilut_sweep_kernels.cpp
common/cuda_hip/matrix/coo_kernels.cpp
common/cuda_hip/preconditioner/batch_jacobi_kernels.hpp
common/cuda_hip/preconditioner/isai_kernels.cpp
common/cuda_hip/solver/batch_bicgstab_kernels.hpp
common/cuda_hip/solver/batch_cg_kernels.hpp
common/cuda_hip/solver/idr_kernels.cpp
core/base/batch_multi_vector.cpp
core/base/block_operator.cpp
core/base/mixed_precision_types.hpp
core/base/mtx_io.cpp
core/config/dispatch.hpp
core/config/type_descriptor_helper.hpp
core/log/solver_progress.cpp
core/matrix/batch_csr.cpp
core/matrix/batch_dense.cpp
core/matrix/batch_ell.cpp
core/matrix/dense.cpp
core/matrix/diagonal.cpp
core/matrix/ell.cpp
core/matrix/fbcsr.cpp
core/matrix/hybrid.cpp
core/matrix/permutation.cpp
core/matrix/row_gatherer.cpp
core/matrix/sellp.cpp
core/test/base/extended_float.cpp
core/test/base/floating_bit_helper.hpp
core/test/base/half.cpp
core/test/base/mtx_io.cpp
core/test/utils/matrix_generator.hpp
core/test/utils/matrix_generator_test.cpp
core/test/utils/matrix_utils_test.cpp
cuda/base/types.hpp
cuda/solver/common_trs_kernels.cuh
cuda/test/base/math.cu
dpcpp/base/complex.hpp
dpcpp/base/math.hpp
dpcpp/base/types.hpp
dpcpp/components/reduction.dp.hpp
dpcpp/factorization/par_ilut_select_common.dp.cpp
dpcpp/factorization/par_ilut_select_kernels.hpp.inc
dpcpp/matrix/coo_kernels.dp.cpp
dpcpp/preconditioner/batch_block_jacobi.hpp
dpcpp/solver/batch_bicgstab_kernels.dp.cpp
dpcpp/solver/batch_bicgstab_kernels.hpp
dpcpp/solver/batch_cg_kernels.hpp
dpcpp/solver/idr_kernels.dp.cpp
hip/base/types.hip.hpp
hip/components/cooperative_groups.hip.hpp
hip/test/base/math.hip.cpp
hip/test/matrix/fbcsr_kernels.cpp
include/ginkgo/core/base/batch_multi_vector.hpp
include/ginkgo/core/base/math.hpp
include/ginkgo/core/log/logger.hpp
include/ginkgo/core/matrix/batch_csr.hpp
include/ginkgo/core/matrix/batch_dense.hpp
include/ginkgo/core/matrix/batch_ell.hpp
include/ginkgo/core/matrix/diagonal.hpp
include/ginkgo/core/matrix/ell.hpp
include/ginkgo/core/matrix/fbcsr.hpp
include/ginkgo/core/matrix/hybrid.hpp
include/ginkgo/core/matrix/sellp.hpp
omp/components/atomic.hpp
reference/solver/batch_bicgstab_kernels.hpp
reference/solver/batch_cg_kernels.hpp
reference/test/factorization/par_ilut_kernels.cpp
reference/test/preconditioner/ilu.cpp
reference/test/preconditioner/isai_kernels.cpp
reference/test/reorder/mc64_kernels.cpp
reference/test/solver/batch_bicgstab_kernels.cpp
reference/test/solver/batch_cg_kernels.cpp
reference/test/solver/cgs_kernels.cpp
test/components/reduce_array_kernels.cpp
test/factorization/par_ict_kernels.cpp
test/factorization/par_ilut_kernels.cpp

You can find a formatting patch under Artifacts here or run format! if you have write access to Ginkgo

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1:ST:WIP This PR is a work in progress. Not ready for review. is:experimental This is an experimental feature/PR/issue/module. mod:core This is related to the core module. mod:cuda This is related to the CUDA module. mod:hip This is related to the HIP module. mod:reference This is related to the reference module. reg:build This is related to the build system. reg:testing This is related to testing. type:solver This is related to the solvers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants