Add optional for LAPACK/vendor library bindings for dense linear algebra #1814

nbeams · 2025-03-21T18:08:31Z

Starting this PR for discussion about a proposed interface to LAPACK/cuSOLVER/rocSOLVER in this branch.

The first use case is kernels for a future LOBPCG eigenvalue solver, which is why I started by adding a kernel to a newlobpcg namespace. The symmetric generalized eigenvalue problem ([he/sy]gvd) is the first addition, with Reference and Cuda support as of opening this branch, but I think we will need at least two other routines for LOBPCG. Once everything is settled with the interface, rocSOLVER support should be simple to add. Dpcpp support should be able to be added through expanding the existing oneMKL bindings.

Right now, only LAPACK with 32-bit integers will work. We can enforce this during CMake configuration, but it would require increasing the minimum CMake requirement quite a bit (to 3.22).

common/cuda_hip/base/dev_lapack_bindings.hpp

common/cuda_hip/solver/lobpcg_kernels.cpp

yhmtsai · 2025-03-24T09:11:52Z

common/cuda_hip/solver/lobpcg_kernels.cpp

+    } else {
+        fp_buffer_num_elems = workspace->get_size() / sizeof(ValueType);
+    }
+    array<int> dev_info(exec, 1);


handle the dev_info result in the end.

I added something in 41b80cc, is this like what you were thinking?
(Edit: after a rebase, this commit is now c881ac1)

originally I am thinking like copy the value to host and do the same thing like lapack by checking the value.
yes, yours should be good. not sure how the others think about the another catch in the call.

you also need to throw the error again.

Do I need to throw the error again, or just add a std::exit to the catch block? I'm already printing the message of the exception (because I wanted that to print before devInfo, originally). I could also use devInfo as the error code for the exit? Or would it be better to go with something generic like -1?

We do not print the error message / exit from the library.

I tried another version in 87e0d92. I'm still not thrilled with it, but I'm not sure how else to print the devInfo value from cuSOLVER both when it gives an error (e.g., reporting which parameter is invalid) and when it doesn't (failure of routine).
(Edit: after a rebase, this commit is now 83953e6)

common/cuda_hip/solver/lobpcg_kernels.cpp

core/test/base/exception.cpp

yhmtsai · 2025-03-24T09:32:04Z

reference/base/lapack_bindings.hpp

+
+
+extern "C" {
+void ssygvd(const int* itype, const char* jobz, const char* uplo, const int* n,


we can use int64 bit version for the reference.

But do we want to? At the moment, we wouldn't have a use case for it (sizes we would use wouldn't need int64), so it'd be simpler to just restrict to 32-bit int LAPACK.

yhmtsai · 2025-03-24T09:34:32Z

reference/solver/lobpcg_kernels.cpp

+            &itype, &job, &uplo, &n, a->get_values(), &lda, b->get_values(),
+            &ldb, e_vals->get_data(), (ValueType*)workspace->get_data(),
+            &fp_buffer_num_elems, tmp_iwork.get_data(), &int_buffer_num_elems);
+        if (alloc == workspace_mode::allocate) {


we only need to check whether the workspace is enough for this operation. If not, we reallocate enough workspace for that. Currently, I think we do not handle the case that user does not let us allocate anything.

Yes, at the moment I only added a test for the case where we need to allocate. But in LOBPCG, we would make several calls of the same size, so we can check the workspace size once, allocate it, and re-use it without needing to call the query routine every time.

I should add, this applies to the cuSOLVER/rocSOLVER case specifically (that we can avoid calling the query routine sometimes).

The workspace_mode has been removed in 24e006d
(Edit: After a rebase, this commit is now f860ae4)

reference/solver/lobpcg_kernels.cpp

core/solver/lobpcg_kernels.hpp

yhmtsai · 2025-03-24T09:36:59Z

common/cuda_hip/solver/lobpcg_kernels.cpp

+    int n = static_cast<int>(a->get_size()[1]);  // column-major
+    int lda = static_cast<int>(a->get_stride());
+    int ldb = static_cast<int>(b->get_stride());


before casting to int, it needs to check whether it exceeds the int range and throw an exception if need.

I am wondering about the best "Ginkgo way" to do this. Using OverflowError? Something like this?

constexpr auto max = std::numeric_limits<int>::max(); if (a->get_size()[1] > max) { throw OverflowError(__FILE__, __LINE__, name_demangling::get_type_name(typeid(int))); } if (a->get_stride() > max) { throw OverflowError(__FILE__, __LINE__, name_demangling::get_type_name(typeid(int))); } if (b->get_stride() > max) { throw OverflowError(__FILE__, __LINE__, name_demangling::get_type_name(typeid(int))); } int n = static_cast<int>(a->get_size()[1]); // column-major int lda = static_cast<int>(a->get_stride()); int ldb = static_cast<int>(b->get_stride());

I added something like this in 41b80cc, but let me know if a different approach is preferred.
(Edit: After a rebase, this commit is now c881ac1)

yhmtsai · 2025-04-08T15:44:50Z

common/cuda_hip/eigensolver/lobpcg_kernels.cpp

+namespace gko {
+namespace kernels {
+namespace GKO_DEVICE_NAMESPACE {
+namespace lobpcg {


I think it can be another namespace not in lobpcg

My thought was that it would be like the use of cuSPARSE, where the bindings are general, but then specific kernels will use the bindings with particular parameters set. For LOBPCG, we want to do a generalized eigenvalue problem of type 1, and we want the eigenvectors (based on the MAGMA implementation). Maybe later, a different solver will want to use the same bindings in a different way, e.g. maybe they want just the eigenvalues. So then that kernel would go in whatever namespace for its parent object.

Do you already know of another case that will use the same parameters? Then would we go ahead and create some sort of common eigensolver kernels namespace?

common/cuda_hip/eigensolver/lobpcg_kernels.cpp

yhmtsai · 2025-04-08T15:47:51Z

common/cuda_hip/eigensolver/lobpcg_kernels.cpp

+        std::cout << e.what() << std::endl;
+        int32 host_info = exec->copy_val_to_host(dev_info.get_data());
+        std::cout << "devInfo was " << host_info << std::endl;


Is it possible that routine does not have an error but dev_info is still in unusable stage?
like compute some eigenproblem with nullspace.

CUSOLVER_STATUS_INVALID_VALUE give the error when dev_info < 0
but I am not sure whether it also throw the error when dev_info > 0.
if not, we need to copy the value anyway and throw LAPACK error

Yes, I was assuming it would return something other than success if it encountered anything that would make devInfo be !=0, but you're right, that's not the case for devInfo > 0 (unfortunately). I tested it just to be sure.
I tried updating it again in 87e0d92, but still throwing a CusolverError...
(Edit: After a rebase, this commit is now 83953e6)
Also, it has been further updated with the addition of hipSOLVER in 5e6dc8e.

yhmtsai · 2025-04-08T15:51:31Z

core/eigensolver/lobpcg_kernels.hpp

+
+namespace gko {
+namespace kernels {
+namespace lobpcg {


I think we can put this into lapack namespace not lobpcg.
The lobpcg part will be different file

Hmm, now I'm confused. I notice the core/solver/*kernels.hpp and core/multigrid/*kernels.hpp files all seem to have "their" namespace (e.g., gko, kernels, gmres for the GMRES kernels), yet the files in factorization and preconditioner just have gko and kernels, for example. What's the difference?

yhmtsai · 2025-04-08T15:54:37Z

core/eigensolver/lobpcg_kernels.hpp

+/**
+ * Set the workspace mode for LAPACK/cuSOLVER/rocSOLVER/MKL calls
+ */
+enum class workspace_mode {
+    /**
+     * Query workspace size and allocate required memory
+     */
+    allocate,
+    /**
+     * Use workspace that has already been allocated
+     */
+    use_preallocated
+};


Correct me if wrong, I think the dense eigensolver workspace calculation should be some formula related to the size. Thus, the time of calculation of buffersize should be nothing. so we do not need to rely on user_preallocated means the buffersize enough already.

That's definitely true for LAPACK. I guess we can't be 100% sure that's what cuSOLVER does, but I'd imagine so. I updated it like you suggested, to always call the buffersize routine, and then just check the current size of the provided workspace.

yhmtsai · 2025-04-08T16:27:03Z

reference/test/eigensolver/lobpcg_kernels.cpp

+        const double tol =
+            100 * std::numeric_limits<gko::remove_complex<T>>::epsilon();


we have r::value for the purpose.

Thanks for pointing this out -- I hadn't even noticed it (obviously, or I would've used it :) ). I changed it in 0cbb2e9
(Edit: After a rebase, this commit is now 94db682)

yhmtsai · 2025-04-08T16:27:36Z

reference/test/eigensolver/lobpcg_kernels.cpp

+    using Mtx = typename TestFixture::Mtx;
+    using Ary_r = typename TestFixture::Ary_r;
+    using Ary = typename TestFixture::Ary;
+    using T = typename TestFixture::value_type;


Suggested change

using T = typename TestFixture::value_type;

using value_type = typename TestFixture::value_type;

nit for keeping the same logic from above

Good catch. Fixed now

yhmtsai · 2025-04-08T16:28:47Z

reference/test/eigensolver/lobpcg_kernels.cpp

+}
+
+
+}  // namespace


we do not need to wrap the namespace in test now. apologize for the old code containing it, but new code will emit the anonymous namespace.

I removed the anonymous namespace from both tests.

yhmtsai · 2025-04-08T16:31:35Z

test/eigensolver/lobpcg_kernels.cpp

+    Ary_r d_small_e_vals;
+};
+
+TYPED_TEST_SUITE(Lobpcg, gko::test::ValueTypes, TypenameNameGenerator);


Suggested change

TYPED_TEST_SUITE(Lobpcg, gko::test::ValueTypes, TypenameNameGenerator);

TYPED_TEST_SUITE(Lobpcg, gko::test::ValueTypesBase, TypenameNameGenerator);

we have the base such that you do not support on that.
same for implementation

Ah, thanks. That makes it a lot simpler! I changed both tests to use ValueTypesBase.

yhmtsai · 2025-04-08T16:33:56Z

test/eigensolver/lobpcg_kernels.cpp

+                                            {0.3, -0.8, 2.3, -0.2},
+                                            {0.01, 0.5, -0.2, 1.9}},
+                                           ref);
+        small_a_cmplx = gko::initialize<Mtx_c>(


suggest to split the complex type to different one.
We have ComplexValueTypesBase (if no half) for that.
because the interface between complex and real are the same, we test the small_a_r, small_b_r with all value types.
and small_a_cmplx, small_b_complx with ComplexValueType

What do you mean by "different one"? I don't think I understand.

nbeams · 2025-04-28T22:41:41Z

(Note: 5217d60 is actually part of 3c600f1, but I missed checking in the new file. Can be cleaned up when doing a future rebase 🙂 )
Edit: This cleanup has been done, and the new commit is afd6e00.

…ser in the case of vendor routine failure

…verError::get_error when building without LAPACK

…mespace

… of returned status

ginkgo-bot · 2025-05-06T20:22:53Z

Error: The following files need to be formatted:

accessor/cuda_helper.hpp
accessor/hip_helper.hpp
accessor/reduced_row_major_reference.hpp
accessor/reference_helper.hpp
accessor/sycl_helper.hpp
common/cuda_hip/base/math.hpp
common/cuda_hip/base/types.hpp
common/cuda_hip/components/atomic.hpp
common/cuda_hip/factorization/par_ic_kernels.cpp
common/cuda_hip/factorization/par_ict_kernels.cpp
common/cuda_hip/factorization/par_ilu_kernels.cpp
common/cuda_hip/factorization/par_ilut_select_common.cpp
common/cuda_hip/factorization/par_ilut_select_kernels.cpp
common/cuda_hip/factorization/par_ilut_sweep_kernels.cpp
common/cuda_hip/matrix/coo_kernels.cpp
common/cuda_hip/preconditioner/batch_jacobi_kernels.hpp
common/cuda_hip/preconditioner/isai_kernels.cpp
common/cuda_hip/solver/batch_bicgstab_kernels.hpp
common/cuda_hip/solver/batch_cg_kernels.hpp
common/cuda_hip/solver/idr_kernels.cpp
core/base/batch_multi_vector.cpp
core/base/block_operator.cpp
core/base/mixed_precision_types.hpp
core/base/mtx_io.cpp
core/config/dispatch.hpp
core/config/type_descriptor_helper.hpp
core/log/solver_progress.cpp
core/matrix/batch_csr.cpp
core/matrix/batch_dense.cpp
core/matrix/batch_ell.cpp
core/matrix/dense.cpp
core/matrix/diagonal.cpp
core/matrix/ell.cpp
core/matrix/fbcsr.cpp
core/matrix/hybrid.cpp
core/matrix/permutation.cpp
core/matrix/row_gatherer.cpp
core/matrix/sellp.cpp
core/test/base/extended_float.cpp
core/test/base/floating_bit_helper.hpp
core/test/base/half.cpp
core/test/base/mtx_io.cpp
core/test/utils/matrix_generator.hpp
core/test/utils/matrix_generator_test.cpp
core/test/utils/matrix_utils_test.cpp
cuda/base/types.hpp
cuda/solver/common_trs_kernels.cuh
cuda/test/base/math.cu
dpcpp/base/complex.hpp
dpcpp/base/math.hpp
dpcpp/base/types.hpp
dpcpp/components/reduction.dp.hpp
dpcpp/factorization/par_ilut_select_common.dp.cpp
dpcpp/factorization/par_ilut_select_kernels.hpp.inc
dpcpp/matrix/coo_kernels.dp.cpp
dpcpp/preconditioner/batch_block_jacobi.hpp
dpcpp/solver/batch_bicgstab_kernels.dp.cpp
dpcpp/solver/batch_bicgstab_kernels.hpp
dpcpp/solver/batch_cg_kernels.hpp
dpcpp/solver/idr_kernels.dp.cpp
hip/base/types.hip.hpp
hip/components/cooperative_groups.hip.hpp
hip/test/base/math.hip.cpp
hip/test/matrix/fbcsr_kernels.cpp
include/ginkgo/core/base/batch_multi_vector.hpp
include/ginkgo/core/base/math.hpp
include/ginkgo/core/log/logger.hpp
include/ginkgo/core/matrix/batch_csr.hpp
include/ginkgo/core/matrix/batch_dense.hpp
include/ginkgo/core/matrix/batch_ell.hpp
include/ginkgo/core/matrix/diagonal.hpp
include/ginkgo/core/matrix/ell.hpp
include/ginkgo/core/matrix/fbcsr.hpp
include/ginkgo/core/matrix/hybrid.hpp
include/ginkgo/core/matrix/sellp.hpp
omp/components/atomic.hpp
reference/solver/batch_bicgstab_kernels.hpp
reference/solver/batch_cg_kernels.hpp
reference/test/factorization/par_ilut_kernels.cpp
reference/test/preconditioner/ilu.cpp
reference/test/preconditioner/isai_kernels.cpp
reference/test/reorder/mc64_kernels.cpp
reference/test/solver/batch_bicgstab_kernels.cpp
reference/test/solver/batch_cg_kernels.cpp
reference/test/solver/cgs_kernels.cpp
test/components/reduce_array_kernels.cpp
test/factorization/par_ict_kernels.cpp
test/factorization/par_ilut_kernels.cpp

You can find a formatting patch under Artifacts here or run format! if you have write access to Ginkgo

nbeams added 1:ST:WIP This PR is a work in progress. Not ready for review. is:experimental This is an experimental feature/PR/issue/module. labels Mar 21, 2025

nbeams requested review from a team March 21, 2025 18:08

nbeams force-pushed the lapack-eig-bindings branch 2 times, most recently from b2b55a4 to a8b74c8 Compare March 21, 2025 19:46

yhmtsai requested changes Mar 24, 2025

View reviewed changes

MarcelKoch assigned nbeams Mar 24, 2025

yhmtsai reviewed Apr 8, 2025

View reviewed changes

nbeams added 13 commits May 2, 2025 20:12

Add option for LAPACK/cuSOLVER bindings for [he,sy]gvd

bcdd15d

cuda/hip sygvd: check for int overflow and provide devInfo value to u…

c881ac1

…ser in the case of vendor routine failure

Address some review comments: style changes, add definition for Cusol…

30b1dac

…verError::get_error when building without LAPACK

Use int32 instead of int

b048190

Add new eigensolver subdirectory in preparation for an eigensolver na…

1433f71

…mespace

Require CMake 3.18 if building with LAPACK

56db294

cusolver symm generalized eigensolver: check devInfo value regardless…

83953e6

… of returned status

Remove workspace_mode from lobpcg kernels

f860ae4

lobpcg kernels test improvements from review comments

94db682

minor: spacing

755eace

Expand LAPACK/cuSOLVER bindings to [he,sy]evd

204710f

Add B-orthonormalization kernel using LAPACK/cuSOLVER Cholesky

afd6e00

Change dense eigenvalue kernels to expect row-major data

22ebb3b

nbeams added 2 commits May 6, 2025 20:21

Add complex types for hipBLAS GEAM

dc89057

Add hipsolver bindings and enable lobpcg kernel tests for HIP

5e6dc8e

nbeams force-pushed the lapack-eig-bindings branch from 5217d60 to 5e6dc8e Compare May 6, 2025 20:22

nbeams mentioned this pull request May 7, 2025

Add complex type support for hipBLAS GEAM #1839

Merged



		extern "C" {
		void ssygvd(const int* itype, const char* jobz, const char* uplo, const int* n,

		const double tol =
		100 * std::numeric_limits<gko::remove_complex<T>>::epsilon();

	using T = typename TestFixture::value_type;
	using value_type = typename TestFixture::value_type;

	TYPED_TEST_SUITE(Lobpcg, gko::test::ValueTypes, TypenameNameGenerator);
	TYPED_TEST_SUITE(Lobpcg, gko::test::ValueTypesBase, TypenameNameGenerator);

Add optional for LAPACK/vendor library bindings for dense linear algebra #1814

Are you sure you want to change the base?

Add optional for LAPACK/vendor library bindings for dense linear algebra #1814

Uh oh!

Conversation

nbeams commented Mar 21, 2025

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nbeams Mar 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nbeams Apr 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nbeams Apr 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nbeams Mar 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nbeams Apr 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nbeams Apr 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nbeams Mar 26, 2025 •

edited

Loading

nbeams Apr 9, 2025 •

edited

Loading

nbeams Apr 9, 2025 •

edited

Loading

nbeams Mar 26, 2025 •

edited

Loading

nbeams Apr 9, 2025 •

edited

Loading

nbeams Apr 9, 2025 •

edited

Loading

nbeams commented Apr 28, 2025 •

edited

Loading