feat(pw/fft): CPE-DFTI sticks FFT as a factory-selected backend (FFT_SWDFTI)#7481
Open
A-006 wants to merge 5 commits into
… -mieee
Accelerate FFT_CPU's local 1D sticks FFTs on the Sunway CPEs via the swFFT
xMath-SACA DFTI API, packaged as a SEPARATE FFT backend selected through the
FFT_Bundle factory -- FFT_CPU itself stays free of any DFTI #ifdef.
- New backend source/source_base/module_fft/fft_swdfti.{h,cpp}:
FFT_SWDFTI<double> : public FFT_CPU<double>, overriding only fftzfor/fftzbac
(batched 1D-z on CPE) and fftxyfor/fftxybac (strided 1D-x on CPE; y stays on
FFTW), plus setupFFT (builds the DFTI descriptors after the base FFTW plans).
Non-xprime / disabled cases delegate to FFT_CPU. Runtime toggle ABACUS_NO_DFTI=1.
- FFT_Bundle factory: device "cpu" (double) instantiates FFT_SWDFTI when built
with __SWDFTI, else FFT_CPU -- the only backend-selection point.
- fft_cpu.h: members private -> protected so the subclass can reuse plans/dims.
- CMake: USE_SWDFTI option (default ON under USE_SW) compiles fft_swdfti.cpp and
defines __SWDFTI; add -mieee (CheckCXXCompilerFlag) for IEEE FP under USE_SW;
link the objcopy-isolated libswfft_xmath_iso.a (avoids the fftw_* hijack).
Guarded so OFF => byte-identical to develop (verified: fft_cpu/fft_bundle compile
clean at USE_SW=OFF, fft_swdfti excluded from baseline, cmake reconfigures clean).
Measured (4GaAs ecut60 54^3): veff_pw 1.7-1.8x, scales with np; energy bit-identical.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
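The subclass-plus-factory layout described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the PR's code: CpuFft, SwDftiFft, make_cpu_fft and the string return values are hypothetical stand-ins for FFT_CPU<double>, FFT_SWDFTI<double> and the FFT_Bundle factory, whose real signatures are not shown on this page.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for FFT_CPU<double>: the FFTW-backed baseline.
class CpuFft {
public:
    virtual ~CpuFft() = default;
    virtual std::string fftzfor() { return "fftw-z"; }
    virtual std::string fftxyfor() { return "fftw-xy"; }

protected:
    // Protected rather than private so a subclass can read the layout,
    // mirroring the private -> protected change in fft_cpu.h.
    bool xprime_ = true;
};

// Hypothetical stand-in for FFT_SWDFTI<double>: overrides only the 1D steps.
class SwDftiFft : public CpuFft {
public:
    explicit SwDftiFft(bool enabled) : enabled_(enabled) {}

    std::string fftzfor() override {
        // Disabled or non-xprime cases delegate to the base class.
        if (!enabled_ || !xprime_) return CpuFft::fftzfor();
        return "dfti-z";
    }

    std::string fftxyfor() override {
        if (!enabled_ || !xprime_) return CpuFft::fftxyfor();
        return "dfti-x+fftw-y";  // x on the accelerator, y stays on FFTW
    }

private:
    bool enabled_;
};

// Single backend-selection point: the only place that tests the macro.
std::unique_ptr<CpuFft> make_cpu_fft() {
#ifdef __SWDFTI
    return std::make_unique<SwDftiFft>(true);
#else
    return std::make_unique<CpuFft>();
#endif
}
```

Keeping the preprocessor check inside the factory is what lets the base class stay free of backend-specific conditionals: callers hold a base-class pointer and never see which backend was built.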
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR adds an optional Sunway-specific CPU FFT backend that accelerates local 1D “sticks” FFTs using the swFFT xMath DFTI API, and wires it into the FFT factory and build system.
Changes:
- Introduces FFT_SWDFTI (derived from FFT_CPU) that offloads z/x 1D FFTs to DFTI while keeping other steps on FFTW.
- Updates FFT_Bundle to instantiate FFT_SWDFTI when __SWDFTI is enabled.
- Extends CMake to add the new source and configure Sunway build flags and linking for the isolated swFFT archive.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| source/source_basis/module_pw/CMakeLists.txt | Conditionally compiles the new SWDFTI FFT implementation into the build. |
| source/source_base/module_fft/fft_swdfti.h | Declares the new FFT_SWDFTI backend class. |
| source/source_base/module_fft/fft_swdfti.cpp | Implements SWDFTI setup/compute paths for z/x FFTs and falls back to FFTW as needed. |
| source/source_base/module_fft/fft_cpu.h | Exposes internals to allow the SWDFTI subclass to reuse FFT_CPU plans/dimensions. |
| CMakeLists.txt | Adds Sunway compile flag checks, the USE_SWDFTI option, and changes SW linking to an isolated swFFT archive. |
- cleanFFT: free the z/x DFTI descriptors before nulling the handles (previously leaked the descriptors).
- setupFFT: use std::call_once for the one-time DftiInitAthread CPE spawn instead of a non-thread-safe static int guard.
- CMake: link libswfft_xmath_iso.a only when USE_SWDFTI is ON, and fail fast with a clear message if the archive is missing.
- fft_swdfti.h: include <complex> explicitly (no longer rely on transitive include from fft_cpu.h).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
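The two lifetime fixes in this commit (a thread-safe one-time initialisation, and freeing descriptors before clearing their handles) follow a common pattern that can be sketched as below. Everything here is a hypothetical stand-in: init_cpe_threads, Descriptor, free_descriptor and Backend only model the shape of the fix and do not reproduce the DFTI API or the PR's actual code. On some toolchains std::call_once needs the program to be linked with thread support (for example -pthread).

```cpp
#include <cassert>
#include <mutex>

// Counters used only to make the behaviour observable in this sketch.
static int g_init_calls = 0;
static int g_freed = 0;

// Hypothetical stand-in for the one-time accelerator thread spawn.
inline void init_cpe_threads() { ++g_init_calls; }

// Hypothetical stand-in for a descriptor handle and its release call.
struct Descriptor { int id; };
inline void free_descriptor(Descriptor* d) { delete d; ++g_freed; }

class Backend {
public:
    void setup() {
        // std::call_once runs the callable exactly once, even if several
        // threads reach this point concurrently; a plain static int
        // guard offers no such guarantee.
        static std::once_flag once;
        std::call_once(once, init_cpe_threads);

        clean();  // release descriptors left over from an earlier setup
        z_ = new Descriptor{1};
        x_ = new Descriptor{2};
    }

    void clean() {
        // Free first, then null the handle; nulling alone would leak.
        if (z_) { free_descriptor(z_); z_ = nullptr; }
        if (x_) { free_descriptor(x_); x_ = nullptr; }
    }

    ~Backend() { clean(); }

private:
    Descriptor* z_ = nullptr;
    Descriptor* x_ = nullptr;
};
```

Calling setup() twice on the same object initialises the threads once and frees the first pair of descriptors before building the second, which is the behaviour the commit message describes.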
Collaborator
Author
Addressed the Copilot review (commit b780c63):
The x86 build is byte-identical to develop here (the SWDFTI backend is USE_SW/USE_SWDFTI-gated and not compiled in x86 CI), and develop passes 17_DS_DFTU. The failure was a marginal DeltaSpin+DFT+U energy fluctuation (6.3e-7 vs the 3e-7 threshold). Empty commit to re-run CI.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What
Add a separate FFT backend, FFT_SWDFTI, that accelerates FFT_CPU's local 1D sticks FFTs on the Sunway CPEs via the swFFT xMath-SACA DFTI API. It is selected through the FFT_Bundle factory, so FFT_CPU itself stays free of any DFTI #ifdef.

How
- New backend source/source_base/module_fft/fft_swdfti.{h,cpp}: FFT_SWDFTI<double> : public FFT_CPU<double>, overriding only:
  - fftzfor/fftzbac: batched 1D-z on CPE
  - fftxyfor/fftxybac: strided 1D-x on CPE (y stays on FFTW)
  - setupFFT: builds the DFTI descriptors after the base FFTW plans
  Non-xprime / disabled cases delegate to FFT_CPU. Runtime toggle ABACUS_NO_DFTI=1.
- FFT_Bundle factory: device "cpu" (double) instantiates FFT_SWDFTI when built with __SWDFTI, else FFT_CPU; this is the single backend-selection point.
- fft_cpu.h: members private → protected so the subclass can reuse plans/dims.
- CMake: USE_SWDFTI option (default ON under USE_SW) compiles fft_swdfti.cpp and defines __SWDFTI; adds -mieee for IEEE FP under USE_SW; links the objcopy-isolated libswfft_xmath_iso.a (avoids the fftw_* symbol hijack).

Safety / correctness
- Guarded so OFF => byte-identical to develop (verified: fft_cpu/fft_bundle compile clean at USE_SW=OFF, fft_swdfti excluded from the baseline, cmake reconfigures clean).
- Measured (4GaAs ecut60 54^3): veff_pw 1.7-1.8x, scales with np; energy bit-identical.

Files
- source/source_base/module_fft/fft_swdfti.{h,cpp} (new)
- source/source_base/module_fft/fft_bundle.cpp
- source/source_base/module_fft/fft_cpu.h
- source/source_basis/module_pw/CMakeLists.txt
- CMakeLists.txt
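
For readers unfamiliar with environment-variable toggles like ABACUS_NO_DFTI=1, a check of this kind could look roughly like the sketch below. The helper name dfti_disabled_by_env and the exact matching rule (only the literal value "1" disables the backend) are assumptions for illustration; the PR page does not show how the variable is actually parsed.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: returns true when ABACUS_NO_DFTI is set to "1".
// Assumption: any other value, or an unset variable, leaves the
// accelerated path enabled.
inline bool dfti_disabled_by_env() {
    const char* value = std::getenv("ABACUS_NO_DFTI");
    return value != nullptr && std::strcmp(value, "1") == 0;
}
```

A backend would typically evaluate such a check once during setup and then delegate to the base-class FFT whenever it returns true, so the toggle can be flipped at run time without rebuilding.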