feat(pw/fft): CPE-DFTI sticks FFT as a factory-selected backend (FFT_SWDFTI)#7481

Open
A-006 wants to merge 5 commits into deepmodeling:develop from A-006:feat/swfft-dfti


A-006 commented Jun 17, 2026

Collaborator

What

Add a separate FFT backend, FFT_SWDFTI, that accelerates FFT_CPU's local 1D
sticks FFTs on the Sunway CPEs via the swFFT xMath-SACA DFTI API. It is selected
through the FFT_Bundle factory, so FFT_CPU itself stays free of any DFTI #ifdef.
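A minimal sketch of the compile-time selection described above, under simplifying assumptions: `FFT_CPU` and `FFT_SWDFTI` here are stand-ins with a single virtual `name()` method, and `make_fft_double` is a hypothetical factory function, not the real FFT_Bundle API. The point it illustrates is that only the factory mentions `__SWDFTI`; the base class never does.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Stand-in for the FFTW-based CPU backend (hypothetical, heavily simplified).
template <typename FPTYPE>
class FFT_CPU {
  public:
    virtual ~FFT_CPU() = default;
    virtual std::string name() const { return "FFT_CPU"; }
};

#ifdef __SWDFTI
// Stand-in for the CPE-accelerated subclass; only compiled when __SWDFTI is defined.
template <typename FPTYPE>
class FFT_SWDFTI : public FFT_CPU<FPTYPE> {
  public:
    std::string name() const override { return "FFT_SWDFTI"; }
};
#endif

// Hypothetical factory: the single place that knows about __SWDFTI.
// Other devices and float precision are omitted from this sketch.
std::unique_ptr<FFT_CPU<double>> make_fft_double(const std::string& device) {
    if (device != "cpu") {
        return nullptr;
    }
#ifdef __SWDFTI
    return std::make_unique<FFT_SWDFTI<double>>();
#else
    return std::make_unique<FFT_CPU<double>>();
#endif
}
```

Built with `-D__SWDFTI`, the same call would return the subclass, and callers holding the base-class pointer need no changes.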

How

  • New backend source/source_base/module_fft/fft_swdfti.{h,cpp}:
    FFT_SWDFTI<double> : public FFT_CPU<double>, overriding only:

    • fftzfor/fftzbac — batched 1D-z on CPE
    • fftxyfor/fftxybac — strided 1D-x on CPE (y stays on FFTW)
    • setupFFT — builds the DFTI descriptors after the base FFTW plans

    Non-xprime and disabled cases delegate to FFT_CPU. Runtime toggle:
    ABACUS_NO_DFTI=1 disables the DFTI path.

  • FFT_Bundle factory: device "cpu" (double) instantiates FFT_SWDFTI
    when built with __SWDFTI, else FFT_CPU — the single backend-selection point.

  • fft_cpu.h: members private → protected so the subclass can reuse
    plans/dims.

  • CMake: USE_SWDFTI option (default ON under USE_SW) compiles
    fft_swdfti.cpp and defines __SWDFTI; adds -mieee for IEEE FP under
    USE_SW; links the objcopy-isolated libswfft_xmath_iso.a (avoids the
    fftw_* symbol hijack).
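The override-and-delegate structure in the list above might look roughly like this. Everything here is a hypothetical stand-in: `CpuFFTBase` plays the role of FFT_CPU<double>, the methods return a string naming the path taken instead of transforming data, and reading ABACUS_NO_DFTI through `std::getenv` is an assumption about how the runtime toggle works.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical base standing in for FFT_CPU<double>; reports which path ran.
class CpuFFTBase {
  public:
    virtual ~CpuFFTBase() = default;
    virtual std::string fftzfor() { return "fftw"; }
};

// Hypothetical subclass: overrides one stage and falls back to the base otherwise.
class DftiFFTSketch : public CpuFFTBase {
  public:
    explicit DftiFFTSketch(bool xprime)
        : xprime_(xprime), enabled_(!no_dfti_requested()) {}

    std::string fftzfor() override {
        if (!enabled_ || !xprime_) {
            return CpuFFTBase::fftzfor();  // delegate: the FFTW path is unchanged
        }
        return "dfti";  // placeholder for the batched 1D-z call on the CPEs
    }

  private:
    // Assumption: ABACUS_NO_DFTI is an environment variable and "1" disables DFTI.
    static bool no_dfti_requested() {
        const char* v = std::getenv("ABACUS_NO_DFTI");
        return v != nullptr && std::strcmp(v, "1") == 0;
    }

    bool xprime_;
    bool enabled_;  // read once at construction, not on every transform
};
```

Reading the toggle once in the constructor keeps the per-call branch cheap; the trade-off is that changing the variable mid-run has no effect on an existing object.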

Safety / correctness

  • Guarded so that with USE_SW/USE_SWDFTI OFF the build is byte-identical to
    develop (verified: fft_cpu / fft_bundle compile clean at USE_SW=OFF,
    fft_swdfti is excluded from the baseline, and cmake reconfigures clean).
  • Measured (4xGaAs, ecut 60, 54^3 grid): veff_pw runs 1.7-1.8x faster, and
    the speedup scales with np; the energy is bit-identical.
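"Bit-identical" is a stricter claim than agreeing within a regression-test tolerance. A small illustrative sketch (the helper names are hypothetical): bitwise equality compares the raw 64-bit patterns, so even 0.0 and -0.0 count as different, while a tolerance check accepts any difference up to the threshold.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// True only if the two doubles have exactly the same 64-bit representation.
bool bit_identical(double a, double b) {
    std::uint64_t ua = 0;
    std::uint64_t ub = 0;
    std::memcpy(&ua, &a, sizeof a);  // memcpy avoids type-punning undefined behavior
    std::memcpy(&ub, &b, sizeof b);
    return ua == ub;
}

// The weaker check a test threshold applies: |a - b| <= tol.
bool within_tol(double a, double b, double tol) {
    return std::fabs(a - b) <= tol;
}
```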

Files

  • source/source_base/module_fft/fft_swdfti.{h,cpp} (new)
  • source/source_base/module_fft/fft_bundle.cpp
  • source/source_base/module_fft/fft_cpu.h
  • source/source_basis/module_pw/CMakeLists.txt
  • CMakeLists.txt

A-006 and others added 2 commits June 2, 2026 15:53
… -mieee

Accelerate FFT_CPU's local 1D sticks FFTs on the Sunway CPEs via the swFFT
xMath-SACA DFTI API, packaged as a SEPARATE FFT backend selected through the
FFT_Bundle factory -- FFT_CPU itself stays free of any DFTI #ifdef.

- New backend source/source_base/module_fft/fft_swdfti.{h,cpp}:
  FFT_SWDFTI<double> : public FFT_CPU<double>, overriding only fftzfor/fftzbac
  (batched 1D-z on CPE) and fftxyfor/fftxybac (strided 1D-x on CPE; y stays on
  FFTW), plus setupFFT (builds the DFTI descriptors after the base FFTW plans).
  Non-xprime / disabled cases delegate to FFT_CPU. Toggle ABACUS_NO_DFTI=1.
- FFT_Bundle factory: device "cpu" (double) instantiates FFT_SWDFTI when built
  with __SWDFTI, else FFT_CPU -- the only backend-selection point.
- fft_cpu.h: members private -> protected so the subclass can reuse plans/dims.
- CMake: USE_SWDFTI option (default ON under USE_SW) compiles fft_swdfti.cpp and
  defines __SWDFTI; add -mieee (CheckCXXCompilerFlag) for IEEE FP under USE_SW;
  link the objcopy-isolated libswfft_xmath_iso.a (avoids the fftw_* hijack).

Guarded so OFF => byte-identical to develop (verified: fft_cpu/fft_bundle compile
clean at USE_SW=OFF, fft_swdfti excluded from baseline, cmake reconfigures clean).
Measured (4GaAs ecut60 54^3): veff_pw 1.7-1.8x, scales with np; energy bit-identical.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 17, 2026 04:44

Copilot AI left a comment


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR adds an optional Sunway-specific CPU FFT backend that accelerates local 1D “sticks” FFTs using the swFFT xMath DFTI API, and wires it into the FFT factory and build system.

Changes:

  • Introduces FFT_SWDFTI (derived from FFT_CPU) that offloads z/x 1D FFTs to DFTI while keeping other steps on FFTW.
  • Updates FFT_Bundle to instantiate FFT_SWDFTI when __SWDFTI is enabled.
  • Extends CMake to add the new source and configure Sunway build flags and linking for the isolated swFFT archive.
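The difference between the batched 1D-z and the strided 1D-x transforms is one of memory layout. The naive O(n²) DFT below is only a sketch: it describes a batch with the same stride/distance vocabulary as FFTW's advanced interface (`fftw_plan_many_dft`), where element j of transform b sits at `b * dist + j * stride`. Contiguous sticks correspond to `stride = 1, dist = n`, and transforms along a non-contiguous axis to `stride > 1`; the actual ABACUS array layouts and swFFT descriptor settings are assumptions not shown in the PR.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Naive forward DFT over `howmany` transforms of length n (sketch, not an FFT).
// Element j of transform b is read from in[b * dist + j * stride]; output uses
// the same layout. `out` must already have the same size as `in`.
void batched_dft(const std::vector<cd>& in, std::vector<cd>& out,
                 std::size_t n, std::size_t howmany,
                 std::size_t stride, std::size_t dist) {
    const double pi = std::acos(-1.0);
    for (std::size_t b = 0; b < howmany; ++b) {
        for (std::size_t k = 0; k < n; ++k) {
            cd sum(0.0, 0.0);
            for (std::size_t j = 0; j < n; ++j) {
                const double ang = -2.0 * pi * double(j * k) / double(n);
                sum += in[b * dist + j * stride] * cd(std::cos(ang), std::sin(ang));
            }
            out[b * dist + k * stride] = sum;
        }
    }
}
```

Running it on contiguous data with `(stride = 1, dist = n)` and on the transposed copy with `(stride = howmany, dist = 1)` gives the same spectra, just stored transposed.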

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

  • source/source_basis/module_pw/CMakeLists.txt: Conditionally compiles the new SWDFTI FFT implementation into the build.
  • source/source_base/module_fft/fft_swdfti.h: Declares the new FFT_SWDFTI backend class.
  • source/source_base/module_fft/fft_swdfti.cpp: Implements SWDFTI setup/compute paths for z/x FFTs and falls back to FFTW as needed.
  • source/source_base/module_fft/fft_cpu.h: Exposes internals to allow the SWDFTI subclass to reuse FFT_CPU plans/dimensions.
  • CMakeLists.txt: Adds Sunway compile flag checks, the USE_SWDFTI option, and changes SW linking to an isolated swFFT archive.


Comment thread source/source_base/module_fft/fft_swdfti.cpp
Comment thread CMakeLists.txt
Comment thread source/source_base/module_fft/fft_cpu.h
Comment thread source/source_base/module_fft/fft_swdfti.cpp Outdated
Comment thread source/source_base/module_fft/fft_swdfti.h
- cleanFFT: free the z/x DFTI descriptors before nulling the handles
  (previously leaked the descriptors).
- setupFFT: use std::call_once for the one-time DftiInitAthread CPE spawn
  instead of a non-thread-safe static int guard.
- CMake: link libswfft_xmath_iso.a only when USE_SWDFTI is ON, and fail
  fast with a clear message if the archive is missing.
- fft_swdfti.h: include <complex> explicitly (no longer rely on transitive
  include from fft_cpu.h).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
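The two lifecycle fixes (a one-time CPE spawn via std::call_once, and descriptors freed before the handles are nulled) can be sketched as follows. The `fake_*` functions and counters are hypothetical stand-ins for DftiInitAthread and the DFTI create/free calls, whose exact swFFT signatures the PR does not show; calling `clean()` at the start of `setup()` is a choice of this sketch, not necessarily what the patch does.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-ins for the DFTI calls, instrumented with counters.
static int g_spawn_count = 0;
static int g_live_descriptors = 0;
struct FakeDescriptor {};
static void fake_spawn_cpes() { ++g_spawn_count; }  // plays DftiInitAthread
static FakeDescriptor* fake_create_descriptor() {
    ++g_live_descriptors;
    return new FakeDescriptor;
}
static void fake_free_descriptor(FakeDescriptor** h) {
    delete *h;
    *h = nullptr;  // the free call also nulls the handle
    --g_live_descriptors;
}

class DftiStateSketch {
  public:
    DftiStateSketch() = default;
    DftiStateSketch(const DftiStateSketch&) = delete;             // owns raw handles
    DftiStateSketch& operator=(const DftiStateSketch&) = delete;
    ~DftiStateSketch() { clean(); }

    void setup() {
        static std::once_flag spawn_once;  // shared by all instances and threads
        std::call_once(spawn_once, fake_spawn_cpes);
        clean();                           // sketch choice: re-setup must not leak
        z_desc_ = fake_create_descriptor();
        x_desc_ = fake_create_descriptor();
    }

    void clean() {
        // Free before nulling; skip handles that were never created.
        if (z_desc_ != nullptr) fake_free_descriptor(&z_desc_);
        if (x_desc_ != nullptr) fake_free_descriptor(&x_desc_);
    }

  private:
    FakeDescriptor* z_desc_ = nullptr;
    FakeDescriptor* x_desc_ = nullptr;
};
```

Unlike a `static int` flag, `std::call_once` guarantees that concurrent first calls run the spawn exactly once and that every caller sees it completed.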

A-006 commented Jun 17, 2026

Collaborator Author

Addressed the Copilot review (commit b780c63):

  • cleanFFT descriptor leak: the z/x DFTI descriptors are now freed via DftiFreeDescriptor before the handles are nulled (null-guarded).
  • Non-thread-safe init: DftiInitAthread is now a std::call_once one-time CPE spawn instead of a plain static-int guard.
  • Unconditional iso-archive link: libswfft_xmath_iso.a is linked only when USE_SWDFTI is ON, with a clear FATAL_ERROR if the archive is missing.
  • Self-contained header: fft_swdfti.h now includes <complex> explicitly.
  • private → protected in fft_cpu.h: replied inline; this is intentional and minimal in scope, serving only the single FFT_SWDFTI subclass. Happy to narrow it to a friend declaration if preferred.
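For reference, the narrower friend alternative would keep the members private and grant access to the one subclass only. A hypothetical sketch with simplified stand-in classes (`FFT_CPU_Sketch`, `FFT_SWDFTI_Sketch`, and three grid dimensions in place of the real plans/dims):

```cpp
#include <cassert>

template <typename FPTYPE> class FFT_SWDFTI_Sketch;  // forward declaration

// Hypothetical base: dims stay private, but one named subclass is a friend.
template <typename FPTYPE>
class FFT_CPU_Sketch {
  public:
    FFT_CPU_Sketch(int nx, int ny, int nz) : nx_(nx), ny_(ny), nz_(nz) {}
    virtual ~FFT_CPU_Sketch() = default;

  private:
    friend class FFT_SWDFTI_Sketch<FPTYPE>;
    int nx_, ny_, nz_;
};

template <typename FPTYPE>
class FFT_SWDFTI_Sketch : public FFT_CPU_Sketch<FPTYPE> {
  public:
    using FFT_CPU_Sketch<FPTYPE>::FFT_CPU_Sketch;  // inherit the constructor
    // Friendship lets this class read the base's private members.
    int grid_points() const { return this->nx_ * this->ny_ * this->nz_; }
};
```

The trade-off: `protected` opens the members to every future subclass, while `friend` limits access to the one named class at the cost of naming that subclass inside fft_cpu.h.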

The x86 build here is byte-identical to develop (the SWDFTI backend is gated by
USE_SW/USE_SWDFTI and is not compiled in x86 CI), and develop passes
17_DS_DFTU. The 17_DS_DFTU failure was a marginal DeltaSpin+DFT+U energy
fluctuation (6.3e-7 vs. the 3e-7 threshold). Empty commit to re-run CI.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Labels

None yet

Projects

None yet


2 participants