Implements GPU kernel for sparse Box Least Squares algorithm based on https://arxiv.org/abs/2103.06193. The sparse BLS algorithm tests all pairs of observations as potential transit boundaries, providing O(N²) complexity per frequency.

Key features:
- Two kernel variants: simplified (reliable) and optimized (faster)
- Achieves up to 290x speedup over CPU for realistic problem sizes
- Accuracy verified to within 1e-6 of CPU implementation
- Supports ignore_negative_delta_sols parameter for filtering inverted dips

Implementation details:
- sparse_bls_simple.cu: Simplified O(N³) kernel with bubble sort
  - Single-threaded transit testing for reliability
  - Parallel weight normalization and statistics computation
  - Preferred implementation for datasets < 500 observations
- sparse_bls.cu: Optimized kernel with bitonic sort and cumulative sums
  - Parallel transit testing across threads
  - More complex but potentially faster for large datasets
- sparse_bls_gpu(): Python wrapper function
  - Compiles kernel automatically on first use
  - Direct kernel invocation (no .prepare()) for compatibility
  - Configurable block size and shared memory allocation
- Test coverage: comprehensive parametrized tests in test_bls.py
  - Tests against CPU sparse BLS for correctness
  - Tests against single_bls for consistency
  - Multiple parameter combinations (freq, q, phi0, ndata, ignore_negative_delta_sols)

Performance:
- ndata=500, nfreqs=100: 290x speedup (111s CPU vs 0.4s GPU)
- ndata=200, nfreqs=100: 90x speedup (18s CPU vs 0.2s GPU)
- ndata=100, nfreqs=100: 25x speedup (4.5s CPU vs 0.18s GPU)

Note: GPU overhead makes it slower for very small problems (ndata<50, nfreqs<20)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Python 3.7 is not available on Ubuntu 24.04, which is now used by GitHub Actions ubuntu-latest runners.

Updated:
- .github/workflows/tests.yml: Removed Python 3.7 from test matrix
- pyproject.toml: Updated requires-python to >=3.8
- pyproject.toml: Removed Python 3.7 classifier

Tests will now run on Python 3.8, 3.9, 3.10, 3.11, and 3.12.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Pull Request Overview
This PR adds a GPU-accelerated sparse BLS (Box Least Squares) implementation for period-finding in astronomical time series data. The sparse BLS algorithm tests all pairs of observations as potential transit boundaries, providing an O(N²) alternative to binned approaches that is particularly efficient for datasets with ~50-500 observations.
Key changes:
- Implements two CUDA kernel variants: a simplified reliable kernel (sparse_bls_simple.cu) and an optimized kernel with parallel sorting (sparse_bls.cu)
- Adds GPU compilation and wrapper functions in bls.py with full parameter support including ignore_negative_delta_sols
- Comprehensive parametrized tests validating GPU implementation against CPU sparse BLS and single_bls()
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| pyproject.toml | Updated minimum Python version from 3.7 to 3.8 |
| .github/workflows/tests.yml | Removed Python 3.7 from test matrix |
| cuvarbase/kernels/sparse_bls_simple.cu | New simplified CUDA kernel for sparse BLS using bubble sort and single-threaded transit testing |
| cuvarbase/kernels/sparse_bls.cu | New optimized CUDA kernel with bitonic sort and parallel transit testing |
| cuvarbase/bls.py | Added compile_sparse_bls() and sparse_bls_gpu() functions for GPU kernel compilation and execution |
| cuvarbase/tests/test_bls.py | Added test_sparse_bls_gpu() and test_sparse_bls_gpu_vs_single() test cases |
    q = sh_phi[j] - phi0;
    }

    if (q > 0.5f) continue;
The q validation check 'if (q > 0.5f)' is missing the lower bound check 'q <= 0.f' that exists in the simple kernel at line 186. Both kernels should have consistent validation logic to ensure q is in a valid range.
@copilot make a change here to add the lower bound check
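For reference, the combined bound the review asks for can be sketched in Python. This is only an illustration of the intended range check, not the kernel code: the helper name `q_is_valid` and the `q_max` parameter are hypothetical, and the upper bound of 0.5 is taken from the `q > 0.5f` check quoted above.

```python
def q_is_valid(q, q_max=0.5):
    """Return True when the transit fraction q lies in (0, q_max].

    Mirrors a skip condition of the form `q <= 0 or q > q_max`:
    a candidate transit is rejected exactly when this returns False.
    """
    return 0.0 < q <= q_max


# A zero-width or negative-width box is rejected, as is one wider than half a period.
candidates = [-0.1, 0.0, 0.02, 0.5, 0.51]
accepted = [q for q in candidates if q_is_valid(q)]
```

With both bounds in one predicate, the two kernels would reject the same candidate transits.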
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Add GPU-accelerated sparse BLS implementation
Summary
Implements a GPU kernel for the sparse Box Least Squares (BLS) algorithm based on
https://arxiv.org/abs/2103.06193. The sparse BLS algorithm tests all pairs of observations as
potential transit boundaries, providing an efficient O(N²) per-frequency alternative to binned
approaches for small to medium datasets.
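To make the pair-testing idea concrete, here is a minimal CPU sketch in NumPy for a single frequency. It is not the code in this PR: the function name `sparse_bls_single_freq` is hypothetical, the power is assumed to follow the usual weighted BLS form YW² / (W (1 − W)) / YY, and `ignore_negative_delta_sols` is assumed to drop boxes whose in-transit mean lies above the overall mean. Cumulative sums over two concatenated phase cycles keep the work at O(N²) per frequency and let a box wrap past phase 1.

```python
import numpy as np


def sparse_bls_single_freq(t, y, dy, freq, ignore_negative_delta_sols=False):
    """Brute-force sparse BLS at one frequency (illustrative sketch only).

    Returns (power, q, phi0) for the best box found.
    """
    t, y, dy = (np.asarray(a, dtype=float) for a in (t, y, dy))
    n = len(t)

    # Normalized inverse-variance weights, weighted mean, total weighted variance.
    w = dy ** -2.0
    w /= w.sum()
    ybar = np.dot(w, y)
    yy = np.dot(w, (y - ybar) ** 2)

    # Phase-fold and sort by phase.
    phi = (t * freq) % 1.0
    order = np.argsort(phi)
    phi, w, yc = phi[order], w[order], (y - ybar)[order]

    # Prefix sums over two cycles so wrapped boxes need no special-casing.
    cw = np.concatenate(([0.0], np.cumsum(np.tile(w, 2))))
    cyw = np.concatenate(([0.0], np.cumsum(np.tile(w * yc, 2))))

    best_power, best_q, best_phi0 = 0.0, 0.0, 0.0
    for i in range(n):             # first in-transit observation
        for k in range(2, n):      # number of in-transit observations
            last = i + k - 1
            q = phi[last % n] - phi[i] + (1.0 if last >= n else 0.0)
            if q <= 0.0 or q > 0.5:
                continue
            W = cw[i + k] - cw[i]      # in-transit weight
            YW = cyw[i + k] - cyw[i]   # in-transit weighted residual sum
            if ignore_negative_delta_sols and YW > 0.0:
                continue               # assumed meaning: skip inverted dips
            power = YW ** 2 / (W * (1.0 - W)) / yy
            if power > best_power:
                best_power, best_q, best_phi0 = power, q, phi[i]
    return best_power, best_q, best_phi0
```

For a noiseless box signal the best power approaches 1, because the box model then accounts for all of the weighted variance. The sketch does not guard against constant input (where the variance term is zero).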
Key Features
(recommended)
Implementation Details
New Functions:
sparse BLS
Key Design Decisions:
memory requirements
Testing:
Performance Characteristics
Note: GPU overhead makes it slower for very small problems (ndata<50, nfreqs<20), but
dramatically faster for realistic astronomical datasets.
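One way a caller might act on this note is a small dispatch helper. The sketch below is an assumption, not part of this PR: the name `prefer_gpu` is hypothetical, and it reads the note as meaning the GPU loses only when both `ndata` and `nfreqs` are small. The thresholds are the ones quoted above and would need re-measuring on other hardware.

```python
def prefer_gpu(ndata, nfreqs, min_ndata=50, min_nfreqs=20):
    """Guess whether the GPU path is likely to beat the CPU path.

    Assumption: launch and compile overhead dominates only when the
    problem is small in both dimensions.
    """
    return not (ndata < min_ndata and nfreqs < min_nfreqs)


# Problem sizes taken from the performance figures and the note above.
choices = {
    (500, 100): prefer_gpu(500, 100),
    (100, 100): prefer_gpu(100, 100),
    (30, 10): prefer_gpu(30, 10),
}
```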
Files Changed
Testing Notes
Known Issue (Pre-existing): There is a pytest collection error when running the full
test_bls.py suite via pytest. This appears to be a pre-existing issue unrelated to the GPU
implementation:
See manual validation scripts included in development:
Usage Example
    import numpy as np
    from cuvarbase.bls import sparse_bls_gpu

    # Generate or load your data
    t = np.array([...])   # observation times
    y = np.array([...])   # observation values
    dy = np.array([...])  # observation uncertainties
    freqs = np.linspace(0.5, 2.0, 100)  # frequencies to test

    # Run GPU sparse BLS
    powers, solutions = sparse_bls_gpu(t, y, dy, freqs)

    # Each solution is (q, phi0) for the best transit at that frequency
    for freq, power, (q, phi0) in zip(freqs, powers, solutions):
        print(f"freq={freq:.3f}: power={power:.3f}, q={q:.4f}, phi0={phi0:.4f}")
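A common follow-up is to pick the strongest peak and convert its `q` into a transit duration. The sketch below is illustrative: `best_solution` is a hypothetical helper, it assumes `powers` is array-like and `solutions` is a sequence of `(q, phi0)` pairs as in the example above, and it uses made-up numbers in place of real GPU output. Since `q` is the fraction of the period spent in transit and the period is `1 / freq`, the duration is `q / freq`.

```python
import numpy as np


def best_solution(freqs, powers, solutions):
    """Return (freq, power, q, phi0, duration) at the highest-power frequency."""
    i = int(np.argmax(powers))
    q, phi0 = solutions[i]
    duration = q / freqs[i]  # transit length in the same time units as 1/freq
    return freqs[i], powers[i], q, phi0, duration


# Stand-in values shaped like the output of the usage example (not real results).
freqs = np.array([0.5, 1.0, 2.0])
powers = np.array([0.1, 0.9, 0.3])
solutions = [(0.10, 0.20), (0.05, 0.30), (0.20, 0.70)]

freq, power, q, phi0, duration = best_solution(freqs, powers, solutions)
print(f"best freq={freq:.3f}, q={q:.4f}, phi0={phi0:.4f}, duration={duration:.4f}")
```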
🤖 Generated with https://claude.com/claude-code
Co-Authored-By: Claude noreply@anthropic.com