
[cuda.compute]: API cleanup before 1.0#8772

Merged: shwina merged 6 commits into NVIDIA:main from shwina:cuda-compute-api-cleanup on May 4, 2026

@shwina shwina commented Apr 30, 2026

Description

  • All cuda.compute algorithm parameters are now keyword-only — positional calls raise TypeError
  • Parameter order aligned with CUB's C++ API (e.g. num_items before op in reduce_into and merge_sort)
  • merge_sort parameter rename: d_in_items/d_out_items → d_in_values/d_out_values
  • CUB header device_merge_sort.cuh: matching rename of d_items/d_input_items/d_output_items → d_values/d_input_values/d_output_values
  • Docs updated: keyword-only convention documented in API conventions section
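
For readers unfamiliar with the convention, here is a minimal sketch of what keyword-only means at the language level. The function below is a hypothetical pure-Python stand-in that only borrows the reduce_into parameter names; it is not the cuda.compute implementation and runs on plain lists.

```python
# Hypothetical stand-in for a cuda.compute algorithm; not the real implementation.
# The bare `*` makes every parameter after it keyword-only.
def reduce_into(*, d_in, d_out, num_items, op, h_init):
    """Toy host-side reduction that mimics the keyword-only calling convention."""
    acc = h_init
    for i in range(num_items):
        acc = op(acc, d_in[i])
    d_out[0] = acc


d_in = [1, 2, 3, 4]
d_out = [0]

# Keyword call: the order of arguments at the call site does not matter.
reduce_into(d_in=d_in, d_out=d_out, op=lambda a, b: a + b, num_items=4, h_init=0)

# Positional call: Python rejects it before the function body runs.
try:
    reduce_into(d_in, d_out, 4, lambda a, b: a + b, 0)
    positional_rejected = False
except TypeError:
    positional_rejected = True
```

Because arguments are matched by name, reordering parameters in the signature (as this PR does) cannot silently swap values at existing keyword call sites.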

Changed files

Algorithm implementations (7)

| File | Change |
| --- | --- |
| _reduce.py | Keyword-only; num_items before op |
| _segmented_reduce.py | Keyword-only; num_segments before offsets; op/h_init after offsets |
| _scan.py | Keyword-only; removed `| None` from exclusive_scan's init_value annotation |
| _binary_search.py | Keyword-only; reordered to d_data, num_items, d_values, num_values, d_out |
| _sort/_radix_sort.py | Keyword-only; num_items before order |
| _sort/_merge_sort.py | Keyword-only; d_in_items/d_out_items → d_in_values/d_out_values; num_items before op |
| _sort/_segmented_sort.py | Keyword-only |

CUB C++ (1)

  • cub/device/device_merge_sort.cuh: rename d_items → d_values throughout

Tests / examples / benchmarks (72)

All call sites updated to keyword-only form with corrected parameter order.
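
One way such a convention could be checked mechanically is with the standard-library inspect module. The sketch below is an assumption about how a test might look, not a test from this PR; lower_bound and old_lower_bound are hypothetical stand-ins that only reproduce the parameter order listed for _binary_search.py, and real cuda.compute functions may need different handling if they are wrapped.

```python
import inspect


def assert_keyword_only(func):
    """Raise AssertionError if any parameter of func can be passed positionally."""
    for name, param in inspect.signature(func).parameters.items():
        assert param.kind in (
            inspect.Parameter.KEYWORD_ONLY,
            inspect.Parameter.VAR_KEYWORD,
        ), f"{func.__name__}: parameter {name!r} is not keyword-only"


# Hypothetical stand-ins; the bodies are intentionally empty.
def lower_bound(*, d_data, num_items, d_values, num_values, d_out): ...


def old_lower_bound(d_data, d_values, d_out, num_items, num_values): ...


assert_keyword_only(lower_bound)  # passes silently for the new-style signature
```

Calling assert_keyword_only(old_lower_bound) fails on the first parameter, since the old signature accepts positional arguments.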

Docs (1)

  • docs/python/compute/index.rst: added keyword-only convention to API conventions section; updated object-based API code example

Migration

# Before
cuda.compute.reduce_into(d_in, d_out, op, num_items, h_init)
cuda.compute.merge_sort(d_in_keys, d_in_items, d_out_keys, d_out_items, op, num_items)
cuda.compute.lower_bound(d_data, d_values, d_out, num_items, num_values)

# After
cuda.compute.reduce_into(d_in=d_in, d_out=d_out, num_items=num_items, op=op, h_init=h_init)
cuda.compute.merge_sort(d_in_keys=d_in_keys, d_in_values=d_in_values, d_out_keys=d_out_keys, d_out_values=d_out_values, num_items=num_items, op=op)
cuda.compute.lower_bound(d_data=d_data, num_items=num_items, d_values=d_values, num_values=num_values, d_out=d_out)
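
Downstream code that already passed the old merge_sort names as keywords could bridge the rename with a small adapter. The helper below is hypothetical and not part of this PR; it only translates the two renamed parameters (d_in_items and d_out_items) and leaves every other keyword untouched.

```python
import warnings

# Old parameter names mapped to their replacements, per the rename in this PR.
_RENAMED = {"d_in_items": "d_in_values", "d_out_items": "d_out_values"}


def translate_merge_sort_kwargs(**kwargs):
    """Return a copy of kwargs with pre-rename names replaced by the new ones."""
    translated = {}
    for name, value in kwargs.items():
        new_name = _RENAMED.get(name, name)
        if new_name != name:
            warnings.warn(
                f"{name} is renamed to {new_name}", DeprecationWarning, stacklevel=2
            )
        if new_name in translated:
            # Both the old and the new spelling were supplied for one parameter.
            raise TypeError(f"got both old and new spelling of {new_name!r}")
        translated[new_name] = value
    return translated
```

The result can then be unpacked into the new call, for example cuda.compute.merge_sort(**translate_merge_sort_kwargs(...)). Old positional calls still have to be rewritten by hand, because positional arguments carry no names to translate.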

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@shwina shwina requested review from a team as code owners April 30, 2026 20:42
@shwina shwina requested a review from alliepiper April 30, 2026 20:42
@github-project-automation github-project-automation Bot moved this to Todo in CCCL Apr 30, 2026
@cccl-authenticator-app cccl-authenticator-app Bot moved this from Todo to In Review in CCCL Apr 30, 2026

@shwina shwina force-pushed the cuda-compute-api-cleanup branch from 9a8e775 to 172b74f Compare May 1, 2026 18:03
  size_t& temp_storage_bytes,
  KeyIteratorT d_keys,
- ValueIteratorT d_items,
+ ValueIteratorT d_values,

The change from key/item to key/value improves readability 👍.

Suggestion: move this change to a separate PR. It is not related to the Python changes.

- # perform the reduction, passing the temporary storage as the first argument:
- reducer(temp_storage, d_in, d_out, op, num_items, h_init)
+ # perform the reduction:
+ reducer(temp_storage=temp_storage, d_in=d_in, d_out=d_out, num_items=num_items, op=op, h_init=h_init)

Suggestion: per the naming convention outlined above, this should be d_temp_storage, or even d_temp.

github-actions Bot commented May 1, 2026

🥳 CI Workflow Results

🟩 Finished in 1h 27m: Pass: 100%/315 | Total: 4d 15h | Max: 1h 04m | Hits: 94%/186907


Comment on lines 30 to 31
op=..., # binary operator (built-in or user-defined)
num_items=..., # number of input elements

Nit: this works because the parameters are keyword-only, but it would look better if we consistently put num_items before op.


I suppose in examples as well.

@shwina shwina merged commit 01b00d0 into NVIDIA:main May 4, 2026
338 of 339 checks passed
@jrhemstad jrhemstad linked an issue May 5, 2026 that may be closed by this pull request
Development

Successfully merging this pull request may close these issues.

cuda.compute: Make Python APIs keyword-only

3 participants