Function call and floop simultaneously  #69

@charles-r-smith

I've attached an example below. It works as written, but if I try to transform `x` inside the loop body (for example, squaring a column before reducing it), the function fails. Combining `@floop` with `CUDAEx()` gives real speed gains over the other options, and I'd like to keep those gains while also transforming `x` within the function. Is that possible?

```julia
# Packages
using CUDA, FLoops, BenchmarkTools, FoldsCUDA

# User inputs
nvec = 1_000_000
M = 50
x = CuArray(rand(Float32, (M, nvec)))

# Function setup: write the product of each column of x into f
function parallel_multi(f, x)
    @floop CUDAEx() for i in 1:size(x, 2)
        val = reduce(*, @view(x[:,i]))        # works
        # val = reduce(*, @view(x[:,i].^2))   # doesn't work
        # val = reduce(*, x[:,i].^2)          # doesn't work
        f[i] = val
    end
    return f
end

result = CUDA.ones(Float32, (size(x, 2), 1))

# Comparing speeds (globals are interpolated with $ so BenchmarkTools
# does not measure untyped-global overhead)
display(@benchmark parallel_multi($result, $x))
display(@benchmark reduce(*, $x, dims = 1))
display(@benchmark prod($x, dims = 1))  # identical to above
```
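
A possible explanation, offered as an assumption and not confirmed by the package authors: `x[:,i].^2` allocates a new array on every iteration, which GPU kernels generally cannot do, and `@view` accepts only an indexing expression such as `x[:,i]`, so `@view(x[:,i].^2)` is rejected when the macro expands. One non-allocating alternative is to fold the transformation into the reduction with `mapreduce`. The sketch below only checks that equivalence on the CPU with plain arrays; the helper name `column_sqprod!` is hypothetical, and whether the same loop body compiles under `@floop CUDAEx()` has not been verified here.

```julia
# CPU-only sketch: product of the squared entries of each column,
# without materialising x[:, i] .^ 2 as a temporary array.
# `column_sqprod!` is a hypothetical helper name, not part of any package.
function column_sqprod!(f, x)
    for i in axes(x, 2)
        # mapreduce applies v -> v^2 to each element while reducing with *,
        # so only a view of column i is needed.
        f[i] = mapreduce(v -> v^2, *, @view(x[:, i]))
    end
    return f
end

x = Float32[1 2; 3 4; 5 6]            # small 3×2 test matrix
f = ones(Float32, size(x, 2))
column_sqprod!(f, x)

# Reference result computed the allocating way, for comparison.
expected = vec(prod(x .^ 2, dims = 1))
```

If this carries over to the GPU version, the loop body in `parallel_multi` would become `val = mapreduce(v -> v^2, *, @view(x[:,i]))`, but that remains to be tested.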
