-
Notifications
You must be signed in to change notification settings - Fork 72
[wip] Cubecl API and grayscale to rgb u8 kernel #239
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Hey! I have a question since the free github actions can't support a GPU how would we be able to test the GPU backend for future implementations? |
for now assume that we have to check with our own computers with a gpu |
#[cube(launch_unchecked)] | ||
fn gray_from_rgb_u8_cl_kernel( | ||
src: &Array<Line<u8>>, | ||
dst: &mut Array<Line<u8>>, | ||
args: GrayFromRgbArgs, | ||
) { | ||
let x = CUBE_POS_X * CUBE_DIM_X + UNIT_POS_X; | ||
let y = CUBE_POS_Y * CUBE_DIM_Y + UNIT_POS_Y; | ||
|
||
let cols = args.cols; | ||
let rows = args.rows; | ||
|
||
if x < cols && y < rows { | ||
let idx = y * cols + x; | ||
let r = u16::cast_from(src[3 * idx]); | ||
let g = u16::cast_from(src[3 * idx + 1]); | ||
let b = u16::cast_from(src[3 * idx + 2]); | ||
let gray = u8::cast_from(((r * 77 + g * 150 + b * 29) + 128) >> 8); | ||
dst[idx] = Line::new(gray); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Edited:
I ran the same test successfully with the following implementation instead:
#[cube(launch_unchecked)]
fn gray_from_rgb_u8_cl_kernel(
src: &Array<Line<u8>>,
dst: &mut Array<Line<u8>>,
) {
let idx = ABSOLUTE_POS;
if idx < dst.len() {
let r = u16::cast_from(src[3 * idx]);
let g = u16::cast_from(src[3 * idx + 1]);
let b = u16::cast_from(src[3 * idx + 2]);
let gray = u8::cast_from(((r * 77 + g * 150 + b * 29) + 128) >> 8);
dst[idx] = Line::new(gray);
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I make a PR to this branch or to the main branch? I am also revising the ImageCL abstraction layer for #251 from burn CubeTensor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi @omar-abdelgawad thanks for trying! Have you found a huge difference ? I tried at some point and was minimal. Happy to adopt your version in any case. I think for now I'll refactor this work into a separated experimental kornia-cubecl crate to isolate deps until it matures. This will happen between today and tomorrow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you found a huge difference ?
No The difference is a few microseconds in the mean time of 256x224 benchmark only since all the measurements are quite fast anyway. But I was just trying to write the kernel to be as concise as possible and also figure out what are the optimal cube count, cube dim, and vectorization to use. btw, I tried to write about my results in the burn Grayscale thread that you opened before. Is your current benchmark still slow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with the recent changes of having the imagecl struct holding the runtime data everything got more clear, as the cubecl guys mentioned the bottleneck was in the host/device copy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for now I'll refactor this work into a separated experimental kornia-cubecl crate to isolate deps until it matures. This will happen between today and tomorrow
Maybe you can tell me the layout of what you want to be done and I will handle it if possible. I submitted my proposal for this project anyway so I would be more than happy to start making a PR. Also should I skip the gpu tests for CI passing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi, sorry for late reply here. Not sure if you want to keep contributing on this but possibly the best to move forward is to have it for now as separated experimental kornia-cubecl
crate. As we are still figuring out the final api for the current Tensor/Image to be agnostic to the memory allocator and backend, we might want the cubecl Image as decoupled implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. As for my contribution, I honestly don't have much free time lately but I don't mind contributing to this branch and keeping it updated with the main branch. for now, can I make PRs to this branch (cubecl-grayscale)?
No description provided.