WIP: PERF: Use NeighborhoodRange for metric image computation #98
thewtex wants to merge 1 commit into KitwareMedical:master from
Depends on: http://review.source.kitware.com/#/c/23795/4

Currently 4X slower.
Hey @thewtex, what script/test are you using to test the performance?

@phcerdan
using RangeType = Experimental::ShapedImageNeighborhoodRange<const MetricImageType,
  Experimental::ConstantBoundaryImageNeighborhoodPixelAccessPolicy<const MetricImageType> >;
const RangeType movingRange{ *movingMinusMean, denomIt.GetIndex(), offsets };
const RangeType kernelRange{ *fixedMinusMean, denomIt.GetIndex(), offsets };
Is it really intended that the kernel is set on a new location (denomIt.GetIndex()) with every iteration?
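To make the question concrete, here is a minimal 1-D sketch in plain C++ (not ITK; `GatherNeighborhood`, `Dot`, `SameCenterProducts`, and `FixedKernelProducts` are hypothetical helpers). It contrasts centering both neighborhood ranges on the current index, as the snippet does with `denomIt.GetIndex()`, against keeping the kernel window at one fixed center. Out-of-image positions read as 0, mimicking a constant boundary whose value is zero.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 1-D analogue of a shaped neighborhood range with a constant
// boundary: returns the values at center + offset, substituting
// `boundaryValue` for positions outside the image.
std::vector<double> GatherNeighborhood(const std::vector<double>& image,
                                       std::ptrdiff_t center,
                                       const std::vector<std::ptrdiff_t>& offsets,
                                       double boundaryValue = 0.0)
{
  std::vector<double> values;
  values.reserve(offsets.size());
  const auto size = static_cast<std::ptrdiff_t>(image.size());
  for (const std::ptrdiff_t offset : offsets)
  {
    const std::ptrdiff_t index = center + offset;
    values.push_back((index >= 0 && index < size) ? image[static_cast<std::size_t>(index)]
                                                  : boundaryValue);
  }
  return values;
}

// Sum of products of two equally sized neighborhoods.
double Dot(const std::vector<double>& a, const std::vector<double>& b)
{
  assert(a.size() == b.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i)
  {
    sum += a[i] * b[i];
  }
  return sum;
}

// Both windows centered on the same index: the "kernel" window moves
// together with the "moving" window.
std::vector<double> SameCenterProducts(const std::vector<double>& moving,
                                       const std::vector<double>& fixed,
                                       const std::vector<std::ptrdiff_t>& offsets)
{
  std::vector<double> result;
  for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(moving.size()); ++i)
  {
    result.push_back(Dot(GatherNeighborhood(moving, i, offsets), GatherNeighborhood(fixed, i, offsets)));
  }
  return result;
}

// Kernel window gathered once at `kernelCenter`; only the moving window slides.
std::vector<double> FixedKernelProducts(const std::vector<double>& moving,
                                        const std::vector<double>& fixed,
                                        std::ptrdiff_t kernelCenter,
                                        const std::vector<std::ptrdiff_t>& offsets)
{
  const std::vector<double> kernel = GatherNeighborhood(fixed, kernelCenter, offsets);
  std::vector<double> result;
  for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(moving.size()); ++i)
  {
    result.push_back(Dot(GatherNeighborhood(moving, i, offsets), kernel));
  }
  return result;
}
```

For `moving = {1, 2, 3}`, `fixed = {4, 5, 6}`, and offsets `{-1, 0, 1}`, the same-center variant yields `{14, 32, 28}`, while a kernel fixed at index 1 yields `{17, 32, 23}`; the two agree only where the windows coincide.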
const MetricImagePixelType normXcorr = std::inner_product( movingRange.begin(), movingRange.end(), kernelRange.begin(), 0.0 ) / denomIt.Get();
I have to admit I could not get optimal performance when using std::inner_product at https://github.com/InsightSoftwareConsortium/ITK/blob/master/Modules/Core/ImageFunction/include/itkGaussianDerivativeImageFunction.hxx#L219, so instead I just manually wrote the inner product calculation in a few lines of code. In this case, it might look as follows:
// Accumulate the products of corresponding kernel and moving values.
auto normXcorr = NumericTraits<MetricImagePixelType>::Zero;
auto movingRangeIterator = movingRange.cbegin();
for (const auto& kernelValue : kernelRange)
{
  normXcorr += kernelValue * (*movingRangeIterator);
  ++movingRangeIterator;
}
// Normalize by the denominator at the current index.
normXcorr /= denomIt.Get();
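Both forms compute the same sum of products, so any speed difference comes down to how a given compiler optimizes them. A minimal standalone sketch, using `std::vector<float>` as a stand-in for the neighborhood ranges (the function names are hypothetical):

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Inner product via the standard algorithm. The type of the initial value
// (0.0, a double) is the accumulator type.
double InnerProductStd(const std::vector<float>& kernel, const std::vector<float>& moving)
{
  assert(kernel.size() == moving.size());
  return std::inner_product(moving.begin(), moving.end(), kernel.begin(), 0.0);
}

// The same computation as an explicit loop, mirroring the range-based for
// loop suggested above, but accumulating in double here as well.
double InnerProductLoop(const std::vector<float>& kernel, const std::vector<float>& moving)
{
  assert(kernel.size() == moving.size());
  double sum = 0.0;
  auto movingIterator = moving.cbegin();
  for (const float kernelValue : kernel)
  {
    sum += kernelValue * (*movingIterator);
    ++movingIterator;
  }
  return sum;
}
```

One detail worth checking when switching: `std::inner_product(..., 0.0)` accumulates in `double` because the initial value is `0.0`, whereas the loop above accumulates in `MetricImagePixelType`. If that type is `float`, the two versions can round differently.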
This use case might suggest adding

Update: The member function is added with patch set 5: http://review.source.kitware.com/#/c/23795/5
Great ideas @N-Dekker! 💡 And thanks for adding the

On a related note, I was examining the patch and the existing code, and the construction of the proxy in

Do you think the construction here has performance implications, and would it be avoidable?
The use of a proxy as return type of

However, looking at http://review.source.kitware.com/#/c/23795/5/Modules/Core/Common/include/itkConstantBoundaryImageNeighborhoodPixelAccessPolicy.h my intuition tells me that it might be possible to squeeze some CPU cycles out of the private helper function

Update: With patch set 7 I adjusted the private helper function
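As an illustration only (not the code from patch set 7, and not ITK's implementation), here is a hypothetical N-D sketch of the kind of per-pixel bounds check a constant-boundary access policy performs. `IsInside` and `GetPixelOrConstant` are made-up names; the sketch uses a single unsigned comparison per dimension to test both bounds.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true when every component of `index` lies in [0, size[d]).
template <std::size_t Dimension>
bool IsInside(const std::array<std::ptrdiff_t, Dimension>& index,
              const std::array<std::size_t, Dimension>& size)
{
  for (std::size_t d = 0; d < Dimension; ++d)
  {
    // A negative index converts to a huge unsigned value, so one unsigned
    // comparison checks both 0 <= index[d] and index[d] < size[d].
    if (static_cast<std::size_t>(index[d]) >= size[d])
    {
      return false;
    }
  }
  return true;
}

// Returns the pixel at `index`, or `constant` when `index` is outside the image.
template <std::size_t Dimension>
double GetPixelOrConstant(const std::vector<double>& buffer,
                          const std::array<std::size_t, Dimension>& size,
                          const std::array<std::ptrdiff_t, Dimension>& index,
                          double constant)
{
  if (!IsInside(index, size))
  {
    return constant;
  }
  // Linear offset with the first dimension varying fastest.
  std::size_t offset = 0;
  std::size_t stride = 1;
  for (std::size_t d = 0; d < Dimension; ++d)
  {
    offset += static_cast<std::size_t>(index[d]) * stride;
    stride *= size[d];
  }
  assert(offset < buffer.size());
  return buffer[offset];
}
```

For a 3×2 image stored as `{0, 1, 2, 3, 4, 5}`, index `{2, 1}` maps to offset 2 + 1·3 = 5, while `{-1, 0}` falls outside and returns the constant.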
@thewtex Which compiler do you use? (Release build, optimized for speed?) I'm asking because I observed significant PERF differences between VS2015 and VS2017 (both 64-bit Release), while running the example code that I posted at https://discourse.itk.org/t/custom-border-extrapolation-of-shapedimageneighborhoodrange-by-imageneighborhoodpixelaccesspolicy/879/27 |
@N-Dekker, this was a Clang MinSizeRel build. I will try other compilers and other build configurations, along with your example code!
@thewtex I'm interested to see your results! Using VS2017 Release, I found that NeighborhoodRange-based iteration was almost 3x faster than the old-school
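Because the results above depend on compiler and build type (Clang MinSizeRel versus VS2015/VS2017 Release), it helps to time both variants with the same harness in each configuration. A minimal sketch, assuming each variant is wrapped in a callable that returns a `double` (the `TimeMilliseconds` name is hypothetical):

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Runs `f` `repetitions` times and returns the elapsed wall-clock time in
// milliseconds. Each result is added to `sink`, which makes it harder for
// the optimizer to discard the measured work.
template <typename Function>
double TimeMilliseconds(Function f, std::size_t repetitions, double& sink)
{
  assert(repetitions > 0);
  const auto start = std::chrono::steady_clock::now();
  for (std::size_t r = 0; r < repetitions; ++r)
  {
    sink += f();
  }
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Run each variant with the same `repetitions` in an optimized build, and print `sink` afterward so the accumulated results are observably used.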
Performance results discussed here:
Closes #97