Description
In batchfillerimage2d, the tensor (in caffe mode) is defined with dimensions
(batchsize, channel, nrows, ncols)
This implies that ncols is the "contiguous" dimension. For example, if I define a (4,3) 2D array in C++:
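#include <iostream>
int main() {
  // 4 rows x 3 columns, stored row-major (the C/C++ default)
  float foo[4][3] = { {0,1,2}, {3,4,5}, {6,7,8}, {9,10,11} };
  int nrows = 4;
  int ncols = 3;
  std::cout << "foo[2][1]=" << foo[2][1] << std::endl;
  // row-major offset: row*ncols + col, matches foo[2][1]
  std::cout << "foo[ncols*2 + 1]=" << *(&foo[0][0] + ncols*2+1) << std::endl;
  // column-major style offset: col*nrows + row, lands on a different element
  std::cout << "foo[nrows*1 + 2]=" << *(&foo[0][0] + nrows*1+2) << std::endl;
  return 0;
}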
The outcome of the above is:
twongjirad@blade:~/working/nutufts/larflow/ana$ g++ -o foo foo.cxx
twongjirad@blade:~/working/nutufts/larflow/ana$ ./foo
foo[2][1]=7
foo[ncols*2 + 1]=7
foo[nrows*1 + 2]=6
However, in batchfillerimage2d, the tensor is defined with the following dims:
if (_caffe_mode) {
  dim[0] = batch_size();
  dim[1] = _num_channels;
  dim[2] = _rows;
  dim[3] = _cols;
}
but the way the image is indexed is
size_t caffe_idx = 0;
for (size_t row = 0; row < _rows; ++row) {
  for (size_t col = 0; col < _cols; ++col) {
    _caffe_idx_to_img_idx.at(caffe_idx) = col * _rows + row;
    ++caffe_idx;
  }
}
Note that in "C" (row-major) memory layout, the expectation would be
row * _cols + col
since the right-most dimension should be the most rapidly changing one.
This might be there for historical reasons related to caffe, but it can cause (and is causing) confusion about how to lay out tensors properly in training and analysis routines for PyTorch and TensorFlow, where we routinely reshape and manipulate the tensor.
What to do? I would like to change this, but doing so is a BREAKING change for everyone.
Are we stuck with this?