Description
Hi,
After reading "When to transpose data" I'm afraid I'm still a bit confused. It seems the "normal" way to use FTorch is to give your Fortran and PyTorch arrays the same index order, and to have FTorch get around the column-/row-major difference between the languages by working out the appropriate strides in torch_tensor_from_array. But wouldn't this imply a performance penalty or a transpose somewhere else? SIMD vectorization, for example, is generally much faster with unit strides, and I imagine the PyTorch model wants to work on contiguous arrays for similar optimizations. If the Fortran and PyTorch code are set up with matching arrays, there must be a transpose or non-unit-stride access happening somewhere.
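For concreteness, this is roughly what I mean by the "normal" usage (only a sketch based on my reading of the FTorch examples; the exact torch_tensor_from_array argument order, the ftorch module name, and the torch_kCPU constant are my assumptions and may not match the current API):

```fortran
program normal_usage
   use, intrinsic :: iso_fortran_env, only : sp => real32
   use ftorch   ! module name assumed from the FTorch examples
   implicit none

   integer, parameter :: nbatch = 8, nx = 64
   ! Same index order as the PyTorch side, which expects (nbatch, nx)
   real(sp), dimension(nbatch, nx), target :: my_fortran_array
   integer, parameter :: layout(2) = [1, 2]   ! identity dimension mapping
   type(torch_tensor) :: input_tensor

   my_fortran_array = 0.0_sp

   ! As I understand it, FTorch reconciles column- vs row-major ordering here
   ! by computing appropriate strides rather than copying or transposing.
   call torch_tensor_from_array(input_tensor, my_fortran_array, layout, torch_kCPU)
end program normal_usage
```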
I've previously used the PyTorch-Fortran library, and with that I constructed my Fortran input arrays with the transposed layout w.r.t. PyTorch, for instance my_fortran_array(nx, nbatch), where nx is the contiguous dimension in memory. My understanding is that the library takes a pointer and simply re-interprets the array as pytorch_array(nbatch, nx), where nx is again the contiguous dimension in physical memory, as it should be. How can I replicate this behaviour with FTorch and avoid any under-the-hood transposes or non-contiguous memory access? Do I do what I did before but pass layout=[2,1] to torch_tensor_from_array? A bit brain-dead today, sorry if I missed something obvious!
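In case it helps, this is the kind of thing I'm hoping is possible (again only a sketch, under the same assumptions about the FTorch interface as above; whether layout=[2,1] really gives a zero-copy reinterpretation is exactly what I'm asking):

```fortran
program transposed_usage
   use, intrinsic :: iso_fortran_env, only : sp => real32
   use ftorch   ! same assumed module name as above
   implicit none

   integer, parameter :: nbatch = 8, nx = 64
   ! Transposed layout on the Fortran side: nx is the fast (contiguous) dimension
   real(sp), dimension(nx, nbatch), target :: my_fortran_array
   integer, parameter :: layout(2) = [2, 1]   ! hoped-for mapping: Fortran dim 1 -> tensor dim 2, and vice versa
   type(torch_tensor) :: input_tensor

   my_fortran_array = 0.0_sp

   ! Hoped-for result: a (nbatch, nx) tensor whose last dimension (nx) is
   ! contiguous in memory, with no copy or transpose under the hood.
   call torch_tensor_from_array(input_tensor, my_fortran_array, layout, torch_kCPU)
end program transposed_usage
```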