They're currently too generic and inefficient. We can probably specialize most combinations using constructs such as vcombine_f32(vget_high_f32(x), vget_low_f32(y)), vrev64q_f32(x), etc.
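For illustration, here is a rough sketch of what two such specializations might look like. It assumes the combinations in question are 4-lane float shuffles, which is a guess. The helper names (shuffle_xhi_ylo, shuffle_swap_pairs and the _f32 array wrappers) are hypothetical and not taken from the existing code. The scalar branch is only there so the sketch compiles off ARM and documents the intended lane order.

```c
#include <string.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* Hypothetical specialization: returns {x[2], x[3], y[0], y[1]}. */
static inline float32x4_t shuffle_xhi_ylo(float32x4_t x, float32x4_t y)
{
    return vcombine_f32(vget_high_f32(x), vget_low_f32(y));
}

/* Hypothetical specialization: returns {x[1], x[0], x[3], x[2]}. */
static inline float32x4_t shuffle_swap_pairs(float32x4_t x)
{
    return vrev64q_f32(x);
}
#endif

/* Array wrapper (hypothetical) so the lane order can be checked anywhere. */
static void shuffle_xhi_ylo_f32(float out[4], const float x[4], const float y[4])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    vst1q_f32(out, shuffle_xhi_ylo(vld1q_f32(x), vld1q_f32(y)));
#else
    /* Scalar reference with the same lane order (non-ARM builds only). */
    const float r[4] = { x[2], x[3], y[0], y[1] };
    memcpy(out, r, sizeof r);
#endif
}

/* Array wrapper (hypothetical) for the pairwise swap. */
static void shuffle_swap_pairs_f32(float out[4], const float x[4])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    vst1q_f32(out, shuffle_swap_pairs(vld1q_f32(x)));
#else
    /* Scalar reference with the same lane order (non-ARM builds only). */
    const float r[4] = { x[1], x[0], x[3], x[2] };
    memcpy(out, r, sizeof r);
#endif
}
```

With x = {0, 1, 2, 3} and y = {4, 5, 6, 7}, the first helper yields {2, 3, 4, 5} and the second yields {1, 0, 3, 2}. Whether each one actually beats the generic path would need to be measured on the target core.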