This paper is very interesting. When encoding the global guidance vector, to what size is the full image downsampled before it is fed into the encoder network?