@lsdefine Thanks for sharing. I'm using the Transformer for a seq2seq task: the input is an article and the model predicts its abstract. After training finishes, I get almost the same output for different inputs. My code is the same as your example, and the data should be fine, because with the same data and an LSTM-based seq2seq model I get proper output.
Hoping for your answer. Thanks.