
Masked attention #141

@lethienhoa


Hi,
I see that this implementation is lacking masked attention over the encoder outputs: when a batch contains padded source sequences, the attention weights at the padded positions should be zeroed out. To compute this, input_lengths needs to be passed to the decoder as well (not just the encoder). OpenNMT already provides this in its sequence_mask function.
Best,
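
For reference, a minimal sketch of the idea in PyTorch. sequence_mask here mirrors OpenNMT-py's utility of the same name; masked_attention is a hypothetical helper showing how the mask would be applied to raw attention scores before the softmax — how it plugs into this repo's decoder is an assumption:

```python
import torch

def sequence_mask(lengths, max_len=None):
    """Boolean mask from sequence lengths: True at valid positions,
    False at padding. Mirrors OpenNMT-py's sequence_mask utility."""
    if max_len is None:
        max_len = lengths.max()
    return (torch.arange(max_len, device=lengths.device)
            .unsqueeze(0) < lengths.unsqueeze(1))  # (batch, max_len)

def masked_attention(scores, lengths):
    """scores: (batch, src_len) raw attention scores over encoder states.
    Padded positions are set to -inf so softmax assigns them zero weight."""
    mask = sequence_mask(lengths, max_len=scores.size(1))
    scores = scores.masked_fill(~mask, float('-inf'))
    return torch.softmax(scores, dim=-1)

# Example: batch of 2 source sequences with lengths 3 and 5
scores = torch.randn(2, 5)
lengths = torch.tensor([3, 5])
attn = masked_attention(scores, lengths)  # attn[0, 3:] == 0
```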
