Description
Hi, when I am doing pretraining with EncoderDecoderTrainer, I noticed the loss becomes negative. I think the problem originates in EncoderDecoderLoss, which is:
errors = x_pred - x_true
reconstruction_errors = torch.mul(errors, mask) ** 2
x_true_means = torch.mean(x_true, dim=0)
x_true_means[x_true_means == 0] = 1
x_true_stds = torch.std(x_true, dim=0) ** 2
x_true_stds[x_true_stds == 0] = x_true_means[x_true_stds == 0]
features_loss = torch.matmul(reconstruction_errors, 1 / x_true_stds)
nb_reconstructed_variables = torch.sum(mask, dim=1)
features_loss_norm = features_loss / (nb_reconstructed_variables + self.eps)
loss = torch.mean(features_loss_norm)
When a zero-variance feature has a negative mean, the fallback line x_true_stds[x_true_stds == 0] = x_true_means[x_true_stds == 0] copies that negative mean into x_true_stds, so the squared reconstruction errors for that feature end up divided by a negative number and the loss can go below zero. Should it use the absolute value instead?
x_true_stds[x_true_stds == 0] = torch.abs(x_true_means[x_true_stds == 0])
I think the motivation is to scale the loss by the magnitude of each feature's variance in the batch, to account for features with different ranges; when that fails (zero variance), it falls back to the magnitude of the mean. That fallback doesn't make much sense when the mean is negative.
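For illustration, here is a minimal sketch that re-implements the quoted snippet as a plain function (not the library class; the eps default and all values are made up) and shows how a zero-variance feature with a negative mean pushes the loss below zero:

import torch

def encoder_decoder_loss(x_true, x_pred, mask, eps=1e-9):
    # same steps as the quoted loss
    errors = x_pred - x_true
    reconstruction_errors = torch.mul(errors, mask) ** 2
    x_true_means = torch.mean(x_true, dim=0)
    x_true_means[x_true_means == 0] = 1
    x_true_stds = torch.std(x_true, dim=0) ** 2
    x_true_stds[x_true_stds == 0] = x_true_means[x_true_stds == 0]
    features_loss = torch.matmul(reconstruction_errors, 1 / x_true_stds)
    nb_reconstructed_variables = torch.sum(mask, dim=1)
    features_loss_norm = features_loss / (nb_reconstructed_variables + eps)
    return torch.mean(features_loss_norm)

# Column 0 is constant (zero variance) with mean -2.0, so its std slot is
# filled with -2.0 and that column's squared errors are divided by a
# negative number.
x_true = torch.tensor([[-2.0, 1.0],
                       [-2.0, 3.0]])
x_pred = torch.tensor([[0.0, 1.2],
                       [-4.0, 2.8]])
mask = torch.ones_like(x_true)

print(encoder_decoder_loss(x_true, x_pred, mask))  # ≈ tensor(-0.99), negative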
Also, there is a separate problem in EncoderDecoderModel. _forward_tabnet returns x_embed_rec, x_embed, mask, which is inconsistent with the other models, which return x_embed, x_embed_rec, mask. This may cause a miscalculation when pretraining TabNet with EncoderDecoderTrainer, whose _train_step is:
x_embed, x_embed_rec, mask = self.ed_model(X)
loss = self.loss_fn(x_embed, x_embed_rec, mask)
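Because the loss normalises by statistics computed from x_true, it is not symmetric in its two tensor arguments, so the swapped return order changes the value being optimised. A small sketch with made-up values (assuming the loss takes x_true first, as the variable names in the quoted snippet suggest):

import torch

# Hypothetical original embeddings and their reconstruction.
x_embed = torch.tensor([[0.1, 2.0],
                        [0.3, 6.0]])
x_embed_rec = torch.tensor([[0.2, 2.5],
                            [0.2, 5.0]])

# The loss divides the squared errors by per-feature statistics of x_true.
# With the return order the other models use, those come from the original
# embeddings; with the TabNet order they come from the reconstruction instead:
print(torch.std(x_embed, dim=0) ** 2)      # ≈ tensor([0.0200, 8.0000])
print(torch.std(x_embed_rec, dim=0) ** 2)  # ≈ tensor([0.0000, 3.1250])

In this example the reconstruction's first column even has zero variance, so with the swapped order the mean fallback from the first issue would be triggered on the wrong tensor.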