The poly learning-rate scheduler doesn't work as intended. The current implementation is as follows:
```python
def get_lr(self):
    if self.last_epoch % self.decay_iter or self.last_epoch % self.max_iter:
        return [base_lr for base_lr in self.base_lrs]
    else:
        factor = (1 - self.last_epoch / float(self.max_iter)) ** self.gamma
        return [base_lr * factor for base_lr in self.base_lrs]
```
Notice that the `else` branch is effectively unreachable: it runs only when `self.last_epoch` is divisible by both `self.decay_iter` and `self.max_iter`. In particular, `self.last_epoch % self.max_iter` is non-zero (and therefore truthy) for every iteration short of `max_iter`, so the first branch fires, the unmodified `base_lrs` are returned, and the poly decay is never applied.
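One plausible fix, assuming the intent is the standard poly schedule `lr = base_lr * (1 - t / max_iter) ** gamma` recomputed every `decay_iter` steps: compare the modulo result to zero explicitly and clamp progress at `max_iter` so the factor never goes negative. A minimal, self-contained sketch (the class name and constructor here are illustrative, not the project's actual code; only the attribute names mirror the snippet above):

```python
class PolyLR:
    """Sketch of a poly learning-rate schedule.

    lr = base_lr * (1 - t / max_iter) ** gamma, where t is the current
    iteration quantized to multiples of decay_iter.
    """

    def __init__(self, base_lrs, max_iter, gamma=0.9, decay_iter=1):
        self.base_lrs = list(base_lrs)
        self.max_iter = max_iter
        self.gamma = gamma
        self.decay_iter = decay_iter
        self.last_epoch = 0

    def get_lr(self):
        # Quantize progress to multiples of decay_iter, and clamp at
        # max_iter so the decay factor never goes negative.
        t = min(self.last_epoch - self.last_epoch % self.decay_iter,
                self.max_iter)
        factor = (1 - t / float(self.max_iter)) ** self.gamma
        return [base_lr * factor for base_lr in self.base_lrs]
```

With this structure the decay is applied on every call, and the original condition (which silently returned the undecayed `base_lrs`) disappears entirely.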