Thanks for sharing such a good framework. I'm curious about one thing: when I use WideResNet in the depth-k configuration 26-10 (depth 26, widening factor k=10), it actually shows lower GPU usage than DenseNet-BC (k=40), even though WideResNet has more trainable parameters. Am I missing something? I'd appreciate any clarification.