
Error when training #4

@XinyuLyu

Description


Thanks for sharing the code. I get an error when training with the command "python main.py --num_abots 3 --num_qbots 1 --scratch --outf save/temp_dir". My environment is Python 3.6 with PyTorch 0.4.0. Can you help me with this?

(community) lvxinyu@12315:~$ CUDA_VISIBLE_DEVICES=4 python /home/lvxinyu/code/visualdialog-pytorch/main.py --num_abots 3 --num_qbots 1 --scratch --outf /home/lvxinyu/code/visualdialog-pytorch/data/v09/save/temp_dir
DataLoader loading: train
Loading image feature from data/vdl_img_vgg.h5
train number of data: 82783
Loading txt from data/visdial_data.h5
Vocab Size: 8964
DataLoader loading: test
Loading image feature from data/vdl_img_vgg.h5
test number of data: 40504
Loading txt from data/visdial_data.h5
Vocab Size: 8964
Initializing A-Bot and Q-Bot...
/home/lvxinyu/.local/lib/python3.6/site-packages/torch/nn/modules/rnn.py:38: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.5 and num_layers=1
"num_layers={}".format(dropout, num_layers))
Starting Epoch: 1 | K: 10
Done with Batch # 20 | Av. Time Per Batch: 1.090s
Done with Batch # 40 | Av. Time Per Batch: 1.015s
Done with Batch # 60 | Av. Time Per Batch: 0.998s
Done with Batch # 80 | Av. Time Per Batch: 1.034s
Done with Batch # 100 | Av. Time Per Batch: 1.069s
Done with Batch # 120 | Av. Time Per Batch: 1.067s
Done with Batch # 140 | Av. Time Per Batch: 1.088s
Done with Batch # 160 | Av. Time Per Batch: 1.043s
Done with Batch # 180 | Av. Time Per Batch: 1.075s
Done with Batch # 200 | Av. Time Per Batch: 1.092s
Done with Batch # 220 | Av. Time Per Batch: 1.037s
Done with Batch # 240 | Av. Time Per Batch: 1.061s
Done with Batch # 260 | Av. Time Per Batch: 1.038s
Done with Batch # 280 | Av. Time Per Batch: 1.051s
Done with Batch # 300 | Av. Time Per Batch: 1.099s
Done with Batch # 320 | Av. Time Per Batch: 1.096s
Done with Batch # 340 | Av. Time Per Batch: 1.076s
Done with Batch # 360 | Av. Time Per Batch: 1.067s
Done with Batch # 380 | Av. Time Per Batch: 1.057s
Done with Batch # 400 | Av. Time Per Batch: 1.066s
Done with Batch # 420 | Av. Time Per Batch: 1.078s
Done with Batch # 440 | Av. Time Per Batch: 1.051s
Done with Batch # 460 | Av. Time Per Batch: 1.085s
Done with Batch # 480 | Av. Time Per Batch: 1.076s
Done with Batch # 500 | Av. Time Per Batch: 1.064s
Done with Batch # 520 | Av. Time Per Batch: 1.052s
Done with Batch # 540 | Av. Time Per Batch: 1.093s
Done with Batch # 560 | Av. Time Per Batch: 1.045s
Done with Batch # 580 | Av. Time Per Batch: 1.076s
Done with Batch # 600 | Av. Time Per Batch: 1.031s
Done with Batch # 620 | Av. Time Per Batch: 1.043s
Done with Batch # 640 | Av. Time Per Batch: 1.073s
Done with Batch # 660 | Av. Time Per Batch: 1.028s
Done with Batch # 680 | Av. Time Per Batch: 1.039s
Done with Batch # 700 | Av. Time Per Batch: 1.073s
Done with Batch # 720 | Av. Time Per Batch: 1.035s
Done with Batch # 740 | Av. Time Per Batch: 1.074s
Done with Batch # 760 | Av. Time Per Batch: 1.085s
Done with Batch # 780 | Av. Time Per Batch: 1.070s
Done with Batch # 800 | Av. Time Per Batch: 1.047s
Done with Batch # 820 | Av. Time Per Batch: 1.073s
Done with Batch # 840 | Av. Time Per Batch: 1.083s
Done with Batch # 860 | Av. Time Per Batch: 1.058s
Done with Batch # 880 | Av. Time Per Batch: 1.067s
Done with Batch # 900 | Av. Time Per Batch: 1.043s
Done with Batch # 920 | Av. Time Per Batch: 1.063s
Done with Batch # 940 | Av. Time Per Batch: 1.063s
Done with Batch # 960 | Av. Time Per Batch: 1.064s
Done with Batch # 980 | Av. Time Per Batch: 1.070s
Done with Batch # 1000 | Av. Time Per Batch: 1.076s
Done with Batch # 1020 | Av. Time Per Batch: 1.063s
Done with Batch # 1040 | Av. Time Per Batch: 1.073s
Done with Batch # 1060 | Av. Time Per Batch: 1.087s
Done with Batch # 1080 | Av. Time Per Batch: 1.056s
Done with Batch # 1100 | Av. Time Per Batch: 1.070s
Traceback (most recent call last):
File "/home/lvxinyu/code/visualdialog-pytorch/main.py", line 624, in
im_loss_epoch_n = train(epoch,k_curr)
File "/home/lvxinyu/code/visualdialog-pytorch/main.py", line 132, in train
lm_loss.backward()
File "/home/lvxinyu/.local/lib/python3.6/site-packages/torch/tensor.py", line 93, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/home/lvxinyu/.local/lib/python3.6/site-packages/torch/autograd/init.py", line 89, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: The expanded size of the tensor (75) must match the existing size (58) at non-singleton dimension 0
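
For what it's worth, the train split has 82783 samples and 82783 % 75 == 58, which matches the two sizes in the error. So the crash may come from the last, incomplete batch of the epoch hitting code that assumes a full batch of 75. Below is a minimal sketch of that suspicion; the batch size of 75 and the drop_last workaround are my assumptions, not the repository's actual settings:

import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for the 82783-image training set (not the repo's loader).
dataset = TensorDataset(torch.arange(82783))

# With batch_size=75 the final batch holds only 58 samples (82783 % 75 == 58),
# so any tensor pre-sized to 75 fails to broadcast against it.
loader = DataLoader(dataset, batch_size=75)
print(len(loader))  # 1104 batches, the last one with 58 samples

# Dropping the incomplete final batch avoids the size mismatch.
loader = DataLoader(dataset, batch_size=75, drop_last=True)
print(len(loader))  # 1103 full batches of 75

Alternatively, sizing any pre-allocated tensors from the actual batch (e.g. from image.size(0)) instead of a constant would also handle the short batch. The earlier UserWarning about dropout with num_layers=1 only means the dropout argument is ignored for a single-layer RNN, so it should be unrelated to the crash.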
