Training duration of RelationNet++ #21

sure7018 · 2021-05-08T07:00:06Z

Hello, what is the expected training time for RelationNet++? Why does training on the COCO dataset take so long on a single GPU?

2021-05-08 14:40:36,476 - mmdet - INFO - workflow: [('train', 1)], max: 20 epochs
2021-05-08 14:43:38,226 - mmdet - INFO - Epoch [1][50/58633] lr: 9.890e-04, eta: 49 days, 7:57:52, time: 3.635, data_time: 0.054, memory: 9028, kpt_loss_point_cls: 1.1461, kpt_loss_point_offset: 0.0875, bbox_loss_cls: 1.2137, bbox_loss_bbox: 0.7130, loss: 3.1603
2021-05-08 14:47:02,436 - mmdet - INFO - Epoch [1][100/58633] lr: 1.988e-03, eta: 52 days, 9:05:35, time: 4.084, data_time: 0.006, memory: 9566, kpt_loss_point_cls: 1.1375, kpt_loss_point_offset: 0.0869, bbox_loss_cls: 1.2221, bbox_loss_bbox: 0.7509, loss: 3.1974
2021-05-08 14:50:23,529 - mmdet - INFO - Epoch [1][150/58633] lr: 2.987e-03, eta: 53 days, 2:39:41, time: 4.022, data_time: 0.005, memory: 9566, kpt_loss_point_cls: 1.1513, kpt_loss_point_offset: 0.0866, bbox_loss_cls: 1.2318, bbox_loss_bbox: 0.7351, loss: 3.2049
2021-05-08 14:53:27,445 - mmdet - INFO - Epoch [1][200/58633] lr: 3.986e-03, eta: 52 days, 7:26:42, time: 3.678, data_time: 0.005, memory: 9566, kpt_loss_point_cls: 1.1245, kpt_loss_point_offset: 0.0859, bbox_loss_cls: 1.1802, bbox_loss_bbox: 0.6615, loss: 3.0520
2021-05-08 14:56:36,176 - mmdet - INFO - Epoch [1][250/58633] lr: 4.985e-03, eta: 52 days, 2:10:09, time: 3.775, data_time: 0.005, memory: 9566, kpt_loss_point_cls: 1.0461, kpt_loss_point_offset: 0.0853, bbox_loss_cls: 1.2102, bbox_loss_bbox: 0.7289, loss: 3.0704
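The ETA in this log follows from simple arithmetic: iterations per epoch times epochs times seconds per iteration. The sketch below reproduces it using only numbers read from the log (58633 iterations per epoch, 20 epochs, roughly 3.8 s per iteration). The helper name `estimate_training_days` is hypothetical, chosen for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimate_training_days(iters_per_epoch: int, epochs: int, sec_per_iter: float) -> float:
    """Rough wall-clock estimate: total iterations times average seconds per iteration."""
    total_iters = iters_per_epoch * epochs
    return total_iters * sec_per_iter / SECONDS_PER_DAY


# Values taken from the log: [1][.../58633], max: 20 epochs, time: ~3.6-4.1 s/iter.
days = estimate_training_days(58633, 20, 3.8)
print(f"{days:.1f} days")  # prints 51.6 days
```

This matches the logged ETA of roughly 49 to 53 days. The per-iteration time is the lever: assuming 2 images per iteration, 58633 iterations cover 117,266 images, which is about the size of COCO train2017, so the iteration count itself only shrinks with more GPUs or a larger total batch.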

shinya7y · 2021-05-08T07:21:07Z

Which config did you use?

sure7018 · 2021-05-08T07:43:10Z

I used bvr_retinanet_x101_fpn_dcn_mstrain_400_1200_20e_coco.py:

python tools/train.py configs/bvr/bvr_retinanet_x101_fpn_dcn_mstrain_400_1200_20e_coco.py

shinya7y · 2021-05-08T08:20:30Z

The config has many heavy settings.

Please try the following:
Res2Net-50 or Res2Net-101
stage_with_dcn=(False, False, False, True),
../_base_/datasets/coco_detection_mstrain_480_960.py
with_cp=False
fp16 = dict(loss_scale='dynamic') or fp16 = dict(loss_scale=512.)

shinya7y · 2021-05-10T14:01:03Z

Even for inference alone, RelationNet++ is slow on a T4.
I may verify the FPS reported in the paper by benchmarking on a V100.

sure7018 · 2021-05-12T12:00:16Z

Thank you for your reply. I switched to ResNet-50 for training and the speed improved noticeably, but the accuracy is lower than reported in the paper. Is that because training only runs for 12 epochs?

shinya7y · 2021-05-12T13:50:41Z

If you use bvr_retinanet_r50_fpn_gn_1x_coco.py, an AP of around 38.5 (the authors' result) is the expected value.
Using the settings I recommended and training for 20 epochs will boost accuracy.

Please don't forget to change the learning rate according to the Linear Scaling Rule.

lr=0.01    for total batch size 16 (8 GPUs * 2 samples_per_gpu)
lr=0.00125 for total batch size 2  (1 GPU  * 2 samples_per_gpu)
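The Linear Scaling Rule keeps the learning rate proportional to the total batch size (number of GPUs times samples_per_gpu). A minimal sketch of the calculation, using the reference point from this comment (lr=0.01 at total batch size 16); the function name `scale_lr` is hypothetical:

```python
def scale_lr(base_lr: float, base_batch: int, num_gpus: int, samples_per_gpu: int) -> float:
    """Linear Scaling Rule: lr grows or shrinks in proportion to the total batch size."""
    total_batch = num_gpus * samples_per_gpu
    return base_lr * total_batch / base_batch


# Reference: lr=0.01 for 8 GPUs * 2 samples_per_gpu = 16.
print(scale_lr(0.01, 16, 8, 2))  # prints 0.01
print(scale_lr(0.01, 16, 1, 2))  # prints 0.00125
```

The resulting value would then go into the `lr` field of the config's `optimizer` dict.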

sure7018 · 2021-05-13T00:11:46Z

Thank you very much for your reply. I will try it.

sure7018 · 2021-05-15T10:15:23Z

> The config has many heavy settings.
>
> Please try the following:
> Res2Net-50 or Res2Net-101
> stage_with_dcn=(False, False, False, True),
> ../_base_/datasets/coco_detection_mstrain_480_960.py
> with_cp=False
> fp16 = dict(loss_scale='dynamic') or fp16 = dict(loss_scale=512.)

Hello, will accuracy be affected by the modifications above?

shinya7y · 2021-05-15T13:53:24Z

Res2Net-50 or Res2Net-101 affect accuracy.
stage_with_dcn=(False, False, False, True), affects accuracy.
../_base_/datasets/coco_detection_mstrain_480_960.py affects accuracy.
with_cp=False should not affect accuracy.
fp16 = dict(loss_scale='dynamic') or fp16 = dict(loss_scale=512.) are expected not to affect accuracy.
