the training duration of relationnet ++ #21

Open
sure7018 opened this issue May 8, 2021 · 9 comments
Comments

@sure7018

sure7018 commented May 8, 2021

Hello, what is the training duration of RelationNet++? Why does training take so long on a single GPU with the COCO dataset?

2021-05-08 14:40:36,476 - mmdet - INFO - workflow: [('train', 1)], max: 20 epochs
2021-05-08 14:43:38,226 - mmdet - INFO - Epoch [1][50/58633] lr: 9.890e-04, eta: 49 days, 7:57:52, time: 3.635, data_time: 0.054, memory: 9028, kpt_loss_point_cls: 1.1461, kpt_loss_point_offset: 0.0875, bbox_loss_cls: 1.2137, bbox_loss_bbox: 0.7130, loss: 3.1603
2021-05-08 14:47:02,436 - mmdet - INFO - Epoch [1][100/58633] lr: 1.988e-03, eta: 52 days, 9:05:35, time: 4.084, data_time: 0.006, memory: 9566, kpt_loss_point_cls: 1.1375, kpt_loss_point_offset: 0.0869, bbox_loss_cls: 1.2221, bbox_loss_bbox: 0.7509, loss: 3.1974
2021-05-08 14:50:23,529 - mmdet - INFO - Epoch [1][150/58633] lr: 2.987e-03, eta: 53 days, 2:39:41, time: 4.022, data_time: 0.005, memory: 9566, kpt_loss_point_cls: 1.1513, kpt_loss_point_offset: 0.0866, bbox_loss_cls: 1.2318, bbox_loss_bbox: 0.7351, loss: 3.2049
2021-05-08 14:53:27,445 - mmdet - INFO - Epoch [1][200/58633] lr: 3.986e-03, eta: 52 days, 7:26:42, time: 3.678, data_time: 0.005, memory: 9566, kpt_loss_point_cls: 1.1245, kpt_loss_point_offset: 0.0859, bbox_loss_cls: 1.1802, bbox_loss_bbox: 0.6615, loss: 3.0520
2021-05-08 14:56:36,176 - mmdet - INFO - Epoch [1][250/58633] lr: 4.985e-03, eta: 52 days, 2:10:09, time: 3.775, data_time: 0.005, memory: 9566, kpt_loss_point_cls: 1.0461, kpt_loss_point_offset: 0.0853, bbox_loss_cls: 1.2102, bbox_loss_bbox: 0.7289, loss: 3.0704

@shinya7y
Owner

shinya7y commented May 8, 2021

Which config did you use?

@sure7018
Author

sure7018 commented May 8, 2021

I used bvr_retinanet_x101_fpn_dcn_mstrain_400_1200_20e_coco.py:

python tools/train.py configs/bvr/bvr_retinanet_x101_fpn_dcn_mstrain_400_1200_20e_coco.py

@shinya7y
Owner

shinya7y commented May 8, 2021

The config has many heavy settings.

Please try the following:
Res2Net-50 or Res2Net-101
stage_with_dcn=(False, False, False, True),
../_base_/datasets/coco_detection_mstrain_480_960.py
with_cp=False
fp16 = dict(loss_scale='dynamic') or fp16 = dict(loss_scale=512.)
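As a rough sketch, the lighter settings listed above might be written as overrides in a config file (mmdet configs are plain Python). The nesting of `stage_with_dcn` and `with_cp` under `model.backbone` is an assumption about how this config is laid out, so check it against the actual file. The backbone swap and the dataset base file are only noted in comments because their exact form depends on the config.

```python
# Hypothetical override fragment; the key nesting is assumed, not taken from the repo.
model = dict(
    backbone=dict(
        # Apply DCN only in the last backbone stage.
        stage_with_dcn=(False, False, False, True),
        # Disable gradient checkpointing: uses more memory but avoids recomputation.
        with_cp=False,
    )
)

# Mixed-precision training with dynamic loss scaling
# (alternatively: fp16 = dict(loss_scale=512.)).
fp16 = dict(loss_scale='dynamic')

# Not shown here: switching the backbone to Res2Net-50 or Res2Net-101, and
# using ../_base_/datasets/coco_detection_mstrain_480_960.py as the dataset base.
```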

@shinya7y
Owner

Even for inference alone, RelationNet++ is slow on a T4.
I may verify the paper's FPS by benchmarking on a V100.

@sure7018
Author

Thank you for your reply. I used ResNet-50 for training, and the speed improved noticeably, but the accuracy is not as high as reported in the paper. Is that because I trained for only 12 epochs?

@shinya7y
Owner

If you use bvr_retinanet_r50_fpn_gn_1x_coco.py, an AP of around 38.5 (the authors' result) is expected.
Using the settings I recommended and training for 20 epochs will boost accuracy.

Please don't forget to change the learning rate according to the Linear Scaling Rule.

lr=0.01    for total batch size 16 (8 GPUs * 2 samples_per_gpu)
lr=0.00125 for total batch size 2  (1 GPU  * 2 samples_per_gpu)
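The two values above follow from scaling the learning rate in proportion to the total batch size. A minimal sketch of that arithmetic (the function name `scale_lr` and its parameters are only illustrative, not part of mmdet):

```python
def scale_lr(base_lr, base_batch_size, num_gpus, samples_per_gpu):
    """Linear Scaling Rule: lr grows in proportion to the total batch size."""
    total_batch_size = num_gpus * samples_per_gpu
    return base_lr * total_batch_size / base_batch_size


# Reference setting: lr=0.01 for total batch size 16.
lr_8gpu = scale_lr(0.01, 16, num_gpus=8, samples_per_gpu=2)
lr_1gpu = scale_lr(0.01, 16, num_gpus=1, samples_per_gpu=2)

print(lr_8gpu)
print(lr_1gpu)
```

The resulting value would then go into the `lr` field of the optimizer settings in the config.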

@sure7018
Author

Thank you very much for your reply. I will try it.

@sure7018
Author

The config has many heavy settings.

Please try the following:
Res2Net-50 or Res2Net-101
stage_with_dcn=(False, False, False, True),
../_base_/datasets/coco_detection_mstrain_480_960.py
with_cp=False
fp16 = dict(loss_scale='dynamic') or fp16 = dict(loss_scale=512.)

Hello, will the above modifications affect accuracy?

@shinya7y
Owner

Res2Net-50 or Res2Net-101 affects accuracy.
stage_with_dcn=(False, False, False, True) affects accuracy.
../_base_/datasets/coco_detection_mstrain_480_960.py affects accuracy.
with_cp=False should not affect accuracy.
fp16 = dict(loss_scale='dynamic') or fp16 = dict(loss_scale=512.) is not expected to affect accuracy.
