When doing distributed training, we typically have multiple processes running simultaneously, and for detection tasks the number of objects can differ across processes. So we should be careful when computing the average loss. For example, one can call `all_reduce` to obtain the average number of objects over all processes:
```python
import torch
import torch.distributed as dist

num_reg = torch.tensor(float(dt_bbox.shape[0]))  # number of boxes on this process
if dist.is_available() and dist.is_initialized():
    # Divide by the world size, then sum across processes to get the
    # average object count per process (all_reduce works in place).
    dist.all_reduce(num_reg.div_(dist.get_world_size()))
num_reg = num_reg.clamp(min=1.0)
loss_bbox = torch.abs(dt_bbox - tgt_bbox).sum() / num_reg
```
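For reference, this pattern can be factored out into a small helper. Below is a minimal sketch of one way to do it; the helper name `reduce_mean` and the usage lines are my own illustration, not code from mmdetection3d:

```python
import torch
import torch.distributed as dist


def reduce_mean(tensor: torch.Tensor) -> torch.Tensor:
    """Average a tensor over all processes (no-op when not distributed)."""
    if not (dist.is_available() and dist.is_initialized()):
        return tensor
    tensor = tensor.clone()
    # Divide by the world size first, then sum in place across processes.
    dist.all_reduce(tensor.div_(dist.get_world_size()))
    return tensor


# Hypothetical usage inside a loss function:
# num_reg = reduce_mean(torch.tensor(float(dt_bbox.shape[0]))).clamp(min=1.0)
# loss_bbox = torch.abs(dt_bbox - tgt_bbox).sum() / num_reg
```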
However, I cannot find such an implementation for the detection heads in mmdetection3d (e.g., CenterPointHead).
My question is: has the mmdetection3d team noticed this problem? Do you think the correctness of the averaged loss matters when training detectors?