This repository was archived by the owner on Jul 1, 2024. It is now read-only.
Optimizer initialization should set rescale_grad appropriately #210
Open
Description
This may ultimately need to be addressed at the Module API level, but the problem is most visible through the Keras integration.
When an instance of mx.optimizer.Optimizer is created without specifying rescale_grad, the default value of 1.0 has a significant impact on training. MXNet itself points this out with a warning in the logs when the optimizer is initialized:
```
/usr/local/lib/python2.7/site-packages/mxnet/module/bucketing_module.py:408: UserWarning: Optimizer created manually outside Module but rescale_grad is not normalized to 1.0/batch_size/num_workers (1.0 vs. 0.0078125). Is this intended?
  force_init=force_init)
```
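For reference, a minimal sketch that reproduces the warning with the plain Module API. The network and shapes here are made up purely for illustration; a batch size of 128 yields the 0.0078125 shown in the log above:

```python
import mxnet as mx

batch_size = 128

# Toy symbol and module, just enough to reach init_optimizer().
data = mx.sym.Variable('data')
net = mx.sym.FullyConnected(data, num_hidden=10)
net = mx.sym.SoftmaxOutput(net, name='softmax')

mod = mx.mod.Module(net, data_names=['data'], label_names=['softmax_label'])
mod.bind(data_shapes=[('data', (batch_size, 20))],
         label_shapes=[('softmax_label', (batch_size,))])
mod.init_params()

# rescale_grad defaults to 1.0, so this emits the UserWarning above.
opt = mx.optimizer.SGD(learning_rate=0.01)
mod.init_optimizer(optimizer=opt)
```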
Since the MXNet implementations of the Keras optimizers essentially delegate to the Module versions, this parameter should likely be configured to the normalized value automatically, as the need for it is not at all obvious from the Keras API. It is possible to pass rescale_grad as an additional argument, but that requires the user to understand internals of both frameworks.
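For comparison, the manual workaround looks roughly like this (continuing from the sketch above, and assuming a single worker so num_workers is 1):

```python
# Normalizing rescale_grad by hand silences the warning, but it
# requires the user to know the batch size at optimizer-creation time.
opt = mx.optimizer.SGD(learning_rate=0.01,
                       rescale_grad=1.0 / batch_size)
mod.init_optimizer(optimizer=opt, force_init=True)
```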