docs/source/FAQ.rst (+24 -6)
@@ -225,23 +225,41 @@ There is also a :py:class:`tz.m.WrapClosure<torchzero.modules.WrapClosure>` for

How to save/serialize a modular optimizer?
============================================
-TODO
+Please refer to the pytorch docs: https://pytorch.org/tutorials/beginner/saving_loading_models.html.
+
+Like pytorch optimizers, torchzero modular optimizers and modules support :code:`opt.state_dict()` and :code:`opt.load_state_dict()`, which save and load the state dicts of all modules, including nested ones.
+
+So you can use the standard code for saving and loading:
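For example, a minimal sketch (the :code:`tz.Modular` constructor and the module list below are illustrative assumptions; :code:`state_dict()` / :code:`load_state_dict()` are the calls described above):

.. code:: python

   import torch
   import torch.nn as nn
   import torchzero as tz

   model = nn.Linear(10, 2)
   # hypothetical modular optimizer; substitute your actual module chain
   opt = tz.Modular(model.parameters(), [tz.m.Adam(), tz.m.LR(1e-3)])

   # saving: state_dict() gathers the state of all modules, including nested ones
   torch.save({"model": model.state_dict(), "opt": opt.state_dict()}, "checkpoint.pt")

   # loading: restore model weights and optimizer state
   checkpoint = torch.load("checkpoint.pt")
   model.load_state_dict(checkpoint["model"])
   opt.load_state_dict(checkpoint["opt"])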
A thorough benchmark will be posted to this section very soon. There is no overhead other than what is described below.
-
-Since some optimizers, like Adam, have learning rate baked into the update rule, but we use LR module instead, that requires an extra add operation. Currently if :code:`tz.m.Adam` or :code:`tz.m.Wrap` are directly followed by a :code:`tz.m.LR`, they will be automatically fused (:code:`Wrap` fuses only when wrapped optimizer has an :code:`lr` parameter). However adding LR fusing to all modules with a learning rate is not a priority.
+Some optimizers, like Adam, have the learning rate baked into the update rule, whereas we use the :code:`LR` module instead, which requires an extra add operation. To mitigate that, if :code:`tz.m.Adam` or :code:`tz.m.Wrap` is directly followed by a :code:`tz.m.LR`, the two are automatically fused (:code:`Wrap` fuses only when the wrapped optimizer has an :code:`lr` parameter). However, adding LR fusing to all modules with a learning rate is not a priority, as this overhead appears to be negligible.
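To make the fusion condition concrete, a hedged sketch (again assuming a :code:`tz.Modular`-style constructor; only :code:`tz.m.Adam` and :code:`tz.m.LR` come from the text above):

.. code:: python

   import torch
   import torchzero as tz

   params = torch.nn.Linear(10, 2).parameters()

   # tz.m.Adam is directly followed by tz.m.LR, so the two are fused automatically:
   # the learning rate is applied inside Adam's update rule instead of as a
   # separate add operation afterwards.
   opt = tz.Modular(params, [tz.m.Adam(), tz.m.LR(1e-3)])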

Whenever possible I used `_foreach_xxx <https://pytorch.org/docs/stable/torch.html#foreach-operations>`_ operations. Those operate on all parameters at once instead of using slow python for-loops. This makes the optimizers much quicker, especially with a large number of separate parameter tensors. All modules also modify the update in-place whenever possible.
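For reference, a small standalone illustration of the difference in plain pytorch (independent of torchzero):

.. code:: python

   import torch

   params = [torch.randn(100), torch.randn(50, 50)]
   updates = [torch.randn(100), torch.randn(50, 50)]

   # python for-loop: one dispatched call per parameter tensor
   for p, u in zip(params, updates):
       p.add_(u, alpha=-0.01)

   # foreach op: the whole list is handled in a single horizontally-fused call
   torch._foreach_add_(params, updates, alpha=-0.01)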

Is there support for complex-valued parameters?
=================================================
-Currently no, as I have not made the modules with complex-valued parameters in mind, although some might still work. I do use complex-valued networks so I am looking into adding support. There may actually be a way to support them automatically.
+:code:`tz.m.ViewAsReal()` and :code:`tz.m.ViewAsComplex()` modules will be added soon. This will also make it possible to use custom pytorch optimizers with complex networks (via :code:`tz.m.Wrap`), even if they don't support complex parameters natively.
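These modules will presumably be thin wrappers over pytorch's existing view functions; a minimal sketch of that underlying mechanism in plain pytorch (not the torchzero API):

.. code:: python

   import torch

   z = torch.randn(3, dtype=torch.complex64)

   # view the complex tensor as a real tensor with a trailing dimension of size 2
   r = torch.view_as_real(z)           # shape (3, 2), shares storage with z

   # any real-valued update applied to ``r`` is reflected in the complex view
   z_again = torch.view_as_complex(r)  # back to complex64, same storage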

Is there support for optimized parameters being on different devices?