This repository was archived by the owner on May 1, 2025. It is now read-only.

Error during training #54

By any chance, do you have any idea why I received the following error during training?
I'm running the Dockerfile on some RTX 2080 Ti GPUs with the latest version of CUDA.

Thank you.

process_0 - Initializing MultitaskQuestionAnsweringNetwork
process_0 - MultitaskQuestionAnsweringNetwork has 14,469,902 trainable parameters
Traceback (most recent call last):
  File "/decaNLP/train.py", line 374, in <module>
    main()
  File "/decaNLP/train.py", line 370, in main
    run(args, run_args, world_size=args.world_size)
  File "/decaNLP/train.py", line 299, in run
    model = init_model(args, field, logger, world_size, device)
  File "/decaNLP/train.py", line 327, in init_model
    model.to(device)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 379, in to
    return self._apply(convert)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 185, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 185, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 112, in _apply
    self.flatten_parameters()
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 105, in flatten_parameters
    self.batch_first, bool(self.bidirectional))
RuntimeError: CuDNN error: CUDNN_STATUS_SUCCESS
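
The traceback fails inside cuDNN while the model is being moved to the GPU, so the versions visible inside the container are probably the most useful thing to report alongside it. Below is a minimal sketch for collecting them. The helper name `collect_env_info` is hypothetical and not part of decaNLP, and the idea that a mismatch between the GPU and the CUDA/cuDNN versions PyTorch was built against is involved is only an assumption, not something confirmed here.

```python
import platform


def collect_env_info():
    """Return a dict describing the Python / PyTorch / CUDA / cuDNN setup.

    Hypothetical helper for illustration; it is not part of decaNLP.
    Fields stay None when PyTorch or a GPU is not available.
    """
    info = {
        "python": platform.python_version(),
        "torch": None,        # installed PyTorch version
        "cuda_build": None,   # CUDA version PyTorch was compiled against
        "cudnn": None,        # cuDNN version PyTorch reports, if any
        "gpu": None,          # name of GPU 0
        "capability": None,   # compute capability of GPU 0
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return info

    info["torch"] = torch.__version__
    info["cuda_build"] = torch.version.cuda
    info["cudnn"] = torch.backends.cudnn.version()
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
        info["capability"] = torch.cuda.get_device_capability(0)
    return info


if __name__ == "__main__":
    for key, value in collect_env_info().items():
        print(f"{key}: {value}")
```

Running this inside the same container that produced the traceback shows which CUDA version the installed PyTorch build targets, which may differ from the CUDA version installed on the host.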
