MobileNetV2 training fails #8

@fanweiya

I am using the MobileNetV2 backbone, but training fails with the following output:

[-] Importing tensorflow...
2021-01-14 13:49:10.317068: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
[+] Done! Tensorflow version: 2.5.0-dev20201230
[-] Importing Deeplabv3plus Trainer class...
[-] Importing config files...
2021-01-14 13:49:11.537581: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-01-14 13:49:11.591072: E tensorflow/stream_executor/cuda/cuda_driver.cc:328] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2021-01-14 13:49:11.591101: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (alit-PowerEdge-T640): /proc/driver/nvidia/version does not exist
2021-01-14 13:49:11.591383: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING:tensorflow:Some requested devices in `tf.distribute.Strategy` are not visible to TensorFlow: /job:localhost/replica:0/task:0/device:GPU:0,/job:localhost/replica:0/task:0/device:GPU:1
WARNING:tensorflow:Some requested devices in `tf.distribute.Strategy` are not visible to TensorFlow: /job:localhost/replica:0/task:0/device:GPU:0,/job:localhost/replica:0/task:0/device:GPU:1
Train Images are good to go
[+] Data points in train dataset: 6400
Train Dataset: <PrefetchDataset shapes: ((16, 512, 512, 3), (16, 512, 512, 1)), types: (tf.float32, tf.float32)>
Train Images are good to go
Data points in train dataset: 1600
Val Dataset: <PrefetchDataset shapes: ((16, 512, 512, 3), (16, 512, 512, 1)), types: (tf.float32, tf.float32)>
2021-01-14 13:49:12.045387: I tensorflow/core/profiler/lib/profiler_session.cc:126] Profiler session initializing.
2021-01-14 13:49:12.045414: I tensorflow/core/profiler/lib/profiler_session.cc:141] Profiler session started.
2021-01-14 13:49:12.100790: I tensorflow/core/profiler/lib/profiler_session.cc:158] Profiler session tear down.
2021-01-14 13:49:12.268507: W tensorflow/core/grappler/optimizers/data/auto_shard.cc:656] In AUTO-mode, and switching to DATA-based sharding, instead of FILE-based sharding as we cannot find appropriate reader dataset op(s) to shard. Error: Found an unshardable source dataset: name: "TensorSliceDataset/_2"
op: "TensorSliceDataset"
input: "Placeholder/_0"
input: "Placeholder/_1"
attr {
  key: "Toutput_types"
  value {
    list {
      type: DT_STRING
      type: DT_STRING
    }
  }
}
attr {
  key: "output_shapes"
  value {
    list {
      shape {
      }
      shape {
      }
    }
  }
}

2021-01-14 13:49:12.362496: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:127] None of the MLIR optimization passes are enabled (registered 2)
2021-01-14 13:49:12.367114: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2300000000 Hz
Epoch 1/100
WARNING:tensorflow:`input_shape` is undefined or non-square, or `rows` is not in [96, 128, 160, 192, 224]. Weights for input shape (224, 224) will be loaded as the default.
WARNING:tensorflow:`input_shape` is undefined or non-square, or `rows` is not in [96, 128, 160, 192, 224]. Weights for input shape (224, 224) will be loaded as the default.
Traceback (most recent call last):
  File "trainer.py", line 47, in <module>
    HISTORY = TRAINER.train()
  File "/data/deeplab/DeepLabV3-Plus/deeplabv3plus/train.py", line 191, in train
    epochs=self.config['epochs'], callbacks=callbacks
  File "/data/Anaconda3/envs/tf2/lib/python3.7/site-packages/wandb/integration/keras/keras.py", line 119, in new_v2
    return old_v2(*args, **kwargs)
  File "/data/Anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 1135, in fit
    tmp_logs = self.train_function(iterator)
  File "/data/Anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 797, in __call__
    result = self._call(*args, **kwds)
  File "/data/Anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 841, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "/data/Anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 695, in _initialize
    *args, **kwds))
  File "/data/Anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 2998, in _get_concrete_function_internal_garbage_collected
    graph_function, _ = self._maybe_define_function(args, kwargs)
  File "/data/Anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3390, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/data/Anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3235, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/data/Anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 998, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/data/Anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 603, in wrapped_fn
    out = weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/data/Anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 985, in wrapper
    raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:

    /data/Anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py:840 train_function  *
        return step_function(self, iterator)
    /data/deeplab/DeepLabV3-Plus/deeplabv3plus/model/deeplabv3_plus.py:104 call  *
        tensor = tf.keras.layers.Concatenate(axis=-1)([input_a, input_b])
    /data/Anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py:1015 __call__  **
        self._maybe_build(inputs)
    /data/Anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py:2709 _maybe_build
        self.build(input_shapes)  # pylint:disable=not-callable
    /data/Anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/keras/utils/tf_utils.py:273 wrapper
        output_shape = fn(instance, input_shape)
    /data/Anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/keras/layers/merge.py:519 build
        raise ValueError(err_msg)

    ValueError: A `Concatenate` layer requires inputs with matching shapes except for the concat axis. Got inputs shapes: [(8, 128, 128, 256), (8, 64, 64, 48)]
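
For anyone reading the error above: the two tensors passed to `Concatenate(axis=-1)` differ in height and width (128 vs 64), not just in channels. The sketch below is only an illustration in plain Python, not the repository's code. The helper name `concat_mismatches` is hypothetical, and the input size of 512 is taken from the dataset shapes printed in the log. It shows which axes disagree and what downsampling factor each tensor corresponds to; one plausible reading is that the feature map chosen from MobileNetV2 sits at 1/8 resolution while the other branch is at 1/4, but that is an assumption the log alone does not confirm.

```python
def concat_mismatches(shape_a, shape_b, axis=-1):
    """Return the axes, other than the concat axis, where two shapes differ."""
    if len(shape_a) != len(shape_b):
        raise ValueError("inputs must have the same rank")
    axis = axis % len(shape_a)  # normalise a negative axis such as -1
    return [
        i
        for i, (a, b) in enumerate(zip(shape_a, shape_b))
        if i != axis and a != b
    ]


# Shapes copied from the ValueError in the traceback (input_a, input_b).
input_a_shape = (8, 128, 128, 256)
input_b_shape = (8, 64, 64, 48)
input_size = 512  # image height/width reported for the train dataset

bad_axes = concat_mismatches(input_a_shape, input_b_shape)
stride_a = input_size // input_a_shape[1]
stride_b = input_size // input_b_shape[1]

print("mismatched axes:", bad_axes)
print("input_a is at 1/%d resolution, input_b at 1/%d" % (stride_a, stride_b))
```

If that reading is right, the two usual remedies would be to take `input_b` from an earlier, higher-resolution backbone layer, or to resize one tensor so both share the same height and width before concatenating. Which of these suits this repository is for the maintainer to decide.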

