Training error #114

@alexiycv

λ f02d1b16ca1e /home/PLSC mkdir -p ./dataset/
λ f02d1b16ca1e /home/PLSC tar -xzf MS1M_v3_One_Sample.tgz -C ./dataset/

λ f02d1b16ca1e /home/PLSC
λ f02d1b16ca1e /home/PLSC python plsc/data/dataset/tools/lfw_style_bin_dataset_converter.py --bin_path ./dataset/MS1M_v3_One_Sample/agedb_30.bin --out_dir ./dataset/MS1M_v3_One_Sample/agedb_30/ --flip_test
convert 6000 pair images.
plsc/data/dataset/tools/lfw_style_bin_dataset_converter.py:66: DeprecationWarning: FLIP_LEFT_RIGHT is deprecated and will be removed in Pillow 10 (2023-07-01). Use Transpose.FLIP_LEFT_RIGHT instead.
img1 = img1.transpose(Image.FLIP_LEFT_RIGHT)
plsc/data/dataset/tools/lfw_style_bin_dataset_converter.py:73: DeprecationWarning: FLIP_LEFT_RIGHT is deprecated and will be removed in Pillow 10 (2023-07-01). Use Transpose.FLIP_LEFT_RIGHT instead.
img2 = img2.transpose(Image.FLIP_LEFT_RIGHT)
convert 6000 pair horizontal flip images.
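The DeprecationWarning lines above come from Pillow and do not stop the conversion. A minimal sketch of one way to silence them while staying compatible with older Pillow releases is to resolve the constant through the enum named in the warning and fall back to the module-level name. The helper `pillow_constant` is hypothetical (it is not part of PLSC or Pillow), and the stand-in namespaces exist only so the sketch runs without Pillow installed.

```python
from types import SimpleNamespace


def pillow_constant(image_module, enum_name, member):
    """Look up a constant via its enum path, falling back to the old flat name.

    Hypothetical helper: prefers image_module.<enum_name>.<member>
    (e.g. Image.Transpose.FLIP_LEFT_RIGHT) and otherwise uses
    image_module.<member> (e.g. Image.FLIP_LEFT_RIGHT).
    """
    enum = getattr(image_module, enum_name, None)
    holder = enum if enum is not None else image_module
    return getattr(holder, member)


# Stand-ins for the two API shapes, so this runs without Pillow.
old_style = SimpleNamespace(FLIP_LEFT_RIGHT=0)
new_style = SimpleNamespace(Transpose=SimpleNamespace(FLIP_LEFT_RIGHT=0))

flip_old = pillow_constant(old_style, "Transpose", "FLIP_LEFT_RIGHT")
flip_new = pillow_constant(new_style, "Transpose", "FLIP_LEFT_RIGHT")

# With real Pillow the converter lines would become something like:
#   from PIL import Image
#   FLIP = pillow_constant(Image, "Transpose", "FLIP_LEFT_RIGHT")
#   img1 = img1.transpose(FLIP)
```

The same lookup would apply to the resampling filters, for example `pillow_constant(Image, "Resampling", "BILINEAR")`.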
λ f02d1b16ca1e /home/PLSC export CUDA_VISIBLE_DEVICES=0
λ f02d1b16ca1e /home/PLSC python tools/train.py -c ./plsc/configs/FaceRecognition/IResNet50_MS1MV3OneSample_ArcFace_0.1_1n8c_dp_fp32.yaml
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
/home/PLSC/plsc/data/preprocess/timm_autoaugment.py:38: DeprecationWarning: BILINEAR is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BILINEAR instead.
_RANDOM_INTERPOLATION = (Image.BILINEAR, Image.BICUBIC)
/home/PLSC/plsc/data/preprocess/timm_autoaugment.py:38: DeprecationWarning: BICUBIC is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BICUBIC instead.
_RANDOM_INTERPOLATION = (Image.BILINEAR, Image.BICUBIC)
[2022/08/16 02:58:22] plsc INFO: DataLoader :
[2022/08/16 02:58:22] plsc INFO: Eval :
[2022/08/16 02:58:22] plsc INFO: dataset :
[2022/08/16 02:58:22] plsc INFO: cls_label_path : ./dataset/MS1M_v3_One_Sample/agedb_30/label.txt
[2022/08/16 02:58:22] plsc INFO: image_root : ./dataset/MS1M_v3_One_Sample/agedb_30
[2022/08/16 02:58:22] plsc INFO: name : FaceVerificationDataset
[2022/08/16 02:58:22] plsc INFO: transform_ops :
[2022/08/16 02:58:22] plsc INFO: DecodeImage :
[2022/08/16 02:58:22] plsc INFO: channel_first : False
[2022/08/16 02:58:22] plsc INFO: to_rgb : True
[2022/08/16 02:58:22] plsc INFO: NormalizeImage :
[2022/08/16 02:58:22] plsc INFO: mean : [0.5, 0.5, 0.5]
[2022/08/16 02:58:22] plsc INFO: order :
[2022/08/16 02:58:22] plsc INFO: scale : 1.0/255.0
[2022/08/16 02:58:22] plsc INFO: std : [0.5, 0.5, 0.5]
[2022/08/16 02:58:22] plsc INFO: ToCHWImage : None
[2022/08/16 02:58:22] plsc INFO: loader :
[2022/08/16 02:58:22] plsc INFO: num_workers : 0
[2022/08/16 02:58:22] plsc INFO: use_shared_memory : True
[2022/08/16 02:58:22] plsc INFO: sampler :
[2022/08/16 02:58:22] plsc INFO: batch_size : 128
[2022/08/16 02:58:22] plsc INFO: drop_last : False
[2022/08/16 02:58:22] plsc INFO: name : BatchSampler
[2022/08/16 02:58:22] plsc INFO: shuffle : False
[2022/08/16 02:58:22] plsc INFO: Train :
[2022/08/16 02:58:22] plsc INFO: dataset :
[2022/08/16 02:58:22] plsc INFO: cls_label_path : ./dataset/MS1M_v3_One_Sample/label.txt
[2022/08/16 02:58:22] plsc INFO: image_root : ./dataset/MS1M_v3_One_Sample/
[2022/08/16 02:58:22] plsc INFO: name : FaceIdentificationDataset
[2022/08/16 02:58:22] plsc INFO: transform_ops :
[2022/08/16 02:58:22] plsc INFO: DecodeImage :
[2022/08/16 02:58:22] plsc INFO: channel_first : False
[2022/08/16 02:58:22] plsc INFO: to_rgb : True
[2022/08/16 02:58:22] plsc INFO: RandFlipImage :
[2022/08/16 02:58:22] plsc INFO: flip_code : 1
[2022/08/16 02:58:22] plsc INFO: NormalizeImage :
[2022/08/16 02:58:22] plsc INFO: mean : [0.5, 0.5, 0.5]
[2022/08/16 02:58:22] plsc INFO: order :
[2022/08/16 02:58:22] plsc INFO: scale : 1.0/255.0
[2022/08/16 02:58:22] plsc INFO: std : [0.5, 0.5, 0.5]
[2022/08/16 02:58:22] plsc INFO: ToCHWImage : None
[2022/08/16 02:58:22] plsc INFO: loader :
[2022/08/16 02:58:22] plsc INFO: num_workers : 8
[2022/08/16 02:58:22] plsc INFO: use_shared_memory : True
[2022/08/16 02:58:22] plsc INFO: sampler :
[2022/08/16 02:58:22] plsc INFO: batch_size : 128
[2022/08/16 02:58:22] plsc INFO: drop_last : False
[2022/08/16 02:58:22] plsc INFO: name : DistributedBatchSampler
[2022/08/16 02:58:22] plsc INFO: shuffle : True
[2022/08/16 02:58:22] plsc INFO: DistributedStrategy :
[2022/08/16 02:58:22] plsc INFO: data_parallel : True
[2022/08/16 02:58:22] plsc INFO: Export :
[2022/08/16 02:58:22] plsc INFO: export_type : onnx
[2022/08/16 02:58:22] plsc INFO: input_shape : ['None', 3, 112, 112]
[2022/08/16 02:58:22] plsc INFO: Global :
[2022/08/16 02:58:22] plsc INFO: accum_steps : 1
[2022/08/16 02:58:22] plsc INFO: checkpoint : None
[2022/08/16 02:58:22] plsc INFO: device : gpu
[2022/08/16 02:58:22] plsc INFO: distributed : False
[2022/08/16 02:58:22] plsc INFO: epochs : 25
[2022/08/16 02:58:22] plsc INFO: eval_during_train : True
[2022/08/16 02:58:22] plsc INFO: eval_func : face_verification_eval
[2022/08/16 02:58:22] plsc INFO: eval_interval : 200
[2022/08/16 02:58:22] plsc INFO: eval_unit : step
[2022/08/16 02:58:22] plsc INFO: max_num_latest_checkpoint : 0
[2022/08/16 02:58:22] plsc INFO: output_dir : ./output/
[2022/08/16 02:58:22] plsc INFO: pretrained_model : None
[2022/08/16 02:58:22] plsc INFO: print_batch_step : 10
[2022/08/16 02:58:22] plsc INFO: rank : 0
[2022/08/16 02:58:22] plsc INFO: save_interval : 1
[2022/08/16 02:58:22] plsc INFO: seed : 2022
[2022/08/16 02:58:22] plsc INFO: task_type : recognition
[2022/08/16 02:58:22] plsc INFO: train_epoch_func : defualt_train_one_epoch
[2022/08/16 02:58:22] plsc INFO: use_visualdl : True
[2022/08/16 02:58:22] plsc INFO: world_size : 1
[2022/08/16 02:58:22] plsc INFO: LRScheduler :
[2022/08/16 02:58:22] plsc INFO: boundaries : [10, 16, 22]
[2022/08/16 02:58:22] plsc INFO: decay_unit : epoch
[2022/08/16 02:58:22] plsc INFO: name : Step
[2022/08/16 02:58:22] plsc INFO: values : [0.2, 0.02, 0.002, 0.0002]
[2022/08/16 02:58:22] plsc INFO: Loss :
[2022/08/16 02:58:22] plsc INFO: Train :
[2022/08/16 02:58:22] plsc INFO: MarginLoss :
[2022/08/16 02:58:22] plsc INFO: m1 : 1.0
[2022/08/16 02:58:22] plsc INFO: m2 : 0.5
[2022/08/16 02:58:22] plsc INFO: m3 : 0.0
[2022/08/16 02:58:22] plsc INFO: model_parallel : False
[2022/08/16 02:58:22] plsc INFO: s : 64.0
[2022/08/16 02:58:22] plsc INFO: weight : 1.0
[2022/08/16 02:58:22] plsc INFO: Metric :
[2022/08/16 02:58:22] plsc INFO: Eval :
[2022/08/16 02:58:22] plsc INFO: LFWAcc :
[2022/08/16 02:58:22] plsc INFO: flip_test : True
[2022/08/16 02:58:22] plsc INFO: Model :
[2022/08/16 02:58:22] plsc INFO: class_num : 93431
[2022/08/16 02:58:22] plsc INFO: data_format : NCHW
[2022/08/16 02:58:22] plsc INFO: name : IResNet50
[2022/08/16 02:58:22] plsc INFO: num_features : 512
[2022/08/16 02:58:22] plsc INFO: pfc_config :
[2022/08/16 02:58:22] plsc INFO: model_parallel : False
[2022/08/16 02:58:22] plsc INFO: sample_ratio : 0.1
[2022/08/16 02:58:22] plsc INFO: Optimizer :
[2022/08/16 02:58:22] plsc INFO: grad_clip :
[2022/08/16 02:58:22] plsc INFO: always_clip : True
[2022/08/16 02:58:22] plsc INFO: clip_norm : 2.0
[2022/08/16 02:58:22] plsc INFO: clip_norm_max : 2.0
[2022/08/16 02:58:22] plsc INFO: name : ClipGradByGlobalNorm
[2022/08/16 02:58:22] plsc INFO: no_clip_list : ['partialfc']
[2022/08/16 02:58:22] plsc INFO: momentum : 0.9
[2022/08/16 02:58:22] plsc INFO: name : Momentum
[2022/08/16 02:58:22] plsc INFO: use_master_param : False
[2022/08/16 02:58:22] plsc INFO: weight_decay : 0.0005
[2022/08/16 02:58:22] plsc INFO: profiler_options : None
[2022/08/16 02:58:22] plsc INFO: train with paddle 2.3.1 and device Place(gpu:0)
[2022/08/16 02:58:22] plsc INFO: Loading dataset ./dataset/MS1M_v3_One_Sample/label.txt
[2022/08/16 02:58:23] plsc INFO: Load dataset finished, 93431 samples
W0816 02:58:23.308667 876 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 11.2, Runtime API Version: 11.2
W0816 02:58:23.372398 876 gpu_resources.cc:91] device: 0, cuDNN Version: 8.1.
[2022/08/16 02:58:24] plsc INFO: Number of Parameters is 91.43M.
Traceback (most recent call last):
  File "tools/train.py", line 34, in <module>
    engine = Engine(config, mode="train")
  File "/home/PLSC/plsc/engine/engine.py", line 213, in __init__
    self.lr_scheduler, self.model)
  File "/home/PLSC/plsc/optimizer/__init__.py", line 60, in build_optimizer
    param_group[key] = get_fused_params(param_group[key])
  File "/home/PLSC/plsc/core/param_fuse.py", line 454, in get_fused_params
    var_groups = assign_group_by_size(params)
  File "/home/PLSC/plsc/core/param_fuse.py", line 391, in assign_group_by_size
    parameters, is_sparse_gradient, [group_size, group_size])
ValueError: (InvalidArgument) argument (position 1) must be list of Tensor, but got ParamBase at pos 0 (at /paddle/paddle/fluid/pybind/eager_utils.cc:240)
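
Separately from the crash, the LRScheduler settings printed in the log (name `Step`, `decay_unit: epoch`, boundaries `[10, 16, 22]`, values `[0.2, 0.02, 0.002, 0.0002]`) describe a piecewise-constant schedule. The sketch below shows how such a schedule is usually read. It assumes the rate changes at the start of each boundary epoch (0-based); PLSC's actual `Step` implementation is not shown in this issue and may differ, and `step_lr` is a hypothetical name.

```python
from bisect import bisect_right

# Values copied from the LRScheduler section of the log.
BOUNDARIES = [10, 16, 22]
VALUES = [0.2, 0.02, 0.002, 0.0002]


def step_lr(epoch, boundaries=BOUNDARIES, values=VALUES):
    """Return the learning rate for a 0-based epoch index.

    Assumption: the rate switches to the next value once
    epoch >= boundary, so there is one more value than boundaries.
    """
    if len(values) != len(boundaries) + 1:
        raise ValueError("need exactly len(boundaries) + 1 values")
    return values[bisect_right(boundaries, epoch)]


# Learning rate for each of the 25 configured epochs.
schedule = [step_lr(e) for e in range(25)]
```

Under this reading, epochs 0 to 9 train at 0.2, epochs 10 to 15 at 0.02, epochs 16 to 21 at 0.002, and the remaining epochs at 0.0002.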
