Process

Vladimir Mandic edited this page Feb 14, 2023 · 5 revisions

Dataset Preprocessing

Dataset preprocessing is a critical and complex step required for any successful training.
This repository includes an end-to-end processing solution in the cli/modules/process.py script.

Note that each input can yield zero, one, or multiple output images, because processing performs these optional steps:

  • Extract image frames from video files
  • Extract faces from images
  • Extract body from images
  • Keep original image as well as extracted faces and bodies
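
To make the "zero or multiple images per input" behavior concrete, here is a minimal sketch. It is not the logic used in process.py: the function name `plan_outputs`, the output labels, and the detection counts are hypothetical. Only the flag names `keep_original`, `extract_face`, and `extract_body` are taken from the script's parameters.

```python
def plan_outputs(name, faces_found, bodies_found,
                 keep_original=True, extract_face=False, extract_body=False):
    """Return illustrative labels for the outputs one input image would produce."""
    outputs = []
    if keep_original:
        # The original image is kept alongside any extractions.
        outputs.append(f"{name}:original")
    if extract_face:
        # One output per detected face; none if no face was found.
        outputs.extend(f"{name}:face{i}" for i in range(faces_found))
    if extract_body:
        # One output per detected body; none if no body was found.
        outputs.extend(f"{name}:body{i}" for i in range(bodies_found))
    return outputs
```

With `keep_original=False` and `extract_face=True`, an input with two detected faces yields two outputs, while an input with no detected faces yields none.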

Additionally, for each processed image it can:

  • For extracted faces
    • Run upscaling to improve resolution
    • Run face restoration to improve quality
    • Remove background from images containing extracted faces
  • Verify if all extracted images are of sufficient quality:
    • Meet minimum resolution
    • Meet face visibility requirement
    • Meet framing requirement (e.g. head cut off)
    • Run similarity checks to remove near-duplicates
    • Run brightness dynamic range checks to remove images with low contrast
    • Run blur detection to remove blurry images
  • Create captions and tags from images using multiple interrogate models
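
The quality checks above can be pictured as a chain of filters. The sketch below covers only the dynamic range and near-duplicate steps and is an illustration under stated assumptions: the functions `dynamic_range`, `similarity`, and `filter_images` are hypothetical, images are represented as flat lists of grayscale values (0-255), and the metrics are deliberately simple stand-ins for whatever process.py actually computes. The default thresholds mirror the `face_range_score` and `similarity_score` values from the script's parameters.

```python
def dynamic_range(pixels):
    """Spread of grayscale values (0-255) as a fraction of the full range."""
    return (max(pixels) - min(pixels)) / 255.0


def similarity(a, b):
    """Fraction of positions where two equal-length thumbnails differ by at most 8 levels."""
    close = sum(1 for x, y in zip(a, b) if abs(x - y) <= 8)
    return close / len(a)


def filter_images(images, min_range=0.15, max_similarity=0.8):
    """Keep images with enough contrast that are not near-duplicates of a kept image."""
    kept = []
    for name, pixels in images:
        if dynamic_range(pixels) < min_range:
            continue  # low contrast: discard
        if any(similarity(pixels, other) > max_similarity for _, other in kept):
            continue  # too similar to an image that was already kept: discard
        kept.append((name, pixels))
    return [name for name, _ in kept]
```

Comparing each candidate only against images that were already kept means the first image of a near-duplicate group survives and later ones are dropped.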

Processing can be run manually with the cli/modules/process.py script or as part of any of the included training solutions.

Currently, to adjust processing parameters you need to edit the cli/modules/process.py script:

params = Map({
    # general settings
    'clear_dst': True, # remove all files from destination at the start
    'format': '.jpg', # image format
    'target_size': 512, # target resolution
    'square_images': True, # should output images be squared
    'segmentation_model': 0, # segmentation model 0/general 1/landscape
    'segmentation_background': (192, 192, 192), # segmentation background color
    'blur_samplesize': 60, # sample size to use for blur detection
    'similarity_size': 64, # base similarity detection on reduced images
    # original image processing settings
    'keep_original': True, # keep original image
    # face processing settings
    'extract_face': False, # extract face from image
    'face_score': 0.7, # min face detection score
    'face_pad': 0.2, # pad face image percentage
    'face_model': 1, # which face model to use 0/close-up 1/standard
    'face_blur_score': 1.5, # max score for face blur detection
    'face_range_score': 0.15, # min score for face dynamic range detection
    'face_restore': True, # attempt to restore face quality
    'face_upscale': True, # attempt to scale small faces
    'face_segmentation': False, # segmentation enabled
    # body processing settings
    'extract_body': False, # extract body from image
    'body_score': 0.9, # min body detection score
    'body_visibility': 0.5, # min visibility score for each detected body part
    'body_parts': 15, # min number of detected body parts with sufficient visibility
    'body_pad': 0.2,  # pad body image percentage
    'body_model': 2, # body model to use 0/low 1/medium 2/high
    'body_blur_score': 1.8, # max score for body blur detection
    'body_range_score': 0.15, # min score for body dynamic range detection
    'body_segmentation': False, # segmentation enabled
    # similarity detection settings
    'similarity_score': 0.8, # maximum similarity score before image is discarded
    # interrogate settings
    'interrogate_model': ['clip', 'deepdanbooru'], # interrogate models
    'interrogate_captions': True, # write captions to file
    'tag_limit': 5, # number of tags to extract
})
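
The `Map` class itself is not shown on this page. Assuming it is a dictionary that also allows attribute-style access, editing a parameter amounts to something like the sketch below. The class body here is a hypothetical stand-in and the real implementation in the repository may differ. The parameter names and default values are the ones listed above.

```python
class Map(dict):
    """Hypothetical stand-in: a dict whose keys can also be read and set as attributes."""

    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError as err:
            raise AttributeError(key) from err

    def __setattr__(self, key, value):
        self[key] = value


# A reduced parameter set using names and defaults from the listing above.
params = Map({
    'target_size': 512,
    'extract_face': False,
    'face_score': 0.7,
})

params.extract_face = True    # enable face extraction
params['target_size'] = 768   # dict-style access works as well
```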