Transforms

Overview of transforms

We have introduced how to build a Pipeline in add_transforms. A Pipeline contains a series of transforms. There are three main categories of transforms in MMSelfSup:

  1. Transforms that process the data. The transforms unique to MMSelfSup are defined in processing.py, e.g. RandomCrop, RandomResizedCrop and RandomGaussianBlur. We may also use transforms from other repositories, e.g. LoadImageFromFile from MMCV.

  2. The transform wrapper for multiple views of an image. It is defined in wrappers.py.

  3. The transform to pack data into a format compatible with the inputs of the algorithm. It is defined in formatting.py.
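
The idea that a Pipeline is a series of transforms applied in order can be illustrated with a minimal sketch. It uses plain Python functions in place of real transforms and does not rely on MMSelfSup or MMCV; the names `load_image`, `flip_horizontal` and `run_pipeline` are hypothetical and exist only for this illustration.

```python
from typing import Any, Callable, Dict, List

Results = Dict[str, Any]
Transform = Callable[[Results], Results]


def load_image(results: Results) -> Results:
    """Hypothetical loader standing in for a transform such as LoadImageFromFile."""
    # A real loader would read pixels from results['img_path'];
    # here a small nested list plays the role of the image.
    results['img'] = [[1, 2], [3, 4]]
    return results


def flip_horizontal(results: Results) -> Results:
    """Hypothetical processing step: reverse each row of the image."""
    results['img'] = [row[::-1] for row in results['img']]
    return results


def run_pipeline(transforms: List[Transform], results: Results) -> Results:
    """Apply each transform in order, feeding the output of one into the next."""
    for transform in transforms:
        results = transform(results)
    return results


pipeline = [load_image, flip_horizontal]
output = run_pipeline(pipeline, {'img_path': 'example.jpg'})
print(output['img'])  # [[2, 1], [4, 3]]
```

Each transform receives a results dict and returns it, possibly with new or modified keys, which is why the order of entries in a pipeline matters.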

In summary, MMSelfSup implements the transforms listed below. The last two, MultiView and PackSelfSupInputs, are introduced in detail in the following sections.

Class                                        Function
BEiTMaskGenerator                            Generate masks for images, following BEiT
SimMIMMaskGenerator                          Generate a random block mask for each image, following SimMIM
ColorJitter                                  Randomly change the brightness, contrast, saturation and hue of an image
RandomCrop                                   Crop the given image at a random location
RandomGaussianBlur                           Gaussian blur augmentation, following SimCLR
RandomResizedCrop                            Crop the given image to a random size and aspect ratio
RandomResizedCropAndInterpolationWithTwoPic  Crop the given PIL image to a random size and aspect ratio with random interpolation
RandomSolarize                               Solarization augmentation, following BYOL
RotationWithLabels                           Rotation prediction
RandomPatchWithLabels                        Apply random patch augmentation to the given image
RandomRotation                               Rotate the image by an angle
MultiView                                    A wrapper for algorithms with multi-view image inputs
PackSelfSupInputs                            Pack data into a format compatible with the inputs of an algorithm

Introduction of MultiView

We build a wrapper named MultiView for algorithms with multi-view image inputs, e.g. MoCo, SimCLR and SwAV. In the config file, we can define it as:

pipeline = [
    dict(
        type='MultiView',
        num_views=2,
        transforms=[
            [dict(type='Resize', scale=224)],
        ])
]

This means that the pipeline produces two views of each image.

We can also define several pipelines, each with its own number of views:

pipeline = [
    dict(
        type='MultiView',
        num_views=[2, 6],
        transforms=[
            [
                dict(type='Resize', scale=224),
            ],
            [
                dict(type='Resize', scale=224),
                dict(type='RandomSolarize'),
            ],
        ])
]

This means that there are two pipelines, which produce 2 views and 6 views, respectively. More examples can be found in imagenet_mocov1.py, imagenet_mocov2.py and imagenet_swav_mcrop-2-6.py.
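
To make the relationship between num_views and transforms concrete, here is a minimal sketch of how a multi-view wrapper could expand its configuration into a list of views. It is an illustration under stated assumptions, not MMSelfSup's implementation: `MultiViewSketch`, `compose`, `jitter` and `negate` are hypothetical names, transforms are plain functions, and the "image" is a single number so the result is easy to inspect.

```python
import copy
import random
from typing import Any, Callable, Dict, List, Union

Results = Dict[str, Any]
Transform = Callable[[Results], Results]


def compose(transforms: List[Transform]) -> Transform:
    """Chain several transforms into a single callable."""
    def run(results: Results) -> Results:
        for transform in transforms:
            results = transform(results)
        return results
    return run


class MultiViewSketch:
    """Hypothetical wrapper: repeat the i-th pipeline num_views[i] times."""

    def __init__(self, transforms: List[List[Transform]],
                 num_views: Union[int, List[int]]) -> None:
        if isinstance(num_views, int):
            num_views = [num_views]
        if len(num_views) != len(transforms):
            raise ValueError('num_views and transforms must have the same length')
        self.view_pipelines: List[Transform] = []
        for pipeline, count in zip(transforms, num_views):
            self.view_pipelines.extend([compose(pipeline)] * count)

    def __call__(self, results: Results) -> Results:
        views = []
        for pipeline in self.view_pipelines:
            # Each view starts from its own copy of the input, so random
            # augmentations in one view do not leak into another.
            outputs = pipeline(copy.deepcopy(results))
            views.append(outputs['img'])
        results['img'] = views
        return results


def jitter(results: Results) -> Results:
    """Hypothetical random augmentation: add noise in [0, 1) to the 'image'."""
    results['img'] = results['img'] + random.random()
    return results


def negate(results: Results) -> Results:
    """Hypothetical second augmentation used only by the second pipeline."""
    results['img'] = -results['img']
    return results


multi_view = MultiViewSketch(transforms=[[jitter], [jitter, negate]],
                             num_views=[2, 6])
out = multi_view({'img': 10.0})
print(len(out['img']))  # 8
```

In this sketch the first two views come from the first pipeline and the remaining six from the second, mirroring the num_views=[2, 6] config above.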

Introduction of PackSelfSupInputs

We build a class named PackSelfSupInputs to pack data into a format compatible with the inputs of an algorithm. This transform is usually placed at the end of the pipeline, for example:

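# view_pipeline is a list of transform configs for a single view,
# assumed to be defined earlier in the config file.
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='MultiView', num_views=2, transforms=[view_pipeline]),
    dict(type='PackSelfSupInputs', meta_keys=['img_path'])
]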
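
What "packing" means can be sketched in plain Python. The function below is a rough illustration and not the PackSelfSupInputs code: the name `pack_inputs_sketch`, the plain-dict output and the key layout (`inputs`, `data_samples`, `metainfo`) are assumptions made for this example. It separates the image views from the metadata selected by meta_keys and drops everything else.

```python
from typing import Any, Dict, Sequence


def pack_inputs_sketch(results: Dict[str, Any],
                       meta_keys: Sequence[str] = ()) -> Dict[str, Any]:
    """Illustrative packing step; the output layout is an assumption."""
    img = results['img']
    # Always hand the model a list of views, even for a single image.
    inputs = list(img) if isinstance(img, list) else [img]
    # Keep only the requested metadata that is present in the results.
    meta = {key: results[key] for key in meta_keys if key in results}
    return {'inputs': inputs, 'data_samples': {'metainfo': meta}}


results = {
    'img': ['view_0', 'view_1'],   # placeholders for two augmented views
    'img_path': 'example.jpg',
    'scratch': 123,                # intermediate value that is not packed
}
packed = pack_inputs_sketch(results, meta_keys=['img_path'])
print(packed['inputs'])        # ['view_0', 'view_1']
print(packed['data_samples'])  # {'metainfo': {'img_path': 'example.jpg'}}
```

Placing such a step last means every earlier transform can freely add intermediate keys, while the algorithm only sees the fields it expects.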