Transforms¶

Overview of transforms¶

We have introduced how to build a Pipeline in add_transforms. A Pipeline contains a series of transforms. There are three main categories of transforms in MMSelfSup:

Transforms that process the data. The transforms unique to MMSelfSup are defined in processing.py, e.g. RandomCrop, RandomResizedCrop and RandomGaussianBlur. We may also use transforms from other repositories, e.g. LoadImageFromFile from MMCV.
The transform wrapper that generates multiple views of an image. It is defined in wrappers.py.
The transform that packs data into a format compatible with the inputs of the algorithm. It is defined in formatting.py.

In summary, MMSelfSup implements the transforms listed below. The last two, MultiView and PackSelfSupInputs, are introduced in detail in the following sections.

Class	Function
`BEiTMaskGenerator`	Generate a mask for an image, following `BEiT`
`SimMIMMaskGenerator`	Generate a random block mask for each image, following `SimMIM`
`ColorJitter`	Randomly change the brightness, contrast, saturation and hue of an image
`RandomCrop`	Crop the given image at a random location
`RandomGaussianBlur`	Gaussian blur augmentation, following `SimCLR`
`RandomResizedCrop`	Crop the given image to a random size and aspect ratio
`RandomResizedCropAndInterpolationWithTwoPic`	Crop the given PIL Image to a random size and aspect ratio with random interpolation
`RandomSolarize`	Solarization augmentation, following `BYOL`
`RotationWithLabels`	Rotation prediction
`RandomPatchWithLabels`	Apply random patch augmentation to the given image
`RandomRotation`	Rotate the image by an angle
`MultiView`	A wrapper for algorithms with multi-view image inputs
`PackSelfSupInputs`	Pack data into a format compatible with the inputs of an algorithm

Introduction of `MultiView`¶

We build a wrapper named MultiView for algorithms that take multi-view image inputs, e.g. MoCo, SimCLR and SwAV. In the config file, we can define it as:

pipeline = [
    dict(
        type='MultiView',
        num_views=2,
        transforms=[
            [dict(type='Resize', scale=224)],
        ])
]

This means that the pipeline produces two views of each image, both generated by the same list of transforms.

We can also define a pipeline that applies different transforms to different groups of views:

pipeline = [
    dict(
        type='MultiView',
        num_views=[2, 6],
        transforms=[
            [dict(type='Resize', scale=224)],
            [
                dict(type='Resize', scale=224),
                dict(type='RandomSolarize'),
            ],
        ])
]

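This means that there are two pipelines, which produce 2 views and 6 views, respectively: each entry of num_views gives the number of views generated by the list of transforms at the same position. More examples can be found in imagenet_mocov1.py, imagenet_mocov2.py and imagenet_swav_mcrop-2-6.py.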
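
To make the pairing between num_views and transforms concrete, here is a minimal pure-Python sketch of the behaviour described above. It is not the MMSelfSup implementation: multi_view, resize and solarize are hypothetical stand-ins, and the "image" is a plain string so that the effect of each transform stays visible.

```python
import copy


def multi_view(results, num_views, transforms):
    """Hypothetical stand-in for the MultiView wrapper (not MMSelfSup code)."""
    # A single int is treated as a one-element list.
    if isinstance(num_views, int):
        num_views = [num_views]
    # There must be one view count per list of transforms.
    assert len(num_views) == len(transforms)

    views = []
    for count, pipeline in zip(num_views, transforms):
        for _ in range(count):
            # Every view starts from its own copy of the input.
            out = copy.deepcopy(results)
            for transform in pipeline:
                out = transform(out)
            views.append(out['img'])
    # The single input image is replaced by the list of views.
    return {**results, 'img': views}


# Toy callables standing in for real transforms.
def resize(results):
    results['img'] = results['img'] + '|resize'
    return results


def solarize(results):
    results['img'] = results['img'] + '|solarize'
    return results


out = multi_view({'img': 'x'}, num_views=[2, 6],
                 transforms=[[resize], [resize, solarize]])
print(len(out['img']))  # 8
print(out['img'][0])    # x|resize
print(out['img'][-1])   # x|resize|solarize
```

In this sketch the first two views come from the first list of transforms and the remaining six from the second, so the output holds 2 + 6 = 8 views in total.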

Introduction of `PackSelfSupInputs`¶

We build a class named PackSelfSupInputs to pack data into a format compatible with the inputs of an algorithm. This transform is usually placed at the end of the pipeline, for example:

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='MultiView', num_views=2, transforms=[view_pipeline]),
    dict(type='PackSelfSupInputs', meta_keys=['img_path'])
]
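
The snippet above refers to view_pipeline without defining it. The following is a minimal sketch of a complete config fragment, assuming a view pipeline built only from the Resize and RandomSolarize entries already shown on this page. The contents of view_pipeline are an assumption made for illustration, and the shipped configs may use different augmentations.

```python
# Assumed for illustration: the transforms applied to every view.
view_pipeline = [
    dict(type='Resize', scale=224),
    dict(type='RandomSolarize'),
]

train_pipeline = [
    dict(type='LoadImageFromFile'),
    # One list of transforms, so num_views is a single int.
    dict(type='MultiView', num_views=2, transforms=[view_pipeline]),
    # Packing comes last, after all views have been generated.
    dict(type='PackSelfSupInputs', meta_keys=['img_path']),
]
```

Note that transforms takes a list of pipelines, which is why view_pipeline is wrapped in an extra pair of brackets.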
