Transforms¶
Overview of transforms¶
We have introduced how to build a Pipeline in add_transforms. A Pipeline contains a series of
transforms. There are three main categories of transforms in MMSelfSup:
1. Transforms that process the data. The transforms unique to MMSelfSup are defined in processing.py, e.g. RandomCrop, RandomResizedCrop and RandomGaussianBlur. We may also use transforms from other repositories, e.g. LoadImageFromFile from MMCV.
2. The transform wrapper for multiple views of an image. It is defined in wrappers.py.
3. The transform that packs data into a format compatible with the inputs of the algorithm. It is defined in formatting.py.
In summary, MMSelfSup implements the transforms listed below. The last two, MultiView and PackSelfSupInputs, are introduced in detail.
| Class | Description |
|---|---|
| BEiTMaskGenerator | Generate a mask for the image, following BEiT |
| SimMIMMaskGenerator | Generate a random block mask for each image, following SimMIM |
| ColorJitter | Randomly change the brightness, contrast, saturation and hue of an image |
| RandomCrop | Crop the given image at a random location |
| RandomGaussianBlur | Gaussian blur augmentation, following SimCLR |
| RandomResizedCrop | Crop the given image to a random size and aspect ratio |
| RandomResizedCropAndInterpolationWithTwoPic | Crop the given PIL image to a random size and aspect ratio with random interpolation |
| RandomSolarize | Solarization augmentation, following BYOL |
| RotationWithLabels | Rotation prediction |
| RandomPatchWithLabels | Apply random patch augmentation to the given image |
| RandomRotation | Rotate the image by an angle |
| MultiView | A wrapper for algorithms with multi-view image inputs |
| PackSelfSupInputs | Pack data into a format compatible with the inputs of an algorithm |
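As a rough illustration of how a pipeline chains transforms of these kinds, the sketch below applies a list of callables to a results dictionary in order. It is a minimal pure-Python stand-in, not the MMSelfSup or MMCV implementation: `run_pipeline`, `load_fake_image` and `center_crop_2x2` are hypothetical names, and a nested list of integers stands in for image data.

```python
from typing import Callable, Dict, List

Results = Dict[str, object]
Transform = Callable[[Results], Results]


def load_fake_image(results: Results) -> Results:
    """Stand-in for a loading transform: creates a 4x4 'image' of ints."""
    results['img'] = [[row * 4 + col for col in range(4)] for row in range(4)]
    return results


def center_crop_2x2(results: Results) -> Results:
    """Stand-in for a processing transform: keeps the central 2x2 patch."""
    img = results['img']
    results['img'] = [row[1:3] for row in img[1:3]]
    return results


def run_pipeline(transforms: List[Transform], results: Results) -> Results:
    """Apply each transform in order, passing the results dict along."""
    for transform in transforms:
        results = transform(results)
    return results


out = run_pipeline([load_fake_image, center_crop_2x2], {'img_path': 'demo.jpg'})
print(out['img'])  # [[5, 6], [9, 10]]
```

Each transform receives the dictionary produced by the previous one, which is why keys such as `img_path` survive to the end of the pipeline unless a transform removes them.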
Introduction of MultiView¶
We build a wrapper named MultiView for algorithms with multi-view image inputs, e.g. MoCo, SimCLR and SwAV. In the config file, we can define it as:
```python
pipeline = [
    dict(type='MultiView',
         num_views=2,
         transforms=[
             [dict(type='Resize', scale=224)],
         ])
]
```
This means that there are two views in the pipeline.
We can also define several pipelines with different numbers of views, for example:
```python
pipeline = [
    dict(type='MultiView',
         num_views=[2, 6],
         transforms=[
             [
                 dict(type='Resize', scale=224)],
             [
                 dict(type='Resize', scale=224),
                 dict(type='RandomSolarize')],
         ])
]
```
This means that there are two pipelines: the first produces 2 views and the second produces 6 views. More examples can be found in config files such as imagenet_mocov1.py, imagenet_mocov2.py and imagenet_swav_mcrop-2-6.py.
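To make the pairing of `num_views` and `transforms` concrete, here is a simplified sketch of a multi-view wrapper. It is an assumption-laden illustration rather than MMSelfSup's actual MultiView: `SimpleMultiView` and `tag` are hypothetical names, the transforms are toy callables that only record their names, and the real class may differ in how it copies data and stores the views.

```python
import copy
from typing import Callable, Dict, List, Sequence, Union

Transform = Callable[[Dict], Dict]


class SimpleMultiView:
    """Hypothetical, simplified multi-view wrapper (not MMSelfSup's class)."""

    def __init__(self,
                 transforms: List[List[Transform]],
                 num_views: Union[int, Sequence[int]]) -> None:
        # A single int is treated as one pipeline repeated `num_views` times.
        if isinstance(num_views, int):
            num_views = [num_views]
        if len(num_views) != len(transforms):
            raise ValueError('need one view count per pipeline')
        self.pipelines = transforms
        self.num_views = list(num_views)

    def __call__(self, results: Dict) -> Dict:
        views = []
        for pipeline, count in zip(self.pipelines, self.num_views):
            for _ in range(count):
                # Each view starts from an independent copy of the input.
                view = copy.deepcopy(results)
                for transform in pipeline:
                    view = transform(view)
                views.append(view['img'])
        results['img'] = views
        return results


def tag(name: str) -> Transform:
    """Toy transform that records its name instead of editing pixels."""
    def _apply(results: Dict) -> Dict:
        results['img'] = results['img'] + [name]
        return results
    return _apply


multi_view = SimpleMultiView(
    transforms=[[tag('Resize')], [tag('Resize'), tag('RandomSolarize')]],
    num_views=[2, 6])
out = multi_view({'img': []})
print(len(out['img']))  # 8
```

With `num_views=[2, 6]`, the first pipeline runs twice and the second runs six times, so the wrapper collects eight views in total.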
Introduction of PackSelfSupInputs¶
We build a class named PackSelfSupInputs to pack data into a format compatible with the inputs of an algorithm. This transform is usually placed at the end of the pipeline, for example:
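```python
# Placeholder for the per-view transforms; replace it with the
# augmentations your algorithm needs.
view_pipeline = [dict(type='Resize', scale=224)]

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='MultiView', num_views=2, transforms=[view_pipeline]),
    # Packing comes last so that it sees the final, augmented views.
    dict(type='PackSelfSupInputs', meta_keys=['img_path'])
]
```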
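The idea behind a packing step can be sketched as follows: it separates what the model consumes from the meta information selected by `meta_keys` and drops everything else. This is only an illustration under stated assumptions, not the PackSelfSupInputs implementation: `pack_inputs` is a hypothetical function, the output keys `inputs`, `data_samples` and `metainfo` are assumed names, and strings stand in for image tensors.

```python
from typing import Dict, List, Sequence


def pack_inputs(results: Dict, meta_keys: Sequence[str]) -> Dict:
    """Hypothetical packing step: separate model inputs from meta info."""
    img = results['img']
    # Treat a single image as a one-element list so both cases look alike.
    inputs: List = img if isinstance(img, list) else [img]
    # Keep only the requested meta keys that are actually present.
    meta = {key: results[key] for key in meta_keys if key in results}
    return {'inputs': inputs, 'data_samples': {'metainfo': meta}}


results = {
    'img': ['view_0', 'view_1'],      # stand-ins for two augmented views
    'img_path': 'data/demo.jpg',
    'scratch': 'intermediate value',  # not requested, so it is dropped
}
packed = pack_inputs(results, meta_keys=['img_path'])
print(packed)
```

Placing such a step last means every earlier transform can still rely on the full results dictionary, while the model only receives the packed structure.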