mmselfsup.datasets

datasets

class mmselfsup.datasets.DeepClusterImageNet(ann_file: str = '', metainfo: Optional[dict] = None, data_root: str = '', data_prefix: Union[str, dict] = '', **kwargs)[source]

ImageNet Dataset.

This dataset inherits the ImageNet dataset from MMClassification, since the DeepCluster and Online Deep Clustering (ODC) algorithms need to initialize clustering labels and assign them during training.

Parameters
  • ann_file (str) – Annotation file path. Defaults to ''.

  • metainfo (dict, optional) – Meta information for dataset, such as class information. Defaults to None.

  • data_root (str) – The root directory for data_prefix and ann_file. Defaults to ''.

  • data_prefix (str | dict) – Prefix for training data. Defaults to ''.

  • **kwargs – Other keyword arguments in CustomDataset and BaseDataset.

assign_labels(labels: list) → None[source]

Assign new labels to self.clustering_labels.

Parameters

labels (list) – The new labels.

Returns

None

prepare_data(idx: int) → Any[source]

Get data processed by self.pipeline.

Parameters

idx (int) – The index of data_info.

Returns

Depends on self.pipeline.

Return type

Any

class mmselfsup.datasets.ImageList(ann_file: str, metainfo: Optional[dict] = None, data_root: str = '', data_prefix: Union[str, dict] = '', **kwargs)[source]

The dataset implementation for loading any image list file.

The ImageList can load an annotation file or a list of files and merge all data records into one list. If the data is unlabeled, gt_label will be set to -1.

An annotation file should be provided, and each line indicates a sample:

The sample files:

data_prefix/
├── folder_1
│   ├── xxx.png
│   ├── xxy.png
│   └── ...
└── folder_2
    ├── 123.png
    ├── nsdf3.png
    └── ...

1. If the data is labeled, each line of the annotation file contains the image path in the first column and the category index in the second column:

    folder_1/xxx.png 0
    folder_1/xxy.png 1
    folder_2/123.png 5
    folder_2/nsdf3.png 3
    ...

2. If the data is unlabeled, the annotation file is:

    folder_1/xxx.png
    folder_1/xxy.png
    folder_2/123.png
    folder_2/nsdf3.png
    ...

Parameters
  • ann_file (str) – Annotation file path.

  • metainfo (dict, optional) – Meta information for dataset, such as class information. Defaults to None.

  • data_root (str) – The root directory for data_prefix and ann_file. Defaults to ''.

  • data_prefix (str | dict) – Prefix for training data. Defaults to ''.

  • **kwargs – Other keyword arguments in CustomDataset and BaseDataset.
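
For illustration, a minimal dataloader config sketch that feeds unlabeled data through ImageList could look as follows; the file paths and the pipeline contents are placeholders rather than values required by the API:

>>> train_pipeline = [
>>>     dict(type='LoadImageFromFile'),
>>>     dict(type='RandomResizedCrop', size=224),
>>>     dict(type='PackSelfSupInputs'),
>>> ]
>>> train_dataloader = dict(
>>>     batch_size=32,
>>>     num_workers=4,
>>>     sampler=dict(type='DefaultSampler', shuffle=True),
>>>     dataset=dict(
>>>         type='ImageList',
>>>         ann_file='meta/train.txt',  # one image path per line, no labels
>>>         data_root='data/custom/',
>>>         data_prefix='train/',
>>>         pipeline=train_pipeline))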

load_data_list() → List[dict][source]

Override load_data_list() to support annotation files with unlabeled data.

Returns

A list of data information.

Return type

List[dict]

class mmselfsup.datasets.Places205(ann_file: str = '', metainfo: Optional[dict] = None, data_root: str = '', data_prefix: Union[str, dict] = '', **kwargs)[source]

Places205 Dataset.

The dataset supports two kinds of annotation format. More details can be found in CustomDataset.

Parameters
  • ann_file (str) – Annotation file path. Defaults to ''.

  • metainfo (dict, optional) – Meta information for dataset, such as class information. Defaults to None.

  • data_root (str) – The root directory for data_prefix and ann_file. Defaults to ''.

  • data_prefix (str | dict) – Prefix for training data. Defaults to ''.

  • **kwargs – Other keyword arguments in CustomDataset and BaseDataset.

mmselfsup.datasets.build_dataset(cfg)[source]

Build dataset.

transforms

class mmselfsup.datasets.transforms.BEiTMaskGenerator(input_size: int, num_masking_patches: int, min_num_patches: int = 4, max_num_patches: Optional[int] = None, min_aspect: float = 0.3, max_aspect: Optional[float] = None)[source]

Generate mask for image.

Added Keys:

  • mask

This module is borrowed from https://github.com/microsoft/unilm/tree/master/beit

Parameters
  • input_size (int) – The size of input image.

  • num_masking_patches (int) – The number of patches to be masked.

  • min_num_patches (int) – The minimum number of patches to be masked in the process of generating mask. Defaults to 4.

  • max_num_patches (int, optional) – The maximum number of patches to be masked in the process of generating mask. Defaults to None.

  • min_aspect (float) – The minimum aspect ratio of mask blocks. Defaults to 0.3.

  • max_aspect (float, optional) – The maximum aspect ratio of mask blocks. Defaults to None.
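
As a usage sketch, the generator is typically placed in a BEiT-style pre-training pipeline. The numbers below are illustrative and follow the BEiT reference configs, where the generator operates on the patch grid (a 224-pixel image with 16-pixel patches gives a 14×14 grid); check the config of your version for the exact convention:

>>> train_pipeline = [
>>>     dict(type='LoadImageFromFile'),
>>>     dict(type='RandomResizedCropAndInterpolationWithTwoPic',
>>>         size=224, second_size=112),
>>>     dict(type='BEiTMaskGenerator',
>>>         input_size=14,            # patch grid, not pixels (assumption)
>>>         num_masking_patches=75,
>>>         min_num_patches=16,
>>>         max_num_patches=None),
>>>     dict(type='PackSelfSupInputs', algorithm_keys=['mask']),
>>> ]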

get_shape() → Tuple[int, int][source]

Get the shape of mask.

Returns

The shape of mask.

Return type

Tuple[int, int]

transform(results: dict) → dict[source]

Method to generate random block mask for each Image in BEiT.

Parameters

results (dict) – Result dict from previous pipeline.

Returns

Result dict with added key mask.

Return type

dict

class mmselfsup.datasets.transforms.ColorJitter(brightness: Union[float, List[float]] = 0, contrast: Union[float, List[float]] = 0, saturation: Union[float, List[float]] = 0, hue: Union[float, List[float]] = 0, backend: str = 'pillow')[source]

Randomly change the brightness, contrast, saturation and hue of an image.

Modified from https://github.com/pytorch/vision/blob/main/torchvision/transforms/transforms.py

Required Keys:

  • img

Modified Keys:

  • img

Parameters
  • brightness (float or tuple of float (min, max)) – How much to jitter brightness. brightness_factor is chosen uniformly from [max(0, 1 - brightness), 1 + brightness] or the given [min, max]. Should be non-negative numbers.

  • contrast (float or tuple of float (min, max)) – How much to jitter contrast. contrast_factor is chosen uniformly from [max(0, 1 - contrast), 1 + contrast] or the given [min, max]. Should be non-negative numbers.

  • saturation (float or tuple of float (min, max)) – How much to jitter saturation. saturation_factor is chosen uniformly from [max(0, 1 - saturation), 1 + saturation] or the given [min, max]. Should be non-negative numbers.

  • hue (float or tuple of float (min, max)) – How much to jitter hue. hue_factor is chosen uniformly from [-hue, hue] or the given [min, max]. Should have 0 <= hue <= 0.5 or -0.5 <= min <= max <= 0.5. To jitter hue, the pixel values of the input image have to be non-negative for conversion to HSV space; thus it does not work if you normalize your image to an interval with negative values, or use an interpolation that generates negative values before using this function.

  • backend (str) – The type of image processing backend. Options are cv2, pillow. Defaults to pillow.
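
A minimal usage sketch; the random image is a stand-in for real pipeline data, and mmcv-style transforms are callable on a results dict:

>>> import numpy as np
>>> from mmselfsup.datasets.transforms import ColorJitter
>>> jitter = ColorJitter(brightness=0.4, contrast=0.4, saturation=0.2, hue=0.1)
>>> results = dict(img=np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8))
>>> results = jitter(results)  # 'img' is jittered in place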

static get_params(brightness: Optional[List[float]], contrast: Optional[List[float]], saturation: Optional[List[float]], hue: Optional[List[float]]) → Tuple[numpy.ndarray, Optional[float], Optional[float], Optional[float], Optional[float]][source]

Get the parameters for the randomized transform to be applied on image.

Parameters
  • brightness (tuple of float (min, max), optional) – The range from which the brightness_factor is chosen uniformly. Pass None to turn off the transformation.

  • contrast (tuple of float (min, max), optional) – The range from which the contrast_factor is chosen uniformly. Pass None to turn off the transformation.

  • saturation (tuple of float (min, max), optional) – The range from which the saturation_factor is chosen uniformly. Pass None to turn off the transformation.

  • hue (tuple of float (min, max), optional) – The range from which the hue_factor is chosen uniformly. Pass None to turn off the transformation.

Returns

The parameters used to apply the randomized transform along with their random order.

Return type

tuple

transform(results: dict) → dict[source]

Randomly change the brightness, contrast, saturation and hue of an image.

Parameters

results (dict) – The results dict from previous pipeline.

Returns

Results after applying this transformation.

Return type

dict

class mmselfsup.datasets.transforms.MAERandomResizedCrop(size, scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333), interpolation=<InterpolationMode.BILINEAR: 'bilinear'>, antialias: Optional[bool] = None)[source]

RandomResizedCrop matching the TF/TPU implementation: no for-loop is used.

This may lead to results different from torchvision's version. Following BYOL's TF code: https://github.com/deepmind/deepmind-research/blob/master/byol/utils/dataset.py#L206

forward(results: dict) → dict[source]

The forward function of MAERandomResizedCrop.

Parameters

results (dict) – The results dict containing the image and all the information related to it.

Returns

The results dict containing the cropped image and all the information related to it.

Return type

dict

static get_params(img: PIL.Image.Image, scale: tuple, ratio: tuple) → Tuple[source]

Get parameters for crop for a random sized crop.

Parameters
  • img (PIL Image or Tensor) – Input image.

  • scale (tuple) – Range of size of the origin size cropped.

  • ratio (tuple) – Range of aspect ratio of the origin aspect ratio cropped.

Returns

Params (i, j, h, w) to be passed to crop for a random sized crop.

Return type

tuple

class mmselfsup.datasets.transforms.MultiView(transforms: List[List[Union[dict, Callable[[dict], dict]]]], num_views: Union[int, List[int]])[source]

A transform wrapper for multiple views of an image.

Parameters
  • transforms (List[List[dict | callable]]) – A list of transform pipelines to be wrapped; each pipeline is a sequence of transform objects or config dicts.

  • num_views (int | List[int]) – The number of views generated by each pipeline. If an int is given, a single pipeline is expected; if a list is given, its length must match the number of pipelines.

Examples

>>> # Example 1: MultiView with 1 pipeline and 2 views
>>> pipeline = [
>>>     dict(type='MultiView',
>>>         num_views=2,
>>>         transforms=[
>>>             [
>>>                dict(type='Resize', scale=224)],
>>>         ])
>>> ]
>>> # Example 2: MultiView with 2 pipelines, the first with 2 views,
>>> # the second with 6 views
>>> pipeline = [
>>>     dict(type='MultiView',
>>>         num_views=[2, 6],
>>>         transforms=[
>>>             [
>>>                dict(type='Resize', scale=224)],
>>>             [
>>>                dict(type='Resize', scale=224),
>>>                dict(type='RandomSolarize')],
>>>         ])
>>> ]
transform(results: dict) → dict[source]

Apply transformation to inputs.

Parameters

results (dict) – Result dict from previous pipelines.

Returns

Transformed results.

Return type

dict

class mmselfsup.datasets.transforms.PackSelfSupInputs(key: str = 'img', algorithm_keys: List[str] = [], pseudo_label_keys: List[str] = [], meta_keys: List[str] = [])[source]

Pack data into the format compatible with the inputs of algorithm.

Required Keys:

  • img

Added Keys:

  • data_samples

  • inputs

Parameters
  • key (str) – The key of the image inputted into the model. Defaults to 'img'.

  • algorithm_keys (List[str]) – Keys of elements related to algorithms, e.g. mask. Defaults to [].

  • pseudo_label_keys (List[str]) – Keys set to be the attributes of pseudo_label. Defaults to [].

  • meta_keys (List[str]) – The keys of meta info of an image. Defaults to [].
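
This transform usually closes a pre-training pipeline. A minimal sketch; key names besides 'img' and 'mask' are illustrative:

>>> train_pipeline = [
>>>     dict(type='LoadImageFromFile'),
>>>     dict(type='RandomResizedCrop', size=192),
>>>     dict(type='SimMIMMaskGenerator', input_size=192),
>>>     dict(type='PackSelfSupInputs',
>>>         key='img',
>>>         algorithm_keys=['mask'],   # forwarded to the algorithm
>>>         meta_keys=['img_path']),   # kept as meta info
>>> ]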

classmethod set_algorithm_keys(data_sample: mmselfsup.structures.selfsup_data_sample.SelfSupDataSample, key: str, results: dict) → None[source]

Set the algorithm keys of SelfSupDataSample.

Parameters
  • data_sample (SelfSupDataSample) – An instance of SelfSupDataSample.

  • key (str) – The key, which may be used by the algorithm, such as gt_label, sample_idx, mask, pred_label. For more keys, please refer to the attributes of SelfSupDataSample.

  • results (dict) – The results from the data pipeline.

transform(results: Dict) → Dict[torch.Tensor, mmselfsup.structures.selfsup_data_sample.SelfSupDataSample][source]

Method to pack the data.

Parameters

results (Dict) – Result dict from the data pipeline.

Returns

  • inputs (List[torch.Tensor]): The forward data of models.

  • data_samples (SelfSupDataSample): The annotation info of the forward data.

Return type

Dict

class mmselfsup.datasets.transforms.RandomCrop(size: Union[int, Sequence[int]], padding: Optional[Union[int, Sequence[int]]] = None, pad_if_needed: bool = False, pad_val: Union[numbers.Number, Sequence[numbers.Number]] = 0, padding_mode: str = 'constant')[source]

Crop the given image at a random location.

Required Keys:

  • img

Modified Keys:

  • img

  • img_shape

Parameters
  • size (int or Sequence) – Desired output size of the crop. If size is an int instead of a sequence like (h, w), a square crop (size, size) is made.

  • padding (int or Sequence, optional) – Optional padding on each border of the image. If a sequence of length 4 is provided, it is used to pad the left, top, right, bottom borders respectively. If a sequence of length 2 is provided, it is used to pad the left/right, top/bottom borders, respectively. Defaults to None, which means no padding.

  • pad_if_needed (bool) – Pad the image if it is smaller than the desired size, to avoid raising an exception. Since cropping is done after padding, the padding is effectively done at a random offset. Defaults to False.

  • pad_val (Number | Sequence[Number]) – Pixel value for constant fill. If a tuple of length 3, it is used to pad the R, G, B channels respectively. Defaults to 0.

  • padding_mode (str) –

    Type of padding. Defaults to "constant". Should be one of the following:

    • constant: Pads with a constant value; this value is specified with pad_val.

    • edge: Pads with the last value at the edge of the image.

    • reflect: Pads with reflection of the image without repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in reflect mode will result in [3, 2, 1, 2, 3, 4, 3, 2].

    • symmetric: Pads with reflection of the image repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in symmetric mode will result in [2, 1, 1, 2, 3, 4, 4, 3].

static get_params(img: numpy.ndarray, output_size: Tuple) → Tuple[source]

Get parameters for crop for a random crop.

Parameters
  • img (np.ndarray) – Image to be cropped.

  • output_size (Tuple) – Expected output size of the crop.

Returns

Params (xmin, ymin, target_height, target_width) to be passed to crop for a random crop.

Return type

tuple

transform(results: dict) → dict[source]

Randomly crop the image.

Parameters

results (dict) – Result dict from previous pipeline.

Returns

Result dict with the transformed image.

Return type

dict

class mmselfsup.datasets.transforms.RandomGaussianBlur(sigma_min: float, sigma_max: float, prob: Optional[float] = 0.5)[source]

GaussianBlur augmentation used in SimCLR.

Required Keys:

  • img

Modified Keys:

  • img

Parameters
  • sigma_min (float) – The minimum parameter of the Gaussian kernel std.

  • sigma_max (float) – The maximum parameter of the Gaussian kernel std.

  • prob (float, optional) – Probability. Defaults to 0.5.
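
A minimal usage sketch; the random image is a stand-in for real pipeline data:

>>> import numpy as np
>>> from mmselfsup.datasets.transforms import RandomGaussianBlur
>>> blur = RandomGaussianBlur(sigma_min=0.1, sigma_max=2.0, prob=0.5)
>>> results = dict(img=np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8))
>>> results = blur(results)  # 'img' is blurred with probability 0.5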

transform(results: dict) → dict[source]

Apply GaussianBlur augmentation to the given image.

Parameters

results (dict) – Results from previous pipeline.

Returns

Results after applying this transformation.

Return type

dict

class mmselfsup.datasets.transforms.RandomPatchWithLabels[source]

Relative patch location.

Required Keys:

  • img

Modified Keys:

  • img

Added Keys:

  • patch_label

  • patch_box

  • unpatched_img

Crops the image into several patches and concatenates every surrounding patch with the center one. Finally, it assigns labels 0, 1, 2, 3, 4, 5, 6, 7 together with the patch positions.

transform(results: dict) → dict[source]

Apply random patch augmentation to the given image.

Parameters

results (dict) – Results from previous pipeline.

Returns

Results after applying this transformation.

Return type

dict

class mmselfsup.datasets.transforms.RandomResizedCrop(size: Union[int, Sequence[int]], scale: Tuple = (0.08, 1.0), ratio: Tuple = (0.75, 1.3333333333333333), max_attempts: int = 10, interpolation: str = 'bilinear', backend: str = 'cv2')[source]

Crop the given image to a random size and aspect ratio.

A crop of random size (default: 0.08 to 1.0 of the original size) and random aspect ratio (default: 3/4 to 4/3 of the original aspect ratio) is made. This crop is finally resized to the given size.

Required Keys:

  • img

Modified Keys:

  • img

  • img_shape

Parameters
  • size (Sequence | int) – Desired output size of the crop. If size is an int instead of a sequence like (h, w), a square crop (size, size) is made.

  • scale (Tuple) – Range of the random size of the cropped image compared to the original image. Defaults to (0.08, 1.0).

  • ratio (Tuple) – Range of the random aspect ratio of the cropped image compared to the original image. Defaults to (3. / 4., 4. / 3.).

  • max_attempts (int) – Maximum number of attempts before falling back to central crop. Defaults to 10.

  • interpolation (str) – Interpolation method; accepted values are 'nearest', 'bilinear', 'bicubic', 'area', 'lanczos'. Defaults to 'bilinear'.

  • backend (str) – The image resize backend type; accepted values are cv2 and pillow. Defaults to cv2.
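
A minimal usage sketch; the random image is a stand-in for real pipeline data:

>>> import numpy as np
>>> from mmselfsup.datasets.transforms import RandomResizedCrop
>>> crop = RandomResizedCrop(size=224, scale=(0.2, 1.0))
>>> results = dict(img=np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8))
>>> results = crop(results)
>>> results['img'].shape
(224, 224, 3)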

static get_params(img: numpy.ndarray, scale: Tuple, ratio: Tuple, max_attempts: int = 10) → Tuple[int, int, int, int][source]

Get parameters for crop for a random sized crop.

Parameters
  • img (np.ndarray) – Image to be cropped.

  • scale (Tuple) – Range of the random size of the cropped image compared to the original image size.

  • ratio (Tuple) – Range of the random aspect ratio of the cropped image compared to the original image area.

  • max_attempts (int) – Maximum number of attempts before falling back to central crop. Defaults to 10.

Returns

Params (ymin, xmin, ymax, xmax) to be passed to crop for a random sized crop.

Return type

tuple

transform(results: dict) → dict[source]

Randomly crop the image and resize it to the target size.

Parameters

results (dict) – Result dict from previous pipeline.

Returns

Result dict with the transformed image.

Return type

dict

class mmselfsup.datasets.transforms.RandomResizedCropAndInterpolationWithTwoPic(size: Union[tuple, int], second_size=None, scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333), interpolation='bilinear', second_interpolation='lanczos')[source]

Crop the given PIL Image to a random size and aspect ratio with random interpolation.

Required Keys:

  • img

Modified Keys:

  • img

Added Keys:

  • target_img

This module is borrowed from https://github.com/microsoft/unilm/tree/master/beit.

A crop of random size (default: 0.08 to 1.0 of the original size) and random aspect ratio (default: 3/4 to 4/3 of the original aspect ratio) is made. This crop is finally resized to the given size. This is popularly used to train the Inception networks. This module first crops the image and resizes the crop to two different sizes.

Parameters
  • size (Union[tuple, int]) – Expected output size of each edge of the first image.

  • second_size (Union[tuple, int], optional) – Expected output size of each edge of the second image.

  • scale (tuple[float, float]) – Range of size of the origin size cropped. Defaults to (0.08, 1.0).

  • ratio (tuple[float, float]) – Range of aspect ratio of the origin aspect ratio cropped. Defaults to (3./4., 4./3.).

  • interpolation (str) – The interpolation for the first image. Defaults to bilinear.

  • second_interpolation (str) – The interpolation for the second image. Defaults to lanczos.

static get_params(img: numpy.ndarray, scale: tuple, ratio: tuple) → Sequence[int][source]

Get parameters for crop for a random sized crop.

Parameters
  • img (np.ndarray) – Image to be cropped.

  • scale (tuple) – Range of size of the origin size cropped.

  • ratio (tuple) – Range of aspect ratio of the origin aspect ratio cropped.

Returns

Params (i, j, h, w) to be passed to crop for a random sized crop.

Return type

tuple

transform(results: dict) → dict[source]

Crop the given image and resize it to two different sizes.

This module crops the given image randomly and resizes the crop to two different sizes. This is popularly used in BEiT-style masked image modeling, where an off-the-shelf model is used to provide the target.

Parameters

results (dict) – Results from previous pipeline.

Returns

Results after applying this transformation.

Return type

dict

class mmselfsup.datasets.transforms.RandomRotation(degrees: Union[int, Sequence[int]], interpolation: str = 'nearest', expand: bool = False, center: Optional[Tuple[float]] = None, fill: int = 0)[source]

Rotate the image by angle.

Required Keys:

  • img

Modified Keys:

  • img

Parameters
  • degrees (sequence | int) – Range of degrees to select from. If degrees is an int instead of a sequence like (min, max), the range of degrees will be (-degrees, +degrees).

  • interpolation (str, optional) – Interpolation method; accepted values are 'nearest', 'bilinear', 'bicubic', 'area', 'lanczos'. Defaults to 'nearest'.

  • expand (bool, optional) – Optional expansion flag. If true, expands the output to make it large enough to hold the entire rotated image. If false or omitted, makes the output image the same size as the input image. Note that the expand flag assumes rotation around the center and no translation. Defaults to False.

  • center (Tuple[float], optional) – Center point (w, h) of the rotation in the source image. If not specified, the center of the image will be used. Defaults to None.

  • fill (int, optional) – Pixel fill value for the area outside the rotated image. Defaults to 0.

static get_params(degrees: List[float]) → float[source]

Get parameters for rotate for a random rotation.

Parameters

degrees (List[float]) – Range of degrees to select from.

Returns

The angle parameter to be passed to rotate for a random rotation.

Return type

float

transform(results: dict) → dict[source]

Randomly rotate the image.

Parameters

results (dict) – Result dict from previous pipeline.

Returns

Result dict with the transformed image.

Return type

dict

class mmselfsup.datasets.transforms.RandomSolarize(threshold: int = 128, prob: float = 0.5)[source]

Solarization augmentation used in BYOL.

Required Keys:

  • img

Modified Keys:

  • img

Parameters
  • threshold (int, optional) – The solarization threshold. Defaults to 128.

  • prob (float, optional) – Probability. Defaults to 0.5.
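
As a usage sketch, solarization is typically combined with other augmentations in a BYOL-style view pipeline; the probabilities below follow common practice and are illustrative:

>>> view_pipeline = [
>>>     dict(type='RandomResizedCrop', size=224),
>>>     dict(type='RandomGaussianBlur', sigma_min=0.1, sigma_max=2.0, prob=0.1),
>>>     dict(type='RandomSolarize', threshold=128, prob=0.2),
>>> ]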

transform(results: dict) → dict[source]

Apply Solarize augmentation to the given image.

Parameters

results (dict) – Results from previous pipeline.

Returns

Results after applying this transformation.

Return type

dict

class mmselfsup.datasets.transforms.RotationWithLabels[source]

Rotation prediction.

Required Keys:

  • img

Modified Keys:

  • img

Added Keys:

  • rot_label

Rotates each image by 0, 90, 180, and 270 degrees and assigns labels 0, 1, 2 and 3 correspondingly.

transform(results: dict) → dict[source]

Apply rotation augmentation to the given image.

Parameters

results (dict) – Results from previous pipeline.

Returns

Results after applying this transformation.

Return type

dict

class mmselfsup.datasets.transforms.SimMIMMaskGenerator(input_size: int = 192, mask_patch_size: int = 32, model_patch_size: int = 4, mask_ratio: float = 0.6)[source]

Generate random block mask for each Image.

Added Keys:

  • mask

This module is used in SimMIM to generate masks.

Parameters
  • input_size (int) – Size of input image. Defaults to 192.

  • mask_patch_size (int) – Size of each block mask. Defaults to 32.

  • model_patch_size (int) – Patch size of each token. Defaults to 4.

  • mask_ratio (float) – The mask ratio of image. Defaults to 0.6.
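
A sketch of where the generator sits in a SimMIM pre-training pipeline; the values mirror the documented defaults, and the mask is packed for the algorithm via algorithm_keys:

>>> train_pipeline = [
>>>     dict(type='LoadImageFromFile'),
>>>     dict(type='RandomResizedCrop', size=192),
>>>     dict(type='SimMIMMaskGenerator',
>>>         input_size=192, mask_patch_size=32,
>>>         model_patch_size=4, mask_ratio=0.6),
>>>     dict(type='PackSelfSupInputs', algorithm_keys=['mask']),
>>> ]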

transform(results: dict) → dict[source]

Method to generate random block mask for each Image in SimMIM.

Parameters

results (dict) – Result dict from previous pipeline.

Returns

Result dict with added key mask.

Return type

dict

samplers

class mmselfsup.datasets.samplers.DeepClusterSampler(dataset: Sized, shuffle: bool = True, seed: Optional[int] = None, replace: bool = False, round_up: bool = True)[source]

The sampler inherits DefaultSampler from mmengine.

This sampler supports setting replace to True to sample indices with replacement. Besides, it defines the function set_uniform_indices, which is used in DeepClusterHook.

Parameters
  • dataset (Sized) – The dataset.

  • shuffle (bool) – Whether to shuffle the dataset or not. Defaults to True.

  • seed (int, optional) – Random seed used to shuffle the sampler if shuffle=True. This number should be identical across all processes in the distributed group. Defaults to None.

  • replace (bool) – Whether to sample with replacement in random shuffle. It only works when shuffle is True. Defaults to False.

  • round_up (bool) – Whether to add extra samples to make the number of samples evenly divisible by the world size. Defaults to True.
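
A minimal dataloader config sketch that plugs this sampler in; the batch size and dataset fields are placeholders:

>>> train_dataloader = dict(
>>>     batch_size=64,
>>>     num_workers=4,
>>>     sampler=dict(type='DeepClusterSampler', shuffle=True, replace=True),
>>>     dataset=dict(type='DeepClusterImageNet',
>>>         ann_file='meta/train.txt',
>>>         data_root='data/imagenet/',
>>>         pipeline=[]))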

set_uniform_indices(labels: list, num_classes: int) → None[source]

The function is applied in DeepClusterHook for uniform sampling.

Parameters
  • labels (list) – The updated labels after clustering.

  • num_classes (int) – Number of clusters.

Returns

None

mmselfsup.engine

hooks

class mmselfsup.engine.hooks.DeepClusterHook(extract_dataloader: dict, clustering: dict, unif_sampling: bool, reweight: bool, reweight_pow: float, init_memory: bool = False, initial: bool = True, interval: int = 1, seed: Optional[int] = None)[source]

Hook for DeepCluster.

This hook includes the global clustering process in DC.

Parameters
  • extract_dataloader (dict) – Config dict for the dataloader used for feature extraction.

  • clustering (dict) – Config dict that specifies the clustering algorithm.

  • unif_sampling (bool) – Whether to apply uniform sampling.

  • reweight (bool) – Whether to apply loss re-weighting.

  • reweight_pow (float) – The power of re-weighting.

  • init_memory (bool) – Whether to initialize memory banks used in ODC. Defaults to False.

  • initial (bool) – Whether to call the hook initially. Defaults to True.

  • interval (int) – Frequency of epochs to call the hook. Defaults to 1.

  • seed (int, optional) – Random seed. Defaults to None.
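
A sketch of registering the hook in a config; the clustering settings are illustrative (a KMeans-style config whose number of clusters should match the head), and the extraction dataloader mirrors the training data without shuffling:

>>> custom_hooks = [
>>>     dict(type='DeepClusterHook',
>>>         extract_dataloader=dict(
>>>             batch_size=128,
>>>             num_workers=4,
>>>             sampler=dict(type='DefaultSampler', shuffle=False),
>>>             dataset=dict(type='DeepClusterImageNet',
>>>                 ann_file='meta/train.txt',
>>>                 data_root='data/imagenet/',
>>>                 pipeline=[])),
>>>         clustering=dict(type='Kmeans', k=10000, pca_dim=256),  # illustrative
>>>         unif_sampling=True,
>>>         reweight=False,
>>>         reweight_pow=0.5,
>>>         initial=True,
>>>         interval=1)
>>> ]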

after_train_epoch(runner) → None[source]

Run clustering after the indicated epoch.

before_train(runner) → None[source]

Run clustering before training.

deepcluster(runner) → None[source]

Call the clustering algorithm.

evaluate(runner, new_labels: numpy.ndarray) → None[source]

Evaluate with the labels histogram.

set_reweight(runner, labels: numpy.ndarray, reweight_pow: float = 0.5)[source]

Loss re-weighting.

Re-weight the loss according to the number of samples in each class.

Parameters
  • runner (mmengine.Runner) – mmengine Runner.

  • labels (numpy.ndarray) – Label assignments.

  • reweight_pow (float, optional) – The power of re-weighting. Defaults to 0.5.

class mmselfsup.engine.hooks.DenseCLHook(start_iters: int = 1000)[source]

Hook for DenseCL.

This hook includes the loss_lambda warmup in DenseCL. Borrowed from the authors' code: https://github.com/WXinlong/DenseCL.

Parameters

start_iters (int) – The number of warmup iterations during which loss_lambda is set to 0. Defaults to 1000.

before_train(runner) → None[source]

Obtain loss_lambda from the algorithm.

before_train_iter(runner, batch_idx: int, data_batch: Optional[Sequence[dict]] = None) → None[source]

Adjust loss_lambda every train iter.

class mmselfsup.engine.hooks.ODCHook(centroids_update_interval: int, deal_with_small_clusters_interval: int, evaluate_interval: int, reweight: bool, reweight_pow: float, dist_mode: bool = True)[source]

Hook for ODC.

This hook includes the online clustering process in ODC.

Parameters
  • centroids_update_interval (int) – Frequency of iterations to update centroids.

  • deal_with_small_clusters_interval (int) – Frequency of iterations to deal with small clusters.

  • evaluate_interval (int) – Frequency of iterations to evaluate clusters.

  • reweight (bool) – Whether to perform loss re-weighting.

  • reweight_pow (float) – The power of re-weighting.

  • dist_mode (bool) – Use distributed training or not. Defaults to True.

after_train_epoch(runner) → None[source]

Save the cluster.

after_train_iter(runner, batch_idx: int, data_batch: Optional[Sequence[dict]] = None, outputs: Optional[dict] = None) → None[source]

Update cluster centroids and the loss_weight.

evaluate(runner, new_labels: numpy.ndarray) → None[source]

Evaluate with the labels histogram.

set_reweight(runner, labels: Optional[numpy.ndarray] = None, reweight_pow: float = 0.5)[source]

Loss re-weighting.

Re-weight the loss according to the number of samples in each class.

Parameters
  • runner (mmengine.Runner) – mmengine Runner.

  • labels (numpy.ndarray) – Label assignments.

  • reweight_pow (float, optional) – The power of re-weighting. Defaults to 0.5.

class mmselfsup.engine.hooks.SimSiamHook(fix_pred_lr: bool, lr: float, adjust_by_epoch: Optional[bool] = True)[source]

Hook for SimSiam.

This hook is for SimSiam to fix the learning rate of the predictor.

Parameters
  • fix_pred_lr (bool) – Whether to fix the lr of the predictor or not.

  • lr (float) – The value of the fixed lr.

  • adjust_by_epoch (bool, optional) – Whether to set the lr by epoch or by iter. Defaults to True.

before_train_epoch(runner) → None[source]

Fix the lr of the predictor by epoch.

before_train_iter(runner, batch_idx: int, data_batch: Optional[Sequence[dict]] = None) → None[source]

Fix the lr of the predictor by iter.

class mmselfsup.engine.hooks.SwAVHook(batch_size: int, epoch_queue_starts: Optional[int] = 15, crops_for_assign: Optional[List[int]] = [0, 1], feat_dim: Optional[int] = 128, queue_length: Optional[int] = 0, interval: Optional[int] = 1, frozen_layers_cfg: Optional[Dict] = {})[source]

Hook for SwAV.

This hook builds the queue in SwAV according to epoch_queue_starts. The queue will be saved in runner.work_dir, or loaded at the start epoch if that folder already contains saved queues.

Parameters
  • batch_size (int) – The batch size per GPU for computing.

  • epoch_queue_starts (int, optional) – The epoch from which the queue starts to be used. Defaults to 15.

  • crops_for_assign (list[int], optional) – List of crop ids used for computing assignments. Defaults to [0, 1].

  • feat_dim (int, optional) – Feature dimension of the output vector. Defaults to 128.

  • queue_length (int, optional) – Length of the queue (0 for no queue). Defaults to 0.

  • interval (int, optional) – The interval to save the queue. Defaults to 1.

  • frozen_layers_cfg (dict, optional) – Dict to configure frozen layers. The key-value pair is the layer name and its number of frozen iterations. If frozen, the layers don't need gradients. Defaults to dict().
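
A sketch of a typical SwAV hook config; the queue length is illustrative and is usually chosen as a multiple of the total batch size:

>>> custom_hooks = [
>>>     dict(type='SwAVHook',
>>>         batch_size=32,
>>>         epoch_queue_starts=15,
>>>         crops_for_assign=[0, 1],
>>>         feat_dim=128,
>>>         queue_length=3840,
>>>         interval=1)
>>> ]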

after_train_epoch(runner) → None[source]

Save the queues locally.

before_run(runner) → None[source]

Check whether the queues exist locally or not.

before_train_epoch(runner) → None[source]

Check the queues' state.

before_train_iter(runner, batch_idx: int, data_batch: Optional[Sequence[dict]] = None) → None[source]

Freeze layers before specific iters according to the config.

optimizers

class mmselfsup.engine.optimizers.LARS(params: Iterable, lr: float, momentum: float = 0, weight_decay: float = 0, dampening: float = 0, eta: float = 0.001, nesterov: bool = False, eps: float = 1e-08)[source]

Implements layer-wise adaptive rate scaling for SGD.

Based on Algorithm 1 of Large Batch Training of Convolutional Networks by You, Gitman, and Ginsburg.

Parameters
  • params (Iterable) – Iterable of parameters to optimize or dicts defining parameter groups.

  • lr (float) – Base learning rate.

  • momentum (float) – Momentum factor. Defaults to 0.

  • weight_decay (float) – Weight decay (L2 penalty). Defaults to 0.

  • dampening (float) – Dampening for momentum. Defaults to 0.

  • eta (float) – LARS coefficient. Defaults to 0.001.

  • nesterov (bool) – Enables Nesterov momentum. Defaults to False.

  • eps (float) – A small number to avoid dividing by zero. Defaults to 1e-8.

Example

>>> optimizer = LARS(model.parameters(), lr=0.1, momentum=0.9,
>>>                  weight_decay=1e-4, eta=1e-3)
>>> optimizer.zero_grad()
>>> loss_fn(model(input), target).backward()
>>> optimizer.step()

step(closure=None) → torch.Tensor[source]

Performs a single optimization step.

Parameters

closure (callable, optional) – A closure that reevaluates the model and returns the loss.

class mmselfsup.engine.optimizers.LearningRateDecayOptimWrapperConstructor(optim_wrapper_cfg: dict, paramwise_cfg: Optional[dict] = None)[source]

Different learning rates are set for different layers of the backbone.

Note: Currently, this optimizer constructor is built for ViT and Swin.

In addition to applying a layer-wise learning rate decay schedule, the paramwise_cfg only supports weight decay customization.
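
A sketch of how this constructor is typically wired into an optim_wrapper config; the decay rate and optimizer settings are illustrative, and the exact paramwise_cfg keys should be checked against the version in use:

>>> optim_wrapper = dict(
>>>     type='OptimWrapper',
>>>     optimizer=dict(type='AdamW', lr=1e-3, weight_decay=0.05),
>>>     constructor='LearningRateDecayOptimWrapperConstructor',
>>>     paramwise_cfg=dict(layer_decay_rate=0.65))  # key name is an assumption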

add_params(params: List[dict], module: torch.nn.modules.module.Module, optimizer_cfg: dict, **kwargs) → None[source]

Add all parameters of module to the params list.

The parameters of the given module will be added to the list of param groups, with specific rules defined by paramwise_cfg.

Parameters
  • params (List[dict]) – A list of param groups; it will be modified in place.

  • module (nn.Module) – The module to be added.

  • optimizer_cfg (dict) – The configuration of the optimizer.

  • prefix (str) – The prefix of the module.

mmselfsup.evaluation

functional

mmselfsup.evaluation.functional.knn_eval(train_features: torch.Tensor, train_labels: torch.Tensor, test_features: torch.Tensor, test_labels: torch.Tensor, k: int, T: float, num_classes: int = 1000) → Tuple[float, float][source]

Compute the accuracy of k-NN classifier predictions.

Parameters
  • train_features (Tensor) – Extracted features in the training set.

  • train_labels (Tensor) – Labels in the training set.

  • test_features (Tensor) – Extracted features in the testing set.

  • test_labels (Tensor) – Labels in the testing set.

  • k (int) – Number of nearest neighbors to use.

  • T (float) – Temperature used in the voting coefficient.

  • num_classes (int) – Number of classes. Defaults to 1000.

Returns

The top1 and top5 accuracy.

Return type

Tuple[float, float]
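
A minimal usage sketch with random tensors standing in for extracted features; depending on the implementation, features may need to be L2-normalized beforehand:

>>> import torch
>>> from mmselfsup.evaluation.functional import knn_eval
>>> train_feats = torch.nn.functional.normalize(torch.randn(1000, 256), dim=1)
>>> test_feats = torch.nn.functional.normalize(torch.randn(200, 256), dim=1)
>>> train_labels = torch.randint(0, 10, (1000,))
>>> test_labels = torch.randint(0, 10, (200,))
>>> top1, top5 = knn_eval(train_feats, train_labels, test_feats, test_labels,
>>>                       k=20, T=0.07, num_classes=10)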

mmselfsup.models

algorithms

class mmselfsup.models.algorithms.BEiT(backbone: dict, neck: Optional[dict] = None, head: Optional[dict] = None, target_generator: Optional[dict] = None, pretrained: Optional[str] = None, data_preprocessor: Optional[Union[dict, torch.nn.modules.module.Module]] = None, init_cfg: Optional[dict] = None)[source]

BEiT v1/v2.

Implementation of BEiT: BERT Pre-Training of Image Transformers and BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers.

loss(batch_inputs: List[torch.Tensor], data_samples: List[mmselfsup.structures.selfsup_data_sample.SelfSupDataSample], **kwargs) → Dict[str, torch.Tensor][source]

The forward function in training.

Parameters
  • batch_inputs (List[torch.Tensor]) – The input images.

  • data_samples (List[SelfSupDataSample]) – All elements required during the forward function.

Returns

A dictionary of loss components.

Return type

Dict[str, torch.Tensor]

class mmselfsup.models.algorithms.BYOL(backbone: dict, neck: dict, head: dict, base_momentum: float = 0.996, pretrained: Optional[str] = None, data_preprocessor: Optional[dict] = None, init_cfg: Optional[Union[dict, List[dict]]] = None)[source]

BYOL.

Implementation of Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning.

Parameters
  • backbone (dict) – Config dict for module of backbone.

  • neck (dict) – Config dict for module of deep features to compact feature vectors.

  • head (dict) – Config dict for module of head functions.

  • base_momentum (float) – The base momentum coefficient for the target network. Defaults to 0.996.

  • pretrained (str, optional) – The pretrained checkpoint path, supporting local and remote paths. Defaults to None.

  • data_preprocessor (dict, optional) – The config for preprocessing input data. If None or no type is specified, it will use "SelfSupDataPreprocessor" as the type. See SelfSupDataPreprocessor for more details. Defaults to None.

  • init_cfg (Union[List[dict], dict], optional) – Config dict for weight initialization. Defaults to None.
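
A sketch of a BYOL model config; the module types and channel sizes follow the common ResNet-50 setup of the reference configs, but the exact head and loss arguments are assumptions to be checked against the version in use:

>>> model = dict(
>>>     type='BYOL',
>>>     base_momentum=0.996,
>>>     backbone=dict(type='ResNet', depth=50),
>>>     neck=dict(type='NonLinearNeck',
>>>         in_channels=2048, hid_channels=4096, out_channels=256),
>>>     head=dict(type='LatentPredictHead',
>>>         predictor=dict(type='NonLinearNeck',
>>>             in_channels=256, hid_channels=4096, out_channels=256),
>>>         loss=dict(type='CosineSimilarityLoss')))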

extract_feat(inputs: List[torch.Tensor], **kwargs) → Tuple[torch.Tensor][source]

Function to extract features from the backbone.

Parameters

inputs (List[torch.Tensor]) – The input images.

Returns

Backbone outputs.

Return type

Tuple[torch.Tensor]

loss(inputs: List[torch.Tensor], data_samples: List[mmselfsup.structures.selfsup_data_sample.SelfSupDataSample], **kwargs) → Dict[str, torch.Tensor][source]

The forward function in training.

Parameters
  • inputs (List[torch.Tensor]) – The input images.

  • data_samples (List[SelfSupDataSample]) – All elements required during the forward function.

Returns

A dictionary of loss components.

Return type

Dict[str, torch.Tensor]

class mmselfsup.models.algorithms.BarlowTwins(backbone: dict, neck: Optional[dict] = None, head: Optional[dict] = None, target_generator: Optional[dict] = None, pretrained: Optional[str] = None, data_preprocessor: Optional[Union[dict, torch.nn.modules.module.Module]] = None, init_cfg: Optional[dict] = None)[source]

BarlowTwins.

Implementation of Barlow Twins: Self-Supervised Learning via Redundancy Reduction. Part of the code is borrowed from: https://github.com/facebookresearch/barlowtwins/blob/main/main.py.

extract_feat(inputs: List[torch.Tensor], **kwargs) → Tuple[torch.Tensor][source]

Function to extract features from the backbone.

Parameters
  • inputs (List[torch.Tensor]) – The input images.

  • data_samples (List[SelfSupDataSample]) – All elements required during the forward function.

Returns

Backbone outputs.

Return type

Tuple[torch.Tensor]

loss(inputs: List[torch.Tensor], data_samples: List[mmselfsup.structures.selfsup_data_sample.SelfSupDataSample], **kwargs) → Dict[str, torch.Tensor][source]

The forward function in training.

Parameters
  • inputs (List[torch.Tensor]) – The input images.

  • data_samples (List[SelfSupDataSample]) – All elements required during the forward function.

Returns

A dictionary of loss components.

Return type

Dict[str, torch.Tensor]

class mmselfsup.models.algorithms.BaseModel(backbone: dict, neck: Optional[dict] = None, head: Optional[dict] = None, target_generator: Optional[dict] = None, pretrained: Optional[str] = None, data_preprocessor: Optional[Union[dict, torch.nn.modules.module.Module]] = None, init_cfg: Optional[dict] = None)[source]

BaseModel for SelfSup.

All algorithms should inherit this module.

Parameters
  • backbone (dict) – The backbone module. See mmcls.models.backbones.

  • neck (dict, optional) – The neck module to process features from the backbone. See mmcls.models.necks. Defaults to None.

  • head (dict, optional) – The head module to do prediction and calculate loss from processed features. See mmcls.models.heads. Notice that if the head is not set, almost all methods cannot be used except extract_feat(). Defaults to None.

  • target_generator (dict, optional) – The target_generator module to generate targets for self-supervised learning optimization, such as HOG, or features extracted from other modules (DALL-E, CLIP), etc.

  • pretrained (str, optional) – The pretrained checkpoint path, supporting local and remote paths. Defaults to None.

  • data_preprocessor (Union[dict, nn.Module], optional) – The config for preprocessing input data. If None or no type is specified, it will use "SelfSupDataPreprocessor" as the type. See SelfSupDataPreprocessor for more details. Defaults to None.

  • init_cfg (dict, optional) – The config to control the initialization. Defaults to None.

extract_feat(inputs: torch.Tensor)[source]

Extract features from the input tensor with shape (N, C, ...).

This is an abstract method, and subclasses should overwrite it if needed.

Parameters

inputs (Tensor) – A batch of inputs. Its shape should be (num_samples, num_channels, *img_shape).

Returns

The output of the specified stage. The output depends on the detailed implementation.

Return type

tuple | Tensor

forward(inputs: torch.Tensor, data_samples: Optional[List[mmselfsup.structures.selfsup_data_sample.SelfSupDataSample]] = None, mode: str = 'tensor')[source]

Returns losses or predictions of the training, validation, testing, and simple inference process.

This module overwrites the abstract method in BaseModel.

Parameters
  • inputs (torch.Tensor) – Batch input tensor collated by data_preprocessor.

  • data_samples (List[BaseDataElement], optional) – Data samples collated by data_preprocessor.

  • mode (str) –

    mode should be one of loss, predict and tensor.

    • loss: Called by train_step and returns a loss dict used for logging.

    • predict: Called by val_step and test_step and returns a list of BaseDataElement results used for computing metrics.

    • tensor: Called by custom use to get Tensor-type results.

Returns

  • If mode == loss, return a dict of loss tensors used for backward and logging.

  • If mode == predict, return a list of BaseDataElement for computing metrics and getting inference results.

  • If mode == tensor, return a tensor, a tuple of tensors, or a dict of tensors for custom use.

Return type

ForwardResults (dict or list)

loss(inputs: torch.Tensor, data_samples: List[mmselfsup.structures.selfsup_data_sample.SelfSupDataSample]) → dict[source]

Calculate losses from a batch of inputs and data samples.

This is an abstract method, and subclasses should overwrite it if needed.

Parameters
  • inputs (torch.Tensor) – The input tensor, with shape (N, C, ...) in general.

  • data_samples (List[SelfSupDataSample]) – The annotation data of every sample.

Returns

A dictionary of loss components.

Return type

dict[str, Tensor]

predict(inputs: tuple, data_samples: Optional[List[mmselfsup.structures.selfsup_data_sample.SelfSupDataSample]] = None, **kwargs) → List[mmselfsup.structures.selfsup_data_sample.SelfSupDataSample][source]

Predict results from the extracted features.

This module returns the logits before loss, which are used to compute all kinds of metrics. This is an abstract method, and subclasses should overwrite it if needed.

Parameters
  • inputs (tuple) – The features extracted from the backbone.

  • data_samples (List[BaseDataElement], optional) – The annotation data of every sample. Defaults to None.

  • **kwargs – Other keyword arguments accepted by the predict method of head.

property with_head: bool

Check if the model has a head module.

property with_neck: bool

Check if the model has a neck module.

property with_target_generator: bool

Check if the model has a target_generator module.

class mmselfsup.models.algorithms.CAE(backbone: dict, neck: dict, head: dict, target_generator: Optional[dict] = None, base_momentum: float = 0.0, data_preprocessor: Optional[dict] = None, init_cfg: Optional[Union[dict, List[dict]]] = None)[source]

CAE.

Implementation of Context Autoencoder for Self-Supervised Representation Learning.

Parameters
  • backbone (dict) – Config dict for module of backbone.

  • neck (dict) – Config dict for module of neck.

  • head (dict) – Config dict for module of head functions.

  • target_generator (dict, optional) – The target_generator module to generate targets for self-supervised learning optimization, such as HOG, or features extracted from other modules (DALL-E, CLIP), etc.

  • base_momentum (float) – The base momentum coefficient for the target network. Defaults to 0.0.

  • data_preprocessor (dict, optional) – The config for preprocessing input data. If None or no type is specified, it will use "SelfSupDataPreprocessor" as the type. See SelfSupDataPreprocessor for more details. Defaults to None.

  • init_cfg (Union[List[dict], dict], optional) – Config dict for weight initialization. Defaults to None.

init_weights() → None[source]

Initialize weights.

loss(inputs: List[torch.Tensor], data_samples: List[mmselfsup.structures.selfsup_data_sample.SelfSupDataSample], **kwargs) → Dict[str, torch.Tensor][source]

The forward function in training.

Parameters
  • inputs (List[torch.Tensor]) – The input images.

  • data_samples (List[SelfSupDataSample]) – All elements required during the forward function.

Returns

A dictionary of loss components.

Return type

Dict[str, torch.Tensor]

momentum_update() → None[source]

Momentum update of the teacher network.

class mmselfsup.models.algorithms.DeepCluster(backbone: dict, neck: dict, head: dict, pretrained: Optional[str] = None, data_preprocessor: Optional[dict] = None, init_cfg: Optional[Union[dict, List[dict]]] = None)[source]

DeepCluster.

Implementation of Deep Clustering for Unsupervised Learning of Visual Features. The clustering operation is in engine/hooks/deepcluster_hook.py.

Parameters
  • backbone (dict) – Config dict for module of backbone.

  • neck (dict) – Config dict for module of deep features to compact feature vectors.

  • head (dict) – Config dict for module of head functions.

  • pretrained (str, optional) – The pretrained checkpoint path, supporting local and remote paths. Defaults to None.

  • data_preprocessor (dict, optional) – The config for preprocessing input data. If None or no type is specified, it will use "SelfSupDataPreprocessor" as the type. See SelfSupDataPreprocessor for more details. Defaults to None.

  • init_cfg (Union[List[dict], dict], optional) – Config dict for weight initialization. Defaults to None.

extract_feat(inputs: List[torch.Tensor], **kwarg) → Tuple[torch.Tensor][source]

Function to extract features from the backbone.

Parameters
  • inputs (List[torch.Tensor]) – The input images.

  • data_samples (List[SelfSupDataSample]) – All elements required during the forward function.

Returns

Backbone outputs.

Return type

Tuple[torch.Tensor]

loss(inputs: List[torch.Tensor], data_samples: List[mmselfsup.structures.selfsup_data_sample.SelfSupDataSample], **kwargs) → Dict[str, torch.Tensor][source]

The forward function in training.

Parameters
  • inputs (List[torch.Tensor]) – The input images.

  • data_samples (List[SelfSupDataSample]) – All elements required during the forward function.

Returns

A dictionary of loss components.

Return type

Dict[str, torch.Tensor]

predict(inputs: List[torch.Tensor], data_samples: List[mmselfsup.structures.selfsup_data_sample.SelfSupDataSample], **kwargs) → List[mmselfsup.structures.selfsup_data_sample.SelfSupDataSample][source]

The forward function in testing.

Parameters
  • inputs (List[torch.Tensor]) – The input images.

  • data_samples (List[SelfSupDataSample]) – All elements required during the forward function.

Returns

The prediction from model.

Return type

List[SelfSupDataSample]

class mmselfsup.models.algorithms.DenseCL(backbone: dict, neck: dict, head: dict, queue_len: int = 65536, feat_dim: int = 128, momentum: float = 0.999, loss_lambda: float = 0.5, pretrained: Optional[str] = None, data_preprocessor: Optional[dict] = None, init_cfg: Optional[Union[dict, List[dict]]] = None)[source]

DenseCL.

Implementation of Dense Contrastive Learning for Self-Supervised Visual Pre-Training. Borrowed from the authors' code: https://github.com/WXinlong/DenseCL. The loss_lambda warmup is in engine/hooks/densecl_hook.py.

Parameters
  • backbone (dict) – Config dict for module of backbone.

  • neck (dict) – Config dict for module of deep features to compact feature vectors.

  • head (dict) – Config dict for module of head functions.

  • queue_len (int) – Number of negative keys maintained in the queue. Defaults to 65536.

  • feat_dim (int) – Dimension of compact feature vectors. Defaults to 128.

  • momentum (float) – Momentum coefficient for the momentum-updated encoder. Defaults to 0.999.

  • loss_lambda (float) – Loss weight for the single and dense contrastive loss. Defaults to 0.5.

  • pretrained (str, optional) – The pretrained checkpoint path, supporting local and remote paths. Defaults to None.

  • data_preprocessor (dict, optional) – The config for preprocessing input data. If None or no type is specified, it will use "SelfSupDataPreprocessor" as the type. See SelfSupDataPreprocessor for more details. Defaults to None.

  • init_cfg (Union[List[dict], dict], optional) – Config dict for weight initialization. Defaults to None.

extract_feat(inputs: List[torch.Tensor], **kwargs) → Tuple[torch.Tensor][source]

Function to extract features from the backbone.

Parameters
  • inputs (List[torch.Tensor]) – The input images.

  • data_samples (List[SelfSupDataSample]) – All elements required during the forward function.

Returns

Backbone outputs.

Return type

Tuple[torch.Tensor]

loss(inputs: List[torch.Tensor], data_samples: List[mmselfsup.structures.selfsup_data_sample.SelfSupDataSample], **kwargs) → Dict[str, torch.Tensor][source]

The forward function in training.

Parameters
  • inputs (List[torch.Tensor]) – The input images.

  • data_samples (List[SelfSupDataSample]) – All elements required during the forward function.

Returns

A dictionary of loss components.

Return type

Dict[str, torch.Tensor]

predict(inputs: List[torch.Tensor], data_samples: List[mmselfsup.structures.selfsup_data_sample.SelfSupDataSample], **kwargs) → mmselfsup.structures.selfsup_data_sample.SelfSupDataSample[source]

Predict results from the extracted features.

Parameters
  • inputs (List[torch.Tensor]) – The input images.

  • data_samples (List[SelfSupDataSample]) – All elements required during the forward function.

Returns

The prediction from model.

Return type

SelfSupDataSample

class mmselfsup.models.algorithms.EVA(backbone: dict, neck: Optional[dict] = None, head: Optional[dict] = None, target_generator: Optional[dict] = None, pretrained: Optional[str] = None, data_preprocessor: Optional[Union[dict, torch.nn.modules.module.Module]] = None, init_cfg: Optional[dict] = None)[source]

EVA.

Implementation of EVA: Exploring the Limits of Masked Visual Representation Learning at Scale.

loss(inputs: List[torch.Tensor], data_samples: List[mmselfsup.structures.selfsup_data_sample.SelfSupDataSample], **kwargs) → Dict[str, torch.Tensor][source]

The forward function in training.

Parameters
  • inputs (List[torch.Tensor]) – The input images.

  • data_samples (List[SelfSupDataSample]) – All elements required during the forward function.

Returns

A dictionary of loss components.

Return type

Dict[str, torch.Tensor]

class mmselfsup.models.algorithms.MAE(backbone: dict, neck: Optional[dict] = None, head: Optional[dict] = None, target_generator: Optional[dict] = None, pretrained: Optional[str] = None, data_preprocessor: Optional[Union[dict, torch.nn.modules.module.Module]] = None, init_cfg: Optional[dict] = None)[source]

MAE.

Implementation of Masked Autoencoders Are Scalable Vision Learners.
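
A sketch of a typical MAE pre-training model config; the module types mirror the ViT-B reference setup, but the exact constructor arguments are assumptions to be checked against the version in use:

>>> model = dict(
>>>     type='MAE',
>>>     backbone=dict(type='MAEViT', arch='b', patch_size=16, mask_ratio=0.75),
>>>     neck=dict(type='MAEPretrainDecoder',
>>>         patch_size=16, embed_dim=768,
>>>         decoder_embed_dim=512, decoder_depth=8, decoder_num_heads=16),
>>>     head=dict(type='MAEPretrainHead', norm_pix=True, patch_size=16))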

extract_feat(inputs: List[torch.Tensor], data_samples: Optional[List[mmselfsup.structures.selfsup_data_sample.SelfSupDataSample]] = None, **kwarg) → Tuple[torch.Tensor][source]

The forward function to extract features from the neck.

Parameters

inputs (List[torch.Tensor]) – The input images.

Returns

Neck outputs.

Return type

Tuple[torch.Tensor]

loss(inputs: List[torch.Tensor], data_samples: List[mmselfsup.structures.selfsup_data_sample.SelfSupDataSample], **kwargs) → Dict[str, torch.Tensor][source]

The forward function in training.

Parameters
  • inputs (List[torch.Tensor]) – The input images.

  • data_samples (List[SelfSupDataSample]) – All elements required during the forward function.

Returns

A dictionary of loss components.

Return type

Dict[str, torch.Tensor]

reconstruct(features: torch.Tensor, data_samples: Optional[List[mmselfsup.structures.selfsup_data_sample.SelfSupDataSample]] = None, **kwargs) → mmselfsup.structures.selfsup_data_sample.SelfSupDataSample[source]

The function is for image reconstruction.

Parameters
  • features (torch.Tensor) – The input images.

  • data_samples (List[SelfSupDataSample]) – All elements required during the forward function.

Returns

The prediction from model.

Return type

SelfSupDataSample

class mmselfsup.models.algorithms.MILAN(backbone: dict, neck: Optional[dict] = None, head: Optional[dict] = None, target_generator: Optional[dict] = None, pretrained: Optional[str] = None, data_preprocessor: Optional[Union[dict, torch.nn.modules.module.Module]] = None, init_cfg: Optional[dict] = None)[source]

MILAN.

Implementation of MILAN: Masked Image Pretraining on Language Assisted Representation.

loss(inputs: List[torch.Tensor], data_samples: List[mmselfsup.structures.selfsup_data_sample.SelfSupDataSample], **kwargs) → Dict[str, torch.Tensor][source]

The forward function in training.

Parameters
  • inputs (List[torch.Tensor]) – The input images.

  • data_samples (List[SelfSupDataSample]) – All elements required during the forward function.

Returns

A dictionary of loss components.

Return type

Dict[str, torch.Tensor]

class mmselfsup.models.algorithms.MaskFeat(backbone: dict, neck: Optional[dict] = None, head: Optional[dict] = None, target_generator: Optional[dict] = None, pretrained: Optional[str] = None, data_preprocessor: Optional[Union[dict, torch.nn.modules.module.Module]] = None, init_cfg: Optional[dict] = None)[source]

MaskFeat.

Implementation of Masked Feature Prediction for Self-Supervised Visual Pre-Training.

extract_feat(inputs: List[torch.Tensor], data_samples: List[mmselfsup.structures.selfsup_data_sample.SelfSupDataSample], compute_hog: bool = True, **kwarg) → Tuple[torch.Tensor][source]

The forward function to extract features from the neck.

Parameters
  • inputs (List[torch.Tensor]) – The input images and mask.

  • data_samples (List[SelfSupDataSample]) – All elements required during the forward function.

  • compute_hog (bool) – Whether to compute HOG during extraction. If True, the batch size of inputs needs to be 1. Defaults to True.

Returns

Neck outputs.

Return type

Tuple[torch.Tensor]

loss(inputs: List[torch.Tensor], data_samples: List[mmselfsup.structures.selfsup_data_sample.SelfSupDataSample], **kwargs) → Dict[str, torch.Tensor][source]

The forward function in training.

Parameters
  • inputs (List[torch.Tensor]) – The input images.

  • data_samples (List[SelfSupDataSample]) – All elements required during the forward function.

Returns

A dictionary of loss components.

Return type

Dict[str, torch.Tensor]

reconstruct(features: List[torch.Tensor], data_samples: Optional[List[mmselfsup.structures.selfsup_data_sample.SelfSupDataSample]] = None, **kwargs) → mmselfsup.structures.selfsup_data_sample.SelfSupDataSample[source]

The function is for image reconstruction.

Parameters
  • features (List[torch.Tensor]) – The input images.

  • data_samples (List[SelfSupDataSample]) – All elements required during the forward function.

Returns

The prediction from model.

Return type

SelfSupDataSample

class mmselfsup.models.algorithms.MixMIM(backbone: dict, neck: Optional[dict] = None, head: Optional[dict] = None, pretrained: Optional[str] = None, data_preprocessor: Optional[Union[dict, torch.nn.modules.module.Module]] = None, init_cfg: Optional[dict] = None)[source]

MixMIM.

Implementation of MixMIM: Mixed and Masked Image Modeling for Efficient Visual Representation Learning.

loss(inputs: List[torch.Tensor], data_samples: List[mmselfsup.structures.selfsup_data_sample.SelfSupDataSample], **kwargs) → Dict[str, torch.Tensor][source]

The forward function in training.

Parameters
  • inputs (List[torch.Tensor]) – The input images.

  • data_samples (List[SelfSupDataSample]) – All elements required during the forward function.

Returns

A dictionary of loss components.

Return type

Dict[str, torch.Tensor]

class mmselfsup.models.algorithms.MoCo(backbone: dict, neck: dict, head: dict, queue_len: int = 65536, feat_dim: int = 128, momentum: float = 0.999, pretrained: Optional[str] = None, data_preprocessor: Optional[dict] = None, init_cfg: Optional[Union[dict, List[dict]]] = None)[source]

MoCo.

Implementation of Momentum Contrast for Unsupervised Visual Representation Learning. Part of the code is borrowed from: https://github.com/facebookresearch/moco/blob/master/moco/builder.py.

Parameters
  • backbone (dict) – Config dict for module of backbone.

  • neck (dict) – Config dict for module of deep features to compact feature vectors.

  • head (dict) – Config dict for module of head functions.

  • queue_len (int) – Number of negative keys maintained in the queue. Defaults to 65536.

  • feat_dim (int) – Dimension of compact feature vectors. Defaults to 128.

  • momentum (float) – Momentum coefficient for the momentum-updated encoder. Defaults to 0.999.

  • pretrained (str, optional) – The pretrained checkpoint path, supporting local and remote paths. Defaults to None.

  • data_preprocessor (dict, optional) – The config for preprocessing input data. If None or no type is specified, it will use "SelfSupDataPreprocessor" as the type. See SelfSupDataPreprocessor for more details. Defaults to None.

  • init_cfg (Union[List[dict], dict], optional) – Config dict for weight initialization. Defaults to None.

extract_feat(inputs: List[torch.Tensor], **kwarg) → Tuple[torch.Tensor][source]

Function to extract features from the backbone.

Parameters
  • inputs (List[torch.Tensor]) – The input images.

  • data_samples (List[SelfSupDataSample]) – All elements required during the forward function.

Returns

Backbone outputs.

Return type

Tuple[torch.Tensor]

loss(inputs: List[torch.Tensor], data_samples: List[mmselfsup.structures.selfsup_data_sample.SelfSupDataSample], **kwargs) → Dict[str, torch.Tensor][source]

The forward function in training.

Parameters
  • inputs (List[torch.Tensor]) – The input images.

  • data_samples (List[SelfSupDataSample]) – All elements required during the forward function.

Returns

A dictionary of loss components.

Return type

Dict[str, torch.Tensor]

class mmselfsup.models.algorithms.MoCoV3(backbone: dict, neck: dict, head: dict, base_momentum: float = 0.99, pretrained: Optional[str] = None, data_preprocessor: Optional[dict] = None, init_cfg: Optional[Union[dict, List[dict]]] = None)[源代码]

MoCo v3.

Implementation of An Empirical Study of Training Self-Supervised Vision Transformers.

参数
  • backbone (dict) – Config dict for module of backbone

  • neck (dict) – Config dict for module of deep features to compact feature vectors.

  • head (dict) – Config dict for module of head functions.

  • base_momentum (float) – Momentum coefficient for the momentum-updated encoder. Defaults to 0.99.

  • pretrained (str, optional) – The pretrained checkpoint path, support local path and remote path. Defaults to None.

  • data_preprocessor (dict, optional) – The config for preprocessing input data. If None or no specified type, it will use “SelfSupDataPreprocessor” as type. See SelfSupDataPreprocessor for more details. Defaults to None.

  • init_cfg (Union[List[dict], dict], optional) – Config dict for weight initialization. Defaults to None.
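
示例

base_momentum is the starting coefficient m of the exponential-moving-average update of the momentum encoder (MoCo v3 anneals m towards 1 during training). A minimal sketch of the update rule on plain tensors, not the actual method:

>>> import torch
>>> m = 0.99
>>> theta_q = torch.rand(10)  # online encoder parameters
>>> theta_k = torch.rand(10)  # momentum encoder parameters
>>> theta_k = m * theta_k + (1 - m) * theta_q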

extract_feat(inputs: List[torch.Tensor], **kwarg)Tuple[torch.Tensor][源代码]

Function to extract features from backbone.

参数
  • inputs (List[torch.Tensor]) – The input images.

  • data_samples (List[SelfSupDataSample]) – All elements required during the forward function.

返回

Backbone outputs.

返回类型

Tuple[torch.Tensor]

loss(inputs: List[torch.Tensor], data_samples: List[mmselfsup.structures.selfsup_data_sample.SelfSupDataSample], **kwargs)Dict[str, torch.Tensor][源代码]

The forward function in training.

参数
  • inputs (List[torch.Tensor]) – The input images.

  • data_samples (List[SelfSupDataSample]) – All elements required during the forward function.

返回

A dictionary of loss components.

返回类型

Dict[str, torch.Tensor]

class mmselfsup.models.algorithms.NPID(backbone: dict, neck: dict, head: dict, memory_bank: dict, neg_num: int = 65536, ensure_neg: bool = False, pretrained: Optional[str] = None, data_preprocessor: Optional[dict] = None, init_cfg: Optional[Union[dict, List[dict]]] = None)[源代码]

NPID.

Implementation of Unsupervised Feature Learning via Non-parametric Instance Discrimination.

参数
  • backbone (dict) – Config dict for module of backbone.

  • neck (dict) – Config dict for module of deep features to compact feature vectors.

  • head (dict) – Config dict for module of head functions.

  • memory_bank (dict) – Config dict for module of memory bank.

  • neg_num (int) – Number of negative samples for each image. Defaults to 65536.

  • ensure_neg (bool) – If False, there is a small probability that negative samples contain positive ones. Defaults to False.

  • pretrained (str, optional) – The pretrained checkpoint path, support local path and remote path. Defaults to None.

  • data_preprocessor (dict, optional) – The config for preprocessing input data. If None or no specified type, it will use “SelfSupDataPreprocessor” as type. See SelfSupDataPreprocessor for more details. Defaults to None.

  • init_cfg (Union[List[dict], dict], optional) – Config dict for weight initialization. Defaults to None.
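
示例

A hedged sketch of what ensure_neg guards against; the dataset size and indices are made up for illustration:

>>> import torch
>>> num_imgs, neg_num = 1000, 8
>>> pos_idx = torch.tensor([3, 7])                   # indices of the positive samples
>>> neg_idx = torch.randint(num_imgs, (2, neg_num))  # random negatives, may collide
>>> collided = (neg_idx == pos_idx[:, None]).any()   # ensure_neg=True would re-draw these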

extract_feat(inputs: List[torch.Tensor], **kwarg)Tuple[torch.Tensor][源代码]

Function to extract features from backbone.

参数
  • inputs (List[torch.Tensor]) – The input images.

  • data_samples (List[SelfSupDataSample]) – All elements required during the forward function.

返回

Backbone outputs.

返回类型

Tuple[torch.Tensor]

loss(inputs: List[torch.Tensor], data_samples: List[mmselfsup.structures.selfsup_data_sample.SelfSupDataSample], **kwargs)Dict[str, torch.Tensor][源代码]

The forward function in training.

参数
  • inputs (List[torch.Tensor]) – The input images.

  • data_samples (List[SelfSupDataSample]) – All elements required during the forward function.

返回

A dictionary of loss components.

返回类型

Dict[str, torch.Tensor]

class mmselfsup.models.algorithms.ODC(backbone: dict, neck: dict, head: dict, memory_bank: dict, pretrained: Optional[str] = None, data_preprocessor: Optional[dict] = None, init_cfg: Optional[Union[dict, List[dict]]] = None)[源代码]

ODC.

Official implementation of Online Deep Clustering for Unsupervised Representation Learning. The operation w.r.t. memory bank and loss re-weighting is in engine/hooks/odc_hook.py.

参数
  • backbone (dict) – Config dict for module of backbone.

  • neck (dict) – Config dict for module of deep features to compact feature vectors.

  • head (dict) – Config dict for module of head functions.

  • memory_bank (dict) – Config dict for module of memory bank.

  • pretrained (str, optional) – The pretrained checkpoint path, support local path and remote path. Defaults to None.

  • data_preprocessor (dict, optional) – The config for preprocessing input data. If None or no specified type, it will use “SelfSupDataPreprocessor” as type. See SelfSupDataPreprocessor for more details. Defaults to None.

  • init_cfg (Union[List[dict], dict], optional) – Config dict for weight initialization. Defaults to None.

extract_feat(inputs: List[torch.Tensor], **kwarg)Tuple[torch.Tensor][源代码]

Function to extract features from backbone.

参数
  • inputs (List[torch.Tensor]) – The input images.

  • data_samples (List[SelfSupDataSample]) – All elements required during the forward function.

返回

Backbone outputs.

返回类型

Tuple[torch.Tensor]

loss(inputs: List[torch.Tensor], data_samples: List[mmselfsup.structures.selfsup_data_sample.SelfSupDataSample], **kwargs)Dict[str, torch.Tensor][源代码]

The forward function in training.

参数
  • inputs (List[torch.Tensor]) – The input images.

  • data_samples (List[SelfSupDataSample]) – All elements required during the forward function.

返回

A dictionary of loss components.

返回类型

Dict[str, torch.Tensor]

predict(inputs: List[torch.Tensor], data_samples: List[mmselfsup.structures.selfsup_data_sample.SelfSupDataSample], **kwargs)List[mmselfsup.structures.selfsup_data_sample.SelfSupDataSample][源代码]

The forward function in testing.

参数
  • inputs (List[torch.Tensor]) – The input images.

  • data_samples (List[SelfSupDataSample]) – All elements required during the forward function.

返回

The prediction from model.

返回类型

List[SelfSupDataSample]

class mmselfsup.models.algorithms.PixMIM(backbone: dict, neck: Optional[dict] = None, head: Optional[dict] = None, target_generator: Optional[dict] = None, pretrained: Optional[str] = None, data_preprocessor: Optional[Union[dict, torch.nn.modules.module.Module]] = None, init_cfg: Optional[dict] = None)[源代码]

The official implementation of PixMIM.

Implementation of PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling.

Please refer to MAE for these initialization arguments.

loss(inputs: List[torch.Tensor], data_samples: List[mmselfsup.structures.selfsup_data_sample.SelfSupDataSample], **kwargs)Dict[str, torch.Tensor][源代码]

The forward function in training.

参数
  • inputs (List[torch.Tensor]) – The input images.

  • data_samples (List[SelfSupDataSample]) – All elements required during the forward function.

返回

A dictionary of loss components.

返回类型

Dict[str, torch.Tensor]

class mmselfsup.models.algorithms.RelativeLoc(backbone: dict, neck: Optional[dict] = None, head: Optional[dict] = None, target_generator: Optional[dict] = None, pretrained: Optional[str] = None, data_preprocessor: Optional[Union[dict, torch.nn.modules.module.Module]] = None, init_cfg: Optional[dict] = None)[源代码]

Relative patch location.

Implementation of Unsupervised Visual Representation Learning by Context Prediction.

extract_feat(inputs: List[torch.Tensor], **kwargs)Tuple[torch.Tensor][源代码]

Function to extract features from backbone.

参数

inputs (List[torch.Tensor]) – The input images.

返回

Backbone outputs.

返回类型

Tuple[torch.Tensor]

loss(inputs: List[torch.Tensor], data_samples: List[mmselfsup.structures.selfsup_data_sample.SelfSupDataSample], **kwargs)Dict[str, torch.Tensor][源代码]

The forward function in training.

参数
  • inputs (List[torch.Tensor]) – The input images.

  • data_samples (List[SelfSupDataSample]) – All elements required during the forward function.

返回

A dictionary of loss components.

返回类型

Dict[str, torch.Tensor]

predict(inputs: List[torch.Tensor], data_samples: List[mmselfsup.structures.selfsup_data_sample.SelfSupDataSample], **kwargs)List[mmselfsup.structures.selfsup_data_sample.SelfSupDataSample][源代码]

The forward function in testing.

参数
  • inputs (List[torch.Tensor]) – The input images.

  • data_samples (List[SelfSupDataSample]) – All elements required during the forward function.

返回

The prediction from model.

返回类型

List[SelfSupDataSample]

class mmselfsup.models.algorithms.RotationPred(backbone: dict, neck: Optional[dict] = None, head: Optional[dict] = None, target_generator: Optional[dict] = None, pretrained: Optional[str] = None, data_preprocessor: Optional[Union[dict, torch.nn.modules.module.Module]] = None, init_cfg: Optional[dict] = None)[源代码]

Rotation prediction.

Implementation of Unsupervised Representation Learning by Predicting Image Rotations.
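
示例

A self-contained sketch of the pretext task: build four rotated views of an image and the pseudo labels a rotation head would predict (the exact pipeline used by MMSelfSup may differ):

>>> import torch
>>> img = torch.rand(1, 3, 224, 224)
>>> views = torch.cat([torch.rot90(img, k, dims=(2, 3)) for k in range(4)])
>>> labels = torch.arange(4)  # 0, 90, 180, 270 degrees
>>> tuple(views.shape)
(4, 3, 224, 224)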

extract_feat(inputs: List[torch.Tensor], **kwargs)Tuple[torch.Tensor][源代码]

Function to extract features from backbone.

参数

inputs (List[torch.Tensor]) – The input images.

返回

Backbone outputs.

返回类型

Tuple[torch.Tensor]

loss(inputs: List[torch.Tensor], data_samples: List[mmselfsup.structures.selfsup_data_sample.SelfSupDataSample], **kwargs)Dict[str, torch.Tensor][源代码]

The forward function in training.

参数
  • inputs (List[torch.Tensor]) – The input images.

  • data_samples (List[SelfSupDataSample]) – All elements required during the forward function.

返回

A dictionary of loss components.

返回类型

Dict[str, torch.Tensor]

predict(inputs: List[torch.Tensor], data_samples: List[mmselfsup.structures.selfsup_data_sample.SelfSupDataSample], **kwargs)List[mmselfsup.structures.selfsup_data_sample.SelfSupDataSample][源代码]

The forward function in testing.

参数
  • inputs (List[torch.Tensor]) – The input images.

  • data_samples (List[SelfSupDataSample]) – All elements required during the forward function.

返回

The prediction from model.

返回类型

List[SelfSupDataSample]

class mmselfsup.models.algorithms.SimCLR(backbone: dict, neck: Optional[dict] = None, head: Optional[dict] = None, target_generator: Optional[dict] = None, pretrained: Optional[str] = None, data_preprocessor: Optional[Union[dict, torch.nn.modules.module.Module]] = None, init_cfg: Optional[dict] = None)[源代码]

SimCLR.

Implementation of A Simple Framework for Contrastive Learning of Visual Representations.

extract_feat(inputs: List[torch.Tensor], **kwargs)Tuple[torch.Tensor][源代码]

Function to extract features from backbone.

参数

inputs (List[torch.Tensor]) – The input images.

返回

Backbone outputs.

返回类型

Tuple[torch.Tensor]

loss(inputs: List[torch.Tensor], data_samples: List[mmselfsup.structures.selfsup_data_sample.SelfSupDataSample], **kwargs)Dict[str, torch.Tensor][源代码]

The forward function in training.

参数
  • inputs (List[torch.Tensor]) – The input images.

  • data_samples (List[SelfSupDataSample]) – All elements required during the forward function.

返回

A dictionary of loss components.

返回类型

Dict[str, torch.Tensor]

class mmselfsup.models.algorithms.SimMIM(backbone: dict, neck: Optional[dict] = None, head: Optional[dict] = None, target_generator: Optional[dict] = None, pretrained: Optional[str] = None, data_preprocessor: Optional[Union[dict, torch.nn.modules.module.Module]] = None, init_cfg: Optional[dict] = None)[源代码]

SimMIM.

Implementation of SimMIM: A Simple Framework for Masked Image Modeling.

extract_feat(inputs: List[torch.Tensor], data_samples: List[mmselfsup.structures.selfsup_data_sample.SelfSupDataSample], **kwarg)torch.Tensor[源代码]

The forward function to extract features.

参数
  • inputs (List[torch.Tensor]) – The input images.

  • data_samples (List[SelfSupDataSample]) – All elements required during the forward function.

返回

The reconstructed images.

返回类型

torch.Tensor

loss(inputs: List[torch.Tensor], data_samples: List[mmselfsup.structures.selfsup_data_sample.SelfSupDataSample], **kwargs)Dict[str, torch.Tensor][源代码]

The forward function in training.

参数
  • inputs (List[torch.Tensor]) – The input images.

  • data_samples (List[SelfSupDataSample]) – All elements required during the forward function.

返回

A dictionary of loss components.

返回类型

Dict[str, torch.Tensor]

reconstruct(features: torch.Tensor, data_samples: Optional[List[mmselfsup.structures.selfsup_data_sample.SelfSupDataSample]] = None, **kwargs)mmselfsup.structures.selfsup_data_sample.SelfSupDataSample[源代码]

The function is for image reconstruction.

参数
  • features (torch.Tensor) – The input features.

  • data_samples (List[SelfSupDataSample]) – All elements required during the forward function.

返回

The prediction from model.

返回类型

SelfSupDataSample

class mmselfsup.models.algorithms.SimSiam(backbone: dict, neck: Optional[dict] = None, head: Optional[dict] = None, target_generator: Optional[dict] = None, pretrained: Optional[str] = None, data_preprocessor: Optional[Union[dict, torch.nn.modules.module.Module]] = None, init_cfg: Optional[dict] = None)[源代码]

SimSiam.

Implementation of Exploring Simple Siamese Representation Learning. The operation of fixing learning rate of predictor is in engine/hooks/simsiam_hook.py.

extract_feat(inputs: List[torch.Tensor], **kwarg)Tuple[torch.Tensor][源代码]

Function to extract features from backbone.

参数

inputs (List[torch.Tensor]) – The input images.

返回

Backbone outputs.

返回类型

Tuple[torch.Tensor]

loss(inputs: List[torch.Tensor], data_samples: List[mmselfsup.structures.selfsup_data_sample.SelfSupDataSample], **kwargs)Dict[str, torch.Tensor][源代码]

The forward function in training.

参数
  • inputs (List[torch.Tensor]) – The input images.

  • data_samples (List[SelfSupDataSample]) – All elements required during the forward function.

返回

A dictionary of loss components.

返回类型

Dict[str, torch.Tensor]

class mmselfsup.models.algorithms.SwAV(backbone: dict, neck: Optional[dict] = None, head: Optional[dict] = None, target_generator: Optional[dict] = None, pretrained: Optional[str] = None, data_preprocessor: Optional[Union[dict, torch.nn.modules.module.Module]] = None, init_cfg: Optional[dict] = None)[源代码]

SwAV.

Implementation of Unsupervised Learning of Visual Features by Contrasting Cluster Assignments. The queue is built in engine/hooks/swav_hook.py.

extract_feat(inputs: List[torch.Tensor], **kwargs)Tuple[torch.Tensor][源代码]

Function to extract features from backbone.

参数

inputs (List[torch.Tensor]) – The input images.

返回

Backbone outputs.

返回类型

Tuple[torch.Tensor]

loss(inputs: List[torch.Tensor], data_samples: List[mmselfsup.structures.selfsup_data_sample.SelfSupDataSample], **kwargs)Dict[str, torch.Tensor][源代码]

Forward computation during training.

参数
  • inputs (List[torch.Tensor]) – The input images.

  • data_samples (List[SelfSupDataSample]) – All elements required during the forward function.

返回

A dictionary of loss components.

返回类型

Dict[str, torch.Tensor]

backbones

class mmselfsup.models.backbones.BEiTViT(arch: str = 'base', img_size: int = 224, patch_size: int = 16, in_channels: int = 3, out_indices: int = -1, drop_rate: float = 0, drop_path_rate: float = 0, norm_cfg: dict = {'eps': 1e-06, 'type': 'LN'}, final_norm: bool = True, avg_token: bool = False, frozen_stages: int = -1, output_cls_token: bool = True, use_abs_pos_emb: bool = False, use_rel_pos_bias: bool = False, use_shared_rel_pos_bias: bool = True, layer_scale_init_value: float = 0.1, interpolate_mode: str = 'bicubic', patch_cfg: dict = {'padding': 0}, layer_cfgs: dict = {}, init_cfg: Optional[Union[dict, List[dict]]] = None)[源代码]

Vision Transformer for BEiT pre-training.

Rewritten version of: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

参数
  • arch (str | dict) –

    Vision Transformer architecture. If a string is given, choose from ‘small’, ‘base’ and ‘large’. If a dict is given, it should have the below keys:

    • embed_dims (int): The dimensions of embedding.

    • num_layers (int): The number of transformer encoder layers.

    • num_heads (int): The number of heads in attention modules.

    • feedforward_channels (int): The hidden dimensions in feedforward modules.

    Defaults to ‘base’.

  • img_size (int | tuple) – The expected input image shape. Because we support dynamic input shape, just set the argument to the most common input image shape. Defaults to 224.

  • patch_size (int | tuple) – The patch size in patch embedding. Defaults to 16.

  • in_channels (int) – The num of input channels. Defaults to 3.

  • out_indices (Sequence | int) – Output from which stages. Defaults to -1, meaning the last stage.

  • drop_rate (float) – Probability of an element to be zeroed. Defaults to 0.

  • drop_path_rate (float) – stochastic depth rate. Defaults to 0.

  • qkv_bias (bool) – Whether to add bias for qkv in attention modules. Defaults to True.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='LN').

  • final_norm (bool) – Whether to add an additional layer to normalize the final feature map. Defaults to True.

  • with_cls_token (bool) – Whether to concatenate the class token into the image tokens as transformer input. Defaults to True.

  • avg_token (bool) – Whether or not to use the mean patch token for classification. If True, the model will only take the average of all patch tokens. Defaults to False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.

  • output_cls_token (bool) – Whether output the cls_token. If set True, with_cls_token must be True. Defaults to True.

  • use_abs_pos_emb (bool) – Whether or not use absolute position embedding. Defaults to False.

  • use_rel_pos_bias (bool) – Whether or not use relative position bias. Defaults to False.

  • use_shared_rel_pos_bias (bool) – Whether or not use shared relative position bias. Defaults to True.

  • layer_scale_init_value (float) – The initialization value for the learnable scaling of attention and FFN. Defaults to 0.1.

  • interpolate_mode (str) – Select the interpolate mode for position embedding vector resizing. Defaults to “bicubic”.

  • patch_cfg (dict) – Configs of patch embedding. Defaults to an empty dict.

  • layer_cfgs (Sequence | dict) – Configs of each transformer layer in encoder. Defaults to an empty dict.

  • init_cfg (dict, optional) – Initialization config dict. Defaults to None.
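
示例

A shape sketch, following the import convention of the other examples on this page (treat the import path and output layout as assumptions):

>>> from mmselfsup.models import BEiTViT
>>> import torch
>>> model = BEiTViT(arch='base')
>>> x = torch.rand(1, 3, 224, 224)
>>> mask = torch.zeros(1, 14, 14, dtype=torch.bool)  # 224 / 16 = 14 patches per side
>>> outs = model(x, mask)  # tuple of hidden feature tensors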

forward(x: torch.Tensor, mask: torch.Tensor)Tuple[torch.Tensor][源代码]

The BEiT style forward function.

参数
  • x (torch.Tensor) – Input images, which is of shape (B x C x H x W).

  • mask (torch.Tensor) – Mask for input, which is of shape (B x patch_resolution[0] x patch_resolution[1]).

返回

Hidden features.

返回类型

Tuple[torch.Tensor]

init_weights()None[源代码]

Initialize position embedding, patch embedding and cls token.

rescale_init_weight()None[源代码]

Rescale the initialized weights.

class mmselfsup.models.backbones.CAEViT(arch: str = 'b', img_size: int = 224, patch_size: int = 16, out_indices: int = -1, drop_rate: float = 0, drop_path_rate: float = 0, qkv_bias: bool = True, norm_cfg: dict = {'eps': 1e-06, 'type': 'LN'}, final_norm: bool = True, output_cls_token: bool = True, interpolate_mode: str = 'bicubic', init_values: Optional[float] = None, patch_cfg: dict = {}, layer_cfgs: dict = {}, init_cfg: Optional[dict] = None)[源代码]

Vision Transformer for CAE pre-training.

Rewritten version of: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

参数
  • arch (str | dict) – Vision Transformer architecture. Default: ‘b’

  • img_size (int | tuple) – Input image size

  • patch_size (int | tuple) – The patch size

  • out_indices (Sequence | int) – Output from which stages. Defaults to -1, meaning the last stage.

  • drop_rate (float) – Probability of an element to be zeroed. Defaults to 0.

  • drop_path_rate (float) – stochastic depth rate. Defaults to 0.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='LN').

  • final_norm (bool) – Whether to add an additional layer to normalize the final feature map. Defaults to True.

  • output_cls_token (bool) – Whether output the cls_token. If set True, with_cls_token must be True. Defaults to True.

  • interpolate_mode (str) – Select the interpolate mode for position embedding vector resizing. Defaults to “bicubic”.

  • init_values (float, optional) – The init value of gamma in TransformerEncoderLayer. Defaults to None.

  • patch_cfg (dict) – Configs of patch embedding. Defaults to an empty dict.

  • layer_cfgs (Sequence | dict) – Configs of each transformer layer in encoder. Defaults to an empty dict.

  • init_cfg (dict, optional) – Initialization config dict. Defaults to None.

forward(img: torch.Tensor, mask: torch.Tensor)torch.Tensor[源代码]

Generate features for masked images.

This function generates masked images and gets the hidden features for visible patches.

参数
  • x (torch.Tensor) – Input images, which is of shape B x C x H x W.

  • mask (torch.Tensor) – Mask for input, which is of shape B x L.

返回

Hidden features.

返回类型

torch.Tensor

init_weights()None[源代码]

Initialize position embedding, patch embedding and cls token.

class mmselfsup.models.backbones.MAEViT(arch: Union[str, dict] = 'b', img_size: int = 224, patch_size: int = 16, out_indices: Union[Sequence, int] = -1, drop_rate: float = 0, drop_path_rate: float = 0, norm_cfg: dict = {'eps': 1e-06, 'type': 'LN'}, final_norm: bool = True, output_cls_token: bool = True, interpolate_mode: str = 'bicubic', patch_cfg: dict = {}, layer_cfgs: dict = {}, mask_ratio: float = 0.75, init_cfg: Optional[Union[dict, List[dict]]] = None)[源代码]

Vision Transformer for MAE pre-training.

A PyTorch implementation of: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. This module implements the patch masking in MAE and initializes the position embedding with sine-cosine position embedding.

参数
  • arch (str | dict) – Vision Transformer architecture. Defaults to ‘b’.

  • img_size (int | tuple) – Input image size

  • patch_size (int | tuple) – The patch size

  • out_indices (Sequence | int) – Output from which stages. Defaults to -1, meaning the last stage.

  • drop_rate (float) – Probability of an element to be zeroed. Defaults to 0.

  • drop_path_rate (float) – stochastic depth rate. Defaults to 0.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='LN').

  • final_norm (bool) – Whether to add an additional layer to normalize the final feature map. Defaults to True.

  • output_cls_token (bool) – Whether output the cls_token. If set True, with_cls_token must be True. Defaults to True.

  • interpolate_mode (str) – Select the interpolate mode for position embedding vector resizing. Defaults to “bicubic”.

  • patch_cfg (dict) – Configs of patch embedding. Defaults to an empty dict.

  • layer_cfgs (Sequence | dict) – Configs of each transformer layer in encoder. Defaults to an empty dict.

  • mask_ratio (float) – The ratio of total number of patches to be masked. Defaults to 0.75.

  • init_cfg (Union[List[dict], dict], optional) – Initialization config dict. Defaults to None.

forward(x: torch.Tensor)Tuple[torch.Tensor, torch.Tensor, torch.Tensor][源代码]

Generate features for masked images.

This function generates a mask, randomly masks some patches, and gets the hidden features of the visible patches.

参数

x (torch.Tensor) – Input images, which is of shape B x C x H x W.

返回

Hidden features, mask and the ids to restore original image.

  • x (torch.Tensor): hidden features, which is of shape B x (L * (1 - mask_ratio)) x C.

  • mask (torch.Tensor): mask used to mask image.

  • ids_restore (torch.Tensor): ids to restore original image.

返回类型

Tuple[torch.Tensor, torch.Tensor, torch.Tensor]

init_weights()None[源代码]

Initialize position embedding, patch embedding and cls token.

random_masking(x: torch.Tensor, mask_ratio: float = 0.75)Tuple[torch.Tensor, torch.Tensor, torch.Tensor][源代码]

Generate the mask for MAE Pre-training.

参数
  • x (torch.Tensor) – Image with data augmentation applied, which is of shape B x L x C.

  • mask_ratio (float) – The mask ratio of total patches. Defaults to 0.75.

返回

masked image, mask and the ids to restore original image.
  • x_masked (torch.Tensor): masked image.

  • mask (torch.Tensor): mask used to mask image.

  • ids_restore (torch.Tensor): ids to restore original image.

返回类型

Tuple[torch.Tensor, torch.Tensor, torch.Tensor]
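
示例

A self-contained sketch of the per-sample shuffling trick this kind of random masking typically uses (plain tensors, not the actual method):

>>> import torch
>>> B, L, C, mask_ratio = 2, 196, 768, 0.75
>>> x = torch.rand(B, L, C)
>>> len_keep = int(L * (1 - mask_ratio))
>>> noise = torch.rand(B, L)                        # one random score per patch
>>> ids_shuffle = torch.argsort(noise, dim=1)       # ascending: lowest noise is kept
>>> ids_restore = torch.argsort(ids_shuffle, dim=1)
>>> ids_keep = ids_shuffle[:, :len_keep]
>>> x_masked = torch.gather(x, 1, ids_keep.unsqueeze(-1).expand(-1, -1, C))
>>> mask = torch.ones(B, L)
>>> mask[:, :len_keep] = 0                          # 0 keeps a patch, 1 removes it
>>> mask = torch.gather(mask, 1, ids_restore)
>>> tuple(x_masked.shape)
(2, 49, 768)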

class mmselfsup.models.backbones.MILANViT(arch: Union[str, dict] = 'b', img_size: int = 224, patch_size: int = 16, out_indices: Union[Sequence, int] = -1, drop_rate: float = 0, drop_path_rate: float = 0, norm_cfg: dict = {'eps': 1e-06, 'type': 'LN'}, final_norm: bool = True, output_cls_token: bool = True, interpolate_mode: str = 'bicubic', patch_cfg: dict = {}, layer_cfgs: dict = {}, mask_ratio: float = 0.75, init_cfg: Optional[Union[dict, List[dict]]] = None)[源代码]

MILANViT.

Implementation of the encoder for MILAN: Masked Image Pretraining on Language Assisted Representation. This module inherits from MAEViT, only overriding the forward function and replacing random masking with attention masking.

attention_masking(x: torch.Tensor, mask_ratio: float, importance: torch.Tensor)Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor][源代码]

Generate attention mask for MILAN.

This is what differs from MAEViT, which uses random masking. Attention masking generates the attention mask for MILAN according to the importance score: the higher the importance, the more likely the patch is to be kept.

参数
  • x (torch.Tensor) – Input images, which is of shape B x L x C.

  • mask_ratio (float) – The ratio of patches to be masked.

  • importance (torch.Tensor) – Importance of each patch, which is of shape B x L.

返回

masked image, mask, the ids to restore original image, ids of the shuffled patches, ids of the kept patches, ids of the removed patches.

返回类型

Tuple[torch.Tensor, …]
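
示例

A minimal sketch of importance-based selection with a hard top-k; the actual implementation may sample from the importance distribution instead:

>>> import torch
>>> B, L, mask_ratio = 2, 196, 0.75
>>> importance = torch.rand(B, L)  # e.g. CLIP attention scores, one per patch
>>> len_keep = int(L * (1 - mask_ratio))
>>> ids_shuffle = torch.argsort(importance, dim=1, descending=True)
>>> ids_keep = ids_shuffle[:, :len_keep]   # most important patches are kept
>>> ids_dump = ids_shuffle[:, len_keep:]   # the rest are masked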

forward(x: torch.Tensor, importance: torch.Tensor)Tuple[torch.Tensor, torch.Tensor, torch.Tensor][源代码]

Generate features for masked images.

This function generates a mask according to the importance and gets the hidden features of the visible patches. The higher the importance, the more likely the patch is kept. The importance is calculated by CLIP: the higher the CLIP score, the more likely the patch is kept. The CLIP score is calculated by cross attention between the class token and all other tokens from the last layer.

参数
  • x (torch.Tensor) – Input images, which is of shape B x C x H x W.

  • importance (torch.Tensor) – Importance of each patch, which is of shape B x L.

返回

masked image, the ids to restore original image, ids of the kept patches, ids of the removed patches.

  • x (torch.Tensor): hidden features, which is of shape B x (L * (1 - mask_ratio)) x C.

  • ids_restore (torch.Tensor): ids to restore original image.

  • ids_keep (torch.Tensor): ids of the kept patches.

  • ids_dump (torch.Tensor): ids of the removed patches.

返回类型

Tuple[torch.Tensor, …]

class mmselfsup.models.backbones.MaskFeatViT(arch: Union[str, dict] = 'b', img_size: int = 224, patch_size: int = 16, out_indices: Union[Sequence, int] = -1, drop_rate: float = 0, drop_path_rate: float = 0, norm_cfg: dict = {'eps': 1e-06, 'type': 'LN'}, final_norm: bool = True, output_cls_token: bool = True, interpolate_mode: str = 'bicubic', patch_cfg: dict = {}, layer_cfgs: dict = {}, init_cfg: Optional[Union[dict, List[dict]]] = None)[源代码]

Vision Transformer for MaskFeat pre-training.

A PyTorch implementation of: Masked Feature Prediction for Self-Supervised Visual Pre-Training.

参数
  • arch (str | dict) – Vision Transformer architecture. Defaults to ‘b’.

  • img_size (int | tuple) – Input image size

  • patch_size (int | tuple) – The patch size

  • out_indices (Sequence | int) – Output from which stages. Defaults to -1, meaning the last stage.

  • drop_rate (float) – Probability of an element to be zeroed. Defaults to 0.

  • drop_path_rate (float) – stochastic depth rate. Defaults to 0.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='LN').

  • final_norm (bool) – Whether to add an additional layer to normalize the final feature map. Defaults to True.

  • output_cls_token (bool) – Whether output the cls_token. If set True, with_cls_token must be True. Defaults to True.

  • interpolate_mode (str) – Select the interpolate mode for position embedding vector resizing. Defaults to “bicubic”.

  • patch_cfg (dict) – Configs of patch embedding. Defaults to an empty dict.

  • layer_cfgs (Sequence | dict) – Configs of each transformer layer in encoder. Defaults to an empty dict.

  • init_cfg (dict, optional) – Initialization config dict. Defaults to None.

forward(x: torch.Tensor, mask: torch.Tensor)torch.Tensor[源代码]

Generate features for masked images.

参数
  • x (torch.Tensor) – Input images.

  • mask (torch.Tensor) – Input masks.

返回

Features with cls_tokens.

返回类型

torch.Tensor

init_weights()None[源代码]

Initialize position embedding, mask token and cls token.

class mmselfsup.models.backbones.MixMIMTransformerPretrain(arch: Union[str, dict] = 'base', mlp_ratio: float = 4, img_size: int = 224, patch_size: int = 4, in_channels: int = 3, window_size: List = [14, 14, 14, 7], qkv_bias: bool = True, patch_cfg: dict = {}, norm_cfg: dict = {'type': 'LN'}, drop_rate: float = 0.0, drop_path_rate: float = 0.0, attn_drop_rate: float = 0.0, use_checkpoint: bool = False, range_mask_ratio: float = 0.0, init_cfg: Optional[dict] = None)[源代码]

MixMIM backbone during pretraining.

A PyTorch implementation of MixMIM: Mixed and Masked Image Modeling for Efficient Visual Representation Learning (https://arxiv.org/abs/2205.13137).

参数
  • arch (str | dict) –

    MixMIM architecture. If a string is given, choose from ‘base’, ‘large’ and ‘huge’. If a dict is given, it should have the below keys:

    • embed_dims (int): The dimensions of embedding.

    • depths (int): The number of transformer encoder layers.

    • num_heads (int): The number of heads in attention modules.

    Defaults to ‘base’.

  • mlp_ratio (float) – The mlp ratio in FFN. Defaults to 4.

  • img_size (int | tuple) – The expected input image shape. Because we support dynamic input shape, just set the argument to the most common input image shape. Defaults to 224.

  • patch_size (int | tuple) – The patch size in patch embedding. Defaults to 4.

  • in_channels (int) – The num of input channels. Defaults to 3.

  • window_size (list) – The height and width of the window.

  • qkv_bias (bool) – Whether to add bias for qkv in attention modules. Defaults to True.

  • patch_cfg (dict) – Extra config dict for patch embedding. Defaults to an empty dict.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='LN').

  • drop_rate (float) – Probability of an element to be zeroed. Defaults to 0.

  • drop_path_rate (float) – Stochastic depth rate. Defaults to 0.

  • attn_drop_rate (float) – Attention drop rate. Defaults to 0.

  • use_checkpoint (bool) – Whether to use checkpointing to reduce GPU memory cost. Defaults to False.

  • range_mask_ratio (float) – The range of mask ratio. Defaults to 0.

  • init_cfg (dict, optional) – Initialization config dict. Defaults to None.

forward(x: torch.Tensor, mask_ratio=0.5)[源代码]

Generate features for masked images.

This function generates a mask, randomly masks some patches, and gets the hidden features of the visible patches.

参数

x (torch.Tensor) – Input images, which is of shape B x C x H x W.

返回

  • x (torch.Tensor): hidden features, which is of shape B x L x C.

  • mask_s4 (torch.Tensor): the mask tensor for the last layer.

返回类型

Tuple[torch.Tensor, torch.Tensor]

init_weights()[源代码]

Initialize position embedding, patch embedding.

random_masking(x: torch.Tensor, mask_ratio: float = 0.5)[源代码]

Generate the mask for MixMIM Pretraining.

参数
  • x (torch.Tensor) – Image with data augmentation applied, which is of shape B x L x C.

  • mask_ratio (float) – The mask ratio of total patches. Defaults to 0.5.

返回

  • mask_s1 (torch.Tensor): mask with stride of self.encoder_stride // 8.

  • mask_s2 (torch.Tensor): mask with stride of self.encoder_stride // 4.

  • mask_s3 (torch.Tensor): mask with stride of self.encoder_stride // 2.

  • mask (torch.Tensor): mask with stride of self.encoder_stride.

返回类型

Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]
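
示例

A sketch of how the four masks relate across strides: one token-level mask at the coarsest stage can be resized to the earlier stages with nearest-neighbour interpolation (illustrative only; the actual generation order may differ):

>>> import torch
>>> import torch.nn.functional as F
>>> mask_s4 = torch.rand(2, 1, 7, 7).round()   # stride 32 on a 224x224 input
>>> mask_s3 = F.interpolate(mask_s4, scale_factor=2., mode='nearest')  # stride 16
>>> mask_s2 = F.interpolate(mask_s4, scale_factor=4., mode='nearest')  # stride 8
>>> mask_s1 = F.interpolate(mask_s4, scale_factor=8., mode='nearest')  # stride 4
>>> tuple(mask_s1.shape)
(2, 1, 56, 56)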

class mmselfsup.models.backbones.MoCoV3ViT(stop_grad_conv1: bool = False, frozen_stages: int = - 1, norm_eval: bool = False, init_cfg: Optional[Union[dict, List[dict]]] = None, **kwargs)[源代码]

Vision Transformer.

A PyTorch implementation of: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.

Part of the code is modified from: https://github.com/facebookresearch/moco-v3/blob/main/vits.py.

参数
  • stop_grad_conv1 (bool) – Whether to stop the gradient of the convolution layer in PatchEmbed. Defaults to False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

init_weights()None[源代码]

Initialize position embedding, patch embedding, qkv layers and cls token.

train(mode: bool = True)None[源代码]

Set module status before forward computation.

参数

mode (bool) – Whether it is train_mode or test_mode.

class mmselfsup.models.backbones.ResNeXt(depth: int, groups: int = 32, width_per_group: int = 4, **kwargs)[源代码]

ResNeXt backbone.

Please refer to the paper for details.

As the behavior of the forward function in MMSelfSup is different from MMCls, we register our own ResNeXt, inheriting from mmselfsup.models.backbones.ResNet.

参数
  • depth (int) – Network depth, from {50, 101, 152}.

  • groups (int) – Groups of conv2 in Bottleneck. Defaults to 32.

  • width_per_group (int) – Width per group of conv2 in Bottleneck. Defaults to 4.

  • in_channels (int) – Number of input image channels. Defaults to 3.

  • stem_channels (int) – Output channels of the stem layer. Defaults to 64.

  • num_stages (int) – Stages of the network. Defaults to 4.

  • strides (Sequence[int]) – Strides of the first block of each stage. Defaults to (1, 2, 2, 2).

  • dilations (Sequence[int]) – Dilation of each stage. Defaults to (1, 1, 1, 1).

  • out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned, otherwise multiple stages are specified, a tuple of tensors will be returned. Defaults to (3, ).

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Defaults to False.

  • avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Defaults to False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.

  • conv_cfg (dict | None) – The config dict for conv layers. Defaults to None.

  • norm_cfg (dict) – The config dict for norm layers.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Defaults to False.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Defaults to False.

示例

>>> from mmselfsup.models import ResNeXt
>>> import torch
>>> self = ResNeXt(depth=50)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 256, 8, 8)
(1, 512, 4, 4)
(1, 1024, 2, 2)
(1, 2048, 1, 1)
make_res_layer(**kwargs)torch.nn.modules.module.Module[源代码]

Redefine the function for ResNeXt related args.

class mmselfsup.models.backbones.ResNet(depth: int, in_channels: int = 3, stem_channels: int = 64, base_channels: int = 64, expansion: Optional[int] = None, num_stages: int = 4, strides: Tuple[int] = (1, 2, 2, 2), dilations: Tuple[int] = (1, 1, 1, 1), out_indices: Tuple[int] = (4, ), style: str = 'pytorch', deep_stem: bool = False, avg_down: bool = False, frozen_stages: int = -1, conv_cfg: Optional[dict] = None, norm_cfg: Optional[dict] = {'requires_grad': True, 'type': 'BN'}, norm_eval: bool = False, with_cp: bool = False, zero_init_residual: bool = False, init_cfg: Optional[dict] = [{'type': 'Kaiming', 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}], drop_path_rate: float = 0.0, **kwargs)[源代码]

ResNet backbone.

Please refer to the paper for details.

参数
  • depth (int) – Network depth, from {18, 34, 50, 101, 152}.

  • in_channels (int) – Number of input image channels. Defaults to 3.

  • stem_channels (int) – Output channels of the stem layer. Defaults to 64.

  • base_channels (int) – Middle channels of the first stage. Defaults to 64.

  • num_stages (int) – Stages of the network. Defaults to 4.

  • strides (Sequence[int]) – Strides of the first block of each stage. Defaults to (1, 2, 2, 2).

  • dilations (Sequence[int]) – Dilation of each stage. Defaults to (1, 1, 1, 1).

  • out_indices (Sequence[int]) – Output from which stages. Defaults to (4, ).

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Defaults to False.

  • avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Defaults to False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.

  • conv_cfg (dict | None) – The config dict for conv layers. Defaults to None.

  • norm_cfg (dict) – The config dict for norm layers.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Defaults to False.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Defaults to False.

  • drop_path_rate (float) – Probability of the path to be zeroed. Defaults to 0.

示例

>>> from mmselfsup.models import ResNet
>>> import torch
>>> self = ResNet(depth=18)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 64, 8, 8)
(1, 128, 4, 4)
(1, 256, 2, 2)
(1, 512, 1, 1)
forward(x: torch.Tensor)Tuple[torch.Tensor][源代码]

Forward function.

As the behavior of the forward function in MMSelfSup is different from MMCls, we rewrite the forward function. MMCls does not output the feature map from the ‘stem’ layer, which is needed for downstream evaluation.

class mmselfsup.models.backbones.ResNetSobel(**kwargs)[源代码]

ResNet with Sobel layer.

This variant is used in clustering-based methods like DeepCluster to avoid color shortcut.

forward(x: torch.Tensor)Tuple[torch.Tensor][源代码]

Forward function.

class mmselfsup.models.backbones.ResNetV1d(**kwargs)[源代码]

ResNetV1d variant described in Bag of Tricks.

Compared with the default ResNet (ResNetV1b), ResNetV1d replaces the 7x7 conv in the input stem with three 3x3 convs. In the downsampling block, a 2x2 avg_pool with stride 2 is added before the conv, whose stride is changed to 1.

class mmselfsup.models.backbones.SimMIMSwinTransformer(arch: Union[str, dict] = 'T', img_size: Union[Tuple[int, int], int] = 224, in_channels: int = 3, drop_rate: float = 0.0, drop_path_rate: float = 0.1, out_indices: tuple = (3, ), use_abs_pos_embed: bool = False, with_cp: bool = False, frozen_stages: int = -1, norm_eval: bool = False, norm_cfg: dict = {'type': 'LN'}, stage_cfgs: Union[Sequence, dict] = {}, patch_cfg: dict = {}, pad_small_map: bool = False, init_cfg: Optional[dict] = None)[源代码]

Swin Transformer for SimMIM.

参数
  • arch (str | dict) – Swin Transformer architecture. Defaults to ‘T’.

  • img_size (int | tuple) – The size of input image. Defaults to 224.

  • in_channels (int) – The num of input channels. Defaults to 3.

  • drop_rate (float) – Dropout rate after embedding. Defaults to 0.

  • drop_path_rate (float) – Stochastic depth rate. Defaults to 0.1.

  • out_indices (tuple) – Indices of the stages to output. Defaults to (3, ).

  • use_abs_pos_embed (bool) – If True, add absolute position embedding to the patch embedding. Defaults to False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Defaults to False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.

  • norm_cfg (dict) – Config dict for the normalization layer at the end of the backbone. Defaults to dict(type=’LN’).

  • stage_cfgs (Sequence | dict) – Extra config dict for each stage. Defaults to empty dict.

  • patch_cfg (dict) – Extra config dict for patch embedding. Defaults to empty dict.

  • pad_small_map (bool) – If True, pad the small feature map to the window size, which is commonly used in detection and segmentation. If False, avoid shifting window and shrink the window size to the size of the feature map, which is commonly used in classification. Defaults to False.

  • init_cfg (dict, optional) – The Config for initialization. Defaults to None.

forward(x: torch.Tensor, mask: torch.Tensor)Sequence[torch.Tensor][源代码]

Generate features for masked images.

This function generates masked images and gets the hidden features for them.

参数
  • x (torch.Tensor) – Input images.

  • mask (torch.Tensor) – Masks used to construct masked images.

返回

A tuple containing features from multi-stages.

返回类型

tuple

init_weights()None[源代码]

Initialize weights.

necks

class mmselfsup.models.necks.AvgPool2dNeck(output_size: int = 1)[源代码]

The average pooling 2d neck.

forward(x: List[torch.Tensor])List[torch.Tensor][源代码]

Forward function.

class mmselfsup.models.necks.BEiTV2Neck(num_layers: int = 2, early_layers: int = 9, backbone_arch: str = 'base', drop_rate: float = 0.0, drop_path_rate: float = 0.0, layer_scale_init_value: float = 0.1, use_rel_pos_bias: bool = False, norm_cfg: dict = {'eps': 1e-06, 'type': 'LN'}, init_cfg: Optional[Union[dict, List[dict]]] = {'bias': 0, 'layer': 'Linear', 'std': 0.02, 'type': 'TruncNormal'})[源代码]

Neck for BEiTV2 Pre-training.

This module constructs the decoder for the final prediction.

参数
  • num_layers (int) – Number of encoder layers of neck. Defaults to 2.

  • early_layers (int) – The layer index of the early output from the backbone. Defaults to 9.

  • backbone_arch (str) – Vision Transformer architecture. Defaults to base.

  • drop_rate (float) – Probability of an element to be zeroed. Defaults to 0.

  • drop_path_rate (float) – stochastic depth rate. Defaults to 0.

  • layer_scale_init_value (float) – The initialization value for the learnable scaling of attention and FFN. Defaults to 0.1.

  • use_rel_pos_bias (bool) – Whether to use unique relative position bias; if False, use the shared relative position bias defined in the backbone. Defaults to False.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='LN').

  • init_cfg (dict, optional) – Initialization config dict. Defaults to None.

forward(inputs: Tuple[torch.Tensor], rel_pos_bias: torch.Tensor, **kwargs)Tuple[torch.Tensor, torch.Tensor][源代码]

Get the latent prediction and final prediction.

参数
  • x (Tuple[torch.Tensor]) – Features of tokens.

  • rel_pos_bias (torch.Tensor) – Shared relative position bias table.

返回

  • x: The final layer features from backbone, which are normed in BEiTV2Neck.

  • x_cls_pt: The early-state features from the backbone, which consist of the final-layer cls_token and the early-state patch_tokens from the backbone, and are sent to the PatchAggregation layers in the neck.

返回类型

Tuple[torch.Tensor, torch.Tensor]

rescale_patch_aggregation_init_weight()[源代码]

Rescale the initialized weights.

class mmselfsup.models.necks.CAENeck(patch_size: int = 16, num_classes: int = 8192, embed_dims: int = 768, regressor_depth: int = 6, decoder_depth: int = 8, num_heads: int = 12, mlp_ratio: int = 4, qkv_bias: bool = True, qk_scale: Optional[float] = None, drop_rate: float = 0.0, attn_drop_rate: float = 0.0, drop_path_rate: float = 0.0, norm_cfg: dict = {'eps': 1e-06, 'type': 'LN'}, init_values: Optional[float] = None, mask_tokens_num: int = 75, init_cfg: Optional[dict] = None)[源代码]

Neck for CAE Pre-training.

This module constructs the latent prediction regressor and the decoder for the latent prediction and final prediction.

参数
  • patch_size (int) – The patch size of each token. Defaults to 16.

  • num_classes (int) – The number of classes for final prediction. Defaults to 8192.

  • embed_dims (int) – The embed dims of latent feature in regressor and decoder. Defaults to 768.

  • regressor_depth (int) – The number of regressor blocks. Defaults to 6.

  • decoder_depth (int) – The number of decoder blocks. Defaults to 8.

  • num_heads (int) – The number of head in multi-head attention. Defaults to 12.

  • mlp_ratio (int) – The expand ratio of latent features in MLP. Defaults to 4.

  • qkv_bias (bool) – Whether or not to use qkv bias. Defaults to True.

  • qk_scale (float, optional) – The scale applied to the results of qk. Defaults to None.

  • drop_rate (float) – The dropout rate. Defaults to 0.

  • attn_drop_rate (float) – The dropout rate in attention block. Defaults to 0.

  • norm_cfg (dict) – The config of normalization layer. Defaults to dict(type=’LN’, eps=1e-6).

  • init_values (float, optional) – The init value of gamma. Defaults to None.

  • mask_tokens_num (int) – The number of mask tokens. Defaults to 75.

  • init_cfg (dict, optional) – Initialization config dict. Defaults to None.

forward(x_unmasked: torch.Tensor, pos_embed_masked: torch.Tensor, pos_embed_unmasked: torch.Tensor)Tuple[torch.Tensor, torch.Tensor][源代码]

Get the latent prediction and final prediction.

参数
  • x_unmasked (torch.Tensor) – Features of unmasked tokens.

  • pos_embed_masked (torch.Tensor) – Position embedding of masked tokens.

  • pos_embed_unmasked (torch.Tensor) – Position embedding of unmasked tokens.

返回

Final prediction and latent prediction.

返回类型

Tuple[torch.Tensor, torch.Tensor]

init_weights()None[源代码]

Initialization.

class mmselfsup.models.necks.ClsBatchNormNeck(input_features: int, affine: bool = False, eps: float = 1e-06, init_cfg: Optional[Union[dict, List[dict]]] = None)[源代码]

Normalize cls token across batch before head.

This module is proposed by MAE, when running linear probing.

参数
  • input_features (int) – The dimension of features.

  • affine (bool) – Whether this module has learnable affine parameters. Defaults to False.

  • eps (float) – a value added to the denominator for numerical stability. Defaults to 1e-6.

  • init_cfg (Dict or List[Dict], optional) – Config dict for weight initialization. Defaults to None.
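
示例

Functionally the neck is close to a BatchNorm1d without affine parameters applied to the batch of cls tokens; a plain-PyTorch sketch under that assumption:

>>> import torch
>>> import torch.nn as nn
>>> bn = nn.BatchNorm1d(768, affine=False, eps=1e-6)
>>> cls_tokens = torch.rand(32, 768)  # a batch of 32 cls tokens
>>> out = bn(cls_tokens)
>>> tuple(out.shape)
(32, 768)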

forward(inputs: Tuple[List[torch.Tensor]])Tuple[List[torch.Tensor]][源代码]

The forward function.

class mmselfsup.models.necks.DenseCLNeck(in_channels: int, hid_channels: int, out_channels: int, num_grid: Optional[int] = None, init_cfg: Optional[Union[dict, List[dict]]] = None)[源代码]

The non-linear neck of DenseCL.

Single and dense neck in parallel: fc-relu-fc, conv-relu-conv. Borrowed from the authors’ code.

参数
  • in_channels (int) – Number of input channels.

  • hid_channels (int) – Number of hidden channels.

  • out_channels (int) – Number of output channels.

  • num_grid (int) – The grid size of dense features. Defaults to None.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.
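
示例

A plain nn.Sequential sketch of the two parallel branches described above (channel sizes are illustrative assumptions, not values taken from this page):

>>> import torch.nn as nn
>>> global_neck = nn.Sequential(   # fc-relu-fc on pooled features
...     nn.Linear(2048, 2048),
...     nn.ReLU(inplace=True),
...     nn.Linear(2048, 128))
>>> dense_neck = nn.Sequential(    # conv-relu-conv on the feature map
...     nn.Conv2d(2048, 2048, kernel_size=1),
...     nn.ReLU(inplace=True),
...     nn.Conv2d(2048, 128, kernel_size=1))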

forward(x: List[torch.Tensor])List[torch.Tensor][源代码]

Forward function of neck.

参数

x (List[torch.Tensor]) – The feature map of the backbone.

返回

The global feature vectors and dense feature vectors.

  • avgpooled_x: Global feature vectors.

  • x: Dense feature vectors.

  • avgpooled_x2: Dense feature vectors for the queue.

返回类型

List[torch.Tensor, torch.Tensor, torch.Tensor]

class mmselfsup.models.necks.LinearNeck(in_channels: int, out_channels: int, with_avg_pool: bool = True, init_cfg: Optional[Union[dict, List[dict]]] = None)[源代码]

The linear neck: fc only.

参数
  • in_channels (int) – Number of input channels.

  • out_channels (int) – Number of output channels.

  • with_avg_pool (bool) – Whether to apply the global average pooling after backbone. Defaults to True.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

forward(x: Tuple[torch.Tensor])List[torch.Tensor][源代码]

Forward function.

参数

x (List[torch.Tensor]) – The feature map of backbone.

返回

The output features.

返回类型

List[torch.Tensor]

class mmselfsup.models.necks.MAEPretrainDecoder(num_patches: int = 196, patch_size: int = 16, in_chans: int = 3, embed_dim: int = 1024, decoder_embed_dim: int = 512, decoder_depth: int = 8, decoder_num_heads: int = 16, mlp_ratio: int = 4, norm_cfg: dict = {'eps': 1e-06, 'type': 'LN'}, predict_feature_dim: Optional[float] = None, init_cfg: Optional[Union[dict, List[dict]]] = None)[源代码]

Decoder for MAE Pre-training.

Some of the code is borrowed from https://github.com/facebookresearch/mae.

参数
  • num_patches (int) – The number of total patches. Defaults to 196.

  • patch_size (int) – Image patch size. Defaults to 16.

  • in_chans (int) – The channel of input image. Defaults to 3.

  • embed_dim (int) – Encoder’s embedding dimension. Defaults to 1024.

  • decoder_embed_dim (int) – Decoder’s embedding dimension. Defaults to 512.

  • decoder_depth (int) – The depth of decoder. Defaults to 8.

  • decoder_num_heads (int) – Number of attention heads of decoder. Defaults to 16.

  • mlp_ratio (int) – Ratio of mlp hidden dim to decoder’s embedding dim. Defaults to 4.

  • norm_cfg (dict) – Normalization layer. Defaults to LayerNorm.

  • init_cfg (Union[List[dict], dict], optional) – Initialization config dict. Defaults to None.

示例

>>> from mmselfsup.models import MAEPretrainDecoder
>>> import torch
>>> self = MAEPretrainDecoder()
>>> self.eval()
>>> inputs = torch.rand(1, 50, 1024)
>>> ids_restore = torch.arange(0, 196).unsqueeze(0)
>>> level_outputs = self.forward(inputs, ids_restore)
>>> print(tuple(level_outputs.shape))
(1, 196, 768)
property decoder_norm

The normalization layer of decoder.

forward(x: torch.Tensor, ids_restore: torch.Tensor)torch.Tensor[源代码]

The forward function.

The process computes the visible patches’ feature vectors and the mask tokens to output feature vectors, which will be used for reconstruction.

参数
  • x (torch.Tensor) – hidden features, which is of shape B x (L * (1 - mask_ratio)) x C.

  • ids_restore (torch.Tensor) – ids to restore original image.

返回

The reconstructed feature vectors, which are of shape B x num_patches x C.

返回类型

torch.Tensor

init_weights()None[源代码]

Initialize position embedding and mask token of MAE decoder.

class mmselfsup.models.necks.MILANPretrainDecoder(num_patches: int = 196, patch_size: int = 16, in_chans: int = 3, embed_dim: int = 1024, decoder_embed_dim: int = 512, decoder_depth: int = 8, decoder_num_heads: int = 16, predict_feature_dim: int = 512, mlp_ratio: int = 4, norm_cfg: dict = {'eps': 1e-06, 'type': 'LN'}, init_cfg: Optional[Union[dict, List[dict]]] = None)[源代码]

Prompt decoder for MILAN.

This decoder is used in MILAN pretraining and does not update the visible tokens from the encoder.

参数
  • num_patches (int) – The number of total patches. Defaults to 196.

  • patch_size (int) – Image patch size. Defaults to 16.

  • in_chans (int) – The channel of input image. Defaults to 3.

  • embed_dim (int) – Encoder’s embedding dimension. Defaults to 1024.

  • decoder_embed_dim (int) – Decoder’s embedding dimension. Defaults to 512.

  • decoder_depth (int) – The depth of decoder. Defaults to 8.

  • decoder_num_heads (int) – Number of attention heads of decoder. Defaults to 16.

  • predict_feature_dim (int) – The dimension of the feature to be predicted. Defaults to 512.

  • mlp_ratio (int) – Ratio of mlp hidden dim to decoder’s embedding dim. Defaults to 4.

  • norm_cfg (dict) – Normalization layer. Defaults to LayerNorm.

  • init_cfg (Union[List[dict], dict], optional) – Initialization config dict. Defaults to None.

forward(x: torch.Tensor, ids_restore: torch.Tensor, ids_keep: torch.Tensor, ids_dump: torch.Tensor)torch.Tensor[源代码]

Forward function.

参数
  • x (torch.Tensor) – The input features, which is of shape (N, L, C).

  • ids_restore (torch.Tensor) – The indices to restore these tokens to the original image.

  • ids_keep (torch.Tensor) – The indices of tokens to be kept.

  • ids_dump (torch.Tensor) – The indices of tokens to be masked.

返回

The reconstructed features, which are of shape (N, L, C).

返回类型

torch.Tensor

class mmselfsup.models.necks.MixMIMPretrainDecoder(num_patches: int = 196, patch_size: int = 16, in_chans: int = 3, embed_dim: int = 1024, encoder_stride: int = 32, decoder_embed_dim: int = 512, decoder_depth: int = 8, decoder_num_heads: int = 16, mlp_ratio: int = 4, norm_cfg: dict = {'eps': 1e-06, 'type': 'LN'}, init_cfg: Optional[Union[dict, List[dict]]] = None)[源代码]

Decoder for MixMIM Pretraining.

Some of the code is borrowed from https://github.com/Sense-X/MixMIM.

参数
  • num_patches (int) – The number of total patches. Defaults to 196.

  • patch_size (int) – Image patch size. Defaults to 16.

  • in_chans (int) – The channel of input image. Defaults to 3.

  • embed_dim (int) – Encoder’s embedding dimension. Defaults to 1024.

  • encoder_stride (int) – The output stride of MixMIM backbone. Defaults to 32.

  • decoder_embed_dim (int) – Decoder’s embedding dimension. Defaults to 512.

  • decoder_depth (int) – The depth of decoder. Defaults to 8.

  • decoder_num_heads (int) – Number of attention heads of decoder. Defaults to 16.

  • mlp_ratio (int) – Ratio of mlp hidden dim to decoder’s embedding dim. Defaults to 4.

  • norm_cfg (dict) – Normalization layer. Defaults to LayerNorm.

  • init_cfg (Union[List[dict], dict], optional) – Initialization config dict. Defaults to None.

forward(x: torch.Tensor, mask: torch.Tensor)torch.Tensor[源代码]

Forward function.

参数
  • x (torch.Tensor) – The input features, which is of shape (N, L, C).

  • mask (torch.Tensor) – The tensor indicating which tokens are masked.

返回

The reconstructed features, which are of shape (N, L, C).

返回类型

torch.Tensor

init_weights()None[源代码]

Initialize position embedding and mask token of MixMIM decoder.

class mmselfsup.models.necks.MoCoV2Neck(in_channels: int, hid_channels: int, out_channels: int, with_avg_pool: bool = True, init_cfg: Optional[Union[dict, List[dict]]] = None)[源代码]

The non-linear neck of MoCo v2: fc-relu-fc.

参数
  • in_channels (int) – Number of input channels.

  • hid_channels (int) – Number of hidden channels.

  • out_channels (int) – Number of output channels.

  • with_avg_pool (bool) – Whether to apply the global average pooling after backbone. Defaults to True.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

forward(x: List[torch.Tensor])List[torch.Tensor][源代码]

Forward function.

参数

x (List[torch.Tensor]) – The feature map of backbone.

返回

The output features.

返回类型

List[torch.Tensor]

class mmselfsup.models.necks.NonLinearNeck(in_channels: int, hid_channels: int, out_channels: int, num_layers: int = 2, with_bias: bool = False, with_last_bn: bool = True, with_last_bn_affine: bool = True, with_last_bias: bool = False, with_avg_pool: bool = True, vit_backbone: bool = False, norm_cfg: dict = {'type': 'SyncBN'}, init_cfg: Optional[Union[dict, List[dict]]] = [{'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[源代码]

The non-linear neck.

Structure: fc-bn-[relu-fc-bn], where the substructure in [] can be repeated. With the default setting, it is repeated once. The neck can be used in many algorithms, e.g., SimCLR, BYOL, SimSiam. A plain PyTorch sketch of the default structure follows the parameter list below.

参数
  • in_channels (int) – Number of input channels.

  • hid_channels (int) – Number of hidden channels.

  • out_channels (int) – Number of output channels.

  • num_layers (int) – Number of fc layers. Defaults to 2.

  • with_bias (bool) – Whether to use bias in fc layers (except for the last). Defaults to False.

  • with_last_bn (bool) – Whether to add the last BN layer. Defaults to True.

  • with_last_bn_affine (bool) – Whether to have learnable affine parameters in the last BN layer (set False for SimSiam). Defaults to True.

  • with_last_bias (bool) – Whether to use bias in the last fc layer. Defaults to False.

  • with_avg_pool (bool) – Whether to apply the global average pooling after backbone. Defaults to True.

  • vit_backbone (bool) – The key to indicate whether the upstream backbone is ViT. Defaults to False.

  • norm_cfg (dict) – Dictionary to construct and config norm layer. Defaults to dict(type=’SyncBN’).

  • init_cfg (dict or list[dict], optional) – Initialization config dict.

forward(x: Tuple[torch.Tensor])List[torch.Tensor][源代码]

Forward function.

参数

x (Tuple[torch.Tensor]) – The feature map of backbone.

返回

The output features.

返回类型

List[torch.Tensor]

class mmselfsup.models.necks.ODCNeck(in_channels: int, hid_channels: int, out_channels: int, with_avg_pool: bool = True, norm_cfg: dict = {'type': 'SyncBN'}, init_cfg: Optional[Union[dict, List[dict]]] = [{'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[源代码]

The non-linear neck of ODC: fc-bn-relu-dropout-fc-relu.

参数
  • in_channels (int) – Number of input channels.

  • hid_channels (int) – Number of hidden channels.

  • out_channels (int) – Number of output channels.

  • with_avg_pool (bool) – Whether to apply the global average pooling after backbone. Defaults to True.

  • norm_cfg (dict) – Dictionary to construct and config norm layer. Defaults to dict(type=’SyncBN’).

  • init_cfg (dict or list[dict], optional) – Initialization config dict.

forward(x: List[torch.Tensor])List[torch.Tensor][源代码]

Forward function.

参数

x (List[torch.Tensor]) – The feature map of backbone.

返回

The output features.

返回类型

List[torch.Tensor]

class mmselfsup.models.necks.RelativeLocNeck(in_channels: int, out_channels: int, with_avg_pool: bool = True, norm_cfg: dict = {'type': 'BN1d'}, init_cfg: Optional[Union[dict, List[dict]]] = [{'type': 'Normal', 'std': 0.01, 'layer': 'Linear'}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[源代码]

The neck of relative patch location: fc-bn-relu-dropout.

参数
  • in_channels (int) – Number of input channels.

  • out_channels (int) – Number of output channels.

  • with_avg_pool (bool) – Whether to apply the global average pooling after backbone. Defaults to True.

  • norm_cfg (dict) – Dictionary to construct and config norm layer. Defaults to dict(type=’BN1d’).

  • init_cfg (dict or list[dict], optional) – Initialization config dict.

forward(x: List[torch.Tensor])List[torch.Tensor][源代码]

Forward function.

参数

x (List[torch.Tensor]) – The feature map of backbone.

返回

The output features.

返回类型

List[torch.Tensor]

class mmselfsup.models.necks.SimMIMNeck(in_channels: int, encoder_stride: int)[源代码]

Pre-train Neck For SimMIM.

This neck reconstructs the original image from the shrunk feature map.

参数
  • in_channels (int) – Channel dimension of the feature map.

  • encoder_stride (int) – The total stride of the encoder.

forward(x: torch.Tensor)torch.Tensor[源代码]

Forward function.

class mmselfsup.models.necks.SwAVNeck(in_channels: int, hid_channels: int, out_channels: int, with_avg_pool: bool = True, with_l2norm: bool = True, norm_cfg: dict = {'type': 'SyncBN'}, init_cfg: Optional[Union[dict, List[dict]]] = [{'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[源代码]

The non-linear neck of SwAV: fc-bn-relu-fc-normalization.

参数
  • in_channels (int) – Number of input channels.

  • hid_channels (int) – Number of hidden channels.

  • out_channels (int) – Number of output channels.

  • with_avg_pool (bool) – Whether to apply the global average pooling after backbone. Defaults to True.

  • with_l2norm (bool) – Whether to normalize the output after projection. Defaults to True.

  • norm_cfg (dict) – Dictionary to construct and config norm layer. Defaults to dict(type=’SyncBN’).

  • init_cfg (dict or list[dict], optional) – Initialization config dict.

forward(x: List[torch.Tensor])List[torch.Tensor][源代码]

Forward function.

参数

x (List[torch.Tensor]) – A list of feature maps; len(x) equals len(num_crops).

返回

The projection vectors.

返回类型

List[torch.Tensor]

forward_projection(x: torch.Tensor)torch.Tensor[源代码]

Compute projection.

参数

x (torch.Tensor) – The feature vectors after pooling.

返回

The output features with projection or L2-norm.

返回类型

torch.Tensor

heads

class mmselfsup.models.heads.BEiTV1Head(embed_dims: int, num_embed: int, loss: dict, init_cfg: Optional[Union[dict, List[dict]]] = {'bias': 0, 'layer': 'Linear', 'std': 0.02, 'type': 'TruncNormal'})[源代码]

Pretrain Head for BEiT v1.

Compute the logits and the cross entropy loss.

参数
  • embed_dims (int) – The dimension of embedding.

  • num_embed (int) – The number of classification types.

  • loss (dict) – The config of loss.

  • init_cfg (dict or List[dict], optional) – Initialization config dict. Defaults to None.

forward(feats: torch.Tensor, target: torch.Tensor, mask: torch.Tensor)torch.Tensor[源代码]

Generate loss.

参数
  • feats (torch.Tensor) – Features from backbone.

  • target (torch.Tensor) – Target generated by target_generator.

  • mask (torch.Tensor) – Generated mask for pretraining.

class mmselfsup.models.heads.BEiTV2Head(embed_dims: int, num_embed: int, loss: dict, init_cfg: Optional[Union[dict, List[dict]]] = {'bias': 0, 'layer': 'Linear', 'std': 0.02, 'type': 'TruncNormal'})[源代码]

Pretrain Head for BEiT v2.

Compute the logits and the cross entropy loss.

参数
  • embed_dims (int) – The dimension of embedding.

  • num_embed (int) – The number of classification types.

  • loss (dict) – The config of loss.

  • init_cfg (dict or List[dict], optional) – Initialization config dict. Defaults to None.

forward(feats: torch.Tensor, feats_cls_pt: torch.Tensor, target: torch.Tensor, mask: torch.Tensor)torch.Tensor[源代码]

Generate loss.

参数
  • feats (torch.Tensor) – Features from backbone.

  • feats_cls_pt (torch.Tensor) – Features from the late layers with the class token, used for pretraining.

  • target (torch.Tensor) – Target generated by target_generator.

  • mask (torch.Tensor) – Generated mask for pretraining.

class mmselfsup.models.heads.CAEHead(loss: dict, init_cfg: Optional[Union[dict, List[dict]]] = None)[源代码]

Pretrain Head for CAE.

Compute the align loss and the main loss. In addition, this head also generates the prediction target generated by dalle.

参数
  • loss (dict) – The config of loss.

  • tokenizer_path (str) – The path of the tokenizer.

  • init_cfg (dict or List[dict], optional) – Initialization config dict. Defaults to None.

forward(logits: torch.Tensor, logits_target: torch.Tensor, latent_pred: torch.Tensor, latent_target: torch.Tensor, mask: torch.Tensor)Tuple[torch.Tensor, torch.Tensor][源代码]

Generate loss.

参数
  • logits (torch.Tensor) – Logits generated by decoder.

  • logits_target (torch.Tensor) – Target generated by dalle for decoder prediction.

  • latent_pred (torch.Tensor) – Latent prediction by regressor.

  • latent_target (torch.Tensor) – Target for latent prediction, generated by teacher.

返回

The tuple of loss.
  • loss_main (torch.Tensor): Cross entropy loss.

  • loss_align (torch.Tensor): MSE loss.

返回类型

Tuple[torch.Tensor, torch.Tensor]

class mmselfsup.models.heads.ClsHead(loss: dict, with_avg_pool: bool = False, in_channels: int = 2048, num_classes: int = 1000, vit_backbone: bool = False, init_cfg: Optional[Union[dict, List[dict]]] = [{'type': 'Normal', 'std': 0.01, 'layer': 'Linear'}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[源代码]

Simplest classifier head, with only one fc layer.

参数
  • loss (dict) – Config of the loss.

  • with_avg_pool (bool) – Whether to apply the average pooling after neck. Defaults to False.

  • in_channels (int) – Number of input channels. Defaults to 2048.

  • num_classes (int) – Number of classes. Defaults to 1000.

  • init_cfg (Dict or List[Dict], optional) – Initialization config dict.

forward(x: Union[List[torch.Tensor], Tuple[torch.Tensor]], label: torch.Tensor)torch.Tensor[源代码]

Get the loss.

参数
  • x (List[Tensor] | Tuple[Tensor]) – Feature maps of backbone, each tensor has shape (N, C, H, W).

  • label (torch.Tensor) – The label for cross entropy loss.

返回

The cross entropy loss.

返回类型

torch.Tensor

logits(x: Union[List[torch.Tensor], Tuple[torch.Tensor]])List[torch.Tensor][源代码]

Get the logits before the cross-entropy loss.

This method is used to obtain the logits without computing the loss.

参数

x (List[Tensor] | Tuple[Tensor]) – Feature maps of backbone, each tensor has shape (N, C, H, W).

返回

A list of class scores.

返回类型

List[Tensor]

class mmselfsup.models.heads.ContrastiveHead(loss: dict, temperature: float = 0.1)[源代码]

Head for contrastive learning.

The contrastive loss is implemented in this head and is used in SimCLR, MoCo, DenseCL, etc.

参数
  • loss (dict) – Config dict for module of loss functions.

  • temperature (float) – The temperature hyper-parameter that controls the concentration level of the distribution. Defaults to 0.1.

forward(pos: torch.Tensor, neg: torch.Tensor)torch.Tensor[源代码]

Forward function to compute contrastive loss.

参数
  • pos (torch.Tensor) – Nx1 positive similarity.

  • neg (torch.Tensor) – Nxk negative similarity.

返回

The contrastive loss.

返回类型

torch.Tensor
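The computation follows the standard InfoNCE form; a self-contained sketch of what this head computes (an illustration, not the exact implementation):

    import torch
    import torch.nn.functional as F

    def contrastive_loss(pos: torch.Tensor, neg: torch.Tensor,
                         temperature: float = 0.1) -> torch.Tensor:
        """pos: N x 1 positive similarities; neg: N x k negative similarities."""
        logits = torch.cat([pos, neg], dim=1) / temperature  # N x (1 + k)
        # The positive pair is always at index 0.
        labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
        return F.cross_entropy(logits, labels)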

class mmselfsup.models.heads.LatentCrossCorrelationHead(in_channels: int, loss: dict)[源代码]

Head for latent feature cross correlation.

Part of the code is borrowed from the official Barlow Twins implementation.

参数
  • in_channels (int) – Number of input channels.

  • loss (dict) – Config dict for module of loss functions.

forward(input: torch.Tensor, target: torch.Tensor)torch.Tensor[源代码]

Forward head.

参数
  • input (torch.Tensor) – NxC input features.

  • target (torch.Tensor) – NxC target features.

返回

The cross correlation loss.

返回类型

torch.Tensor

class mmselfsup.models.heads.LatentPredictHead(loss: dict, predictor: dict)[源代码]

Head for latent feature prediction.

This head builds a predictor, which can be any registered neck component. For example, BYOL and SimSiam call this head and build NonLinearNeck. It also implements similarity loss between two forward features.

参数
  • loss (dict) – Config dict for the loss.

  • predictor (dict) – Config dict for the predictor.

forward(input: torch.Tensor, target: torch.Tensor)Tuple[torch.Tensor, torch.Tensor][源代码]

Forward head.

参数
  • input (torch.Tensor) – NxC input features.

  • target (torch.Tensor) – NxC target features.

返回

The latent prediction loss.

返回类型

torch.Tensor

class mmselfsup.models.heads.MAEPretrainHead(loss: dict, norm_pix: bool = False, patch_size: int = 16)[源代码]

Pre-training head for MAE.

参数
  • loss (dict) – Config of loss.

  • norm_pix (bool) – Whether or not to normalize the target. Defaults to False.

  • patch_size (int) – Patch size. Defaults to 16.

construct_target(target: torch.Tensor)torch.Tensor[源代码]

Construct the reconstruction target.

In addition to splitting images into tokens, this module will also normalize the image according to norm_pix.

参数

target (torch.Tensor) – Image with the shape of B x 3 x H x W

返回

Tokenized images with the shape of B x L x C

返回类型

torch.Tensor

forward(pred: torch.Tensor, target: torch.Tensor, mask: torch.Tensor)torch.Tensor[源代码]

Forward function of MAE head.

参数
  • pred (torch.Tensor) – The reconstructed image.

  • target (torch.Tensor) – The target image.

  • mask (torch.Tensor) – The mask of the target image.

返回

The reconstruction loss.

返回类型

torch.Tensor

patchify(imgs: torch.Tensor)torch.Tensor[源代码]

Split images into non-overlapped patches.

参数

imgs (torch.Tensor) – A batch of images, of shape B x H x W x C.

返回

Patchified images. The shape is B x L x D.

返回类型

torch.Tensor

unpatchify(x: torch.Tensor)torch.Tensor[源代码]

Combine non-overlapped patches into images.

参数

x (torch.Tensor) – The shape is (N, L, patch_size**2 *3)

返回

The shape is (N, 3, H, W)

返回类型

imgs (torch.Tensor)
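A sketch of the patchify/unpatchify reshaping, following the MAE reference implementation (a simplified re-implementation, not the exact methods above):

    import torch

    def patchify(imgs: torch.Tensor, p: int = 16) -> torch.Tensor:
        """(B, 3, H, W) -> (B, L, p*p*3), where L = (H/p) * (W/p)."""
        b, c, h, w = imgs.shape
        assert h % p == 0 and w % p == 0
        x = imgs.reshape(b, c, h // p, p, w // p, p)
        x = torch.einsum('bchpwq->bhwpqc', x)
        return x.reshape(b, (h // p) * (w // p), p * p * c)

    def unpatchify(x: torch.Tensor, p: int = 16) -> torch.Tensor:
        """(B, L, p*p*3) -> (B, 3, H, W), assuming a square patch grid."""
        b, l, _ = x.shape
        h = w = int(l ** 0.5)
        x = x.reshape(b, h, w, p, p, 3)
        x = torch.einsum('bhwpqc->bchpwq', x)
        return x.reshape(b, 3, h * p, w * p)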

class mmselfsup.models.heads.MILANPretrainHead(loss: dict)[源代码]

MILAN pretrain head.

参数

loss (dict) – Config of loss.

forward(pred: torch.Tensor, target: torch.Tensor, mask: Optional[torch.Tensor] = None)torch.Tensor[源代码]

Forward function.

参数
  • pred (torch.Tensor) – Predicted features, of shape (N, L, D).

  • target (torch.Tensor) – Target features, of shape (N, L, D).

  • mask (torch.Tensor) – The mask of the target image.

返回

The reconstruction loss.

返回类型

torch.Tensor

class mmselfsup.models.heads.MaskFeatPretrainHead(loss: dict)[源代码]

Pre-training head for MaskFeat.

It computes the reconstruction loss between the prediction and the target in the masked region.

参数

loss (dict) – Config dict for module of loss functions.

forward(pred: torch.Tensor, target: torch.Tensor, mask: torch.Tensor)torch.Tensor[源代码]

Forward head.

参数
  • pred (torch.Tensor) – Predictions, which are of shape B x (1 + L) x C.

  • target (torch.Tensor) – Hog features, which is of shape B x L x C.

  • mask (torch.Tensor) – The mask of the hog features, which is of shape B x H x W.

返回

The loss tensor.

返回类型

torch.Tensor

class mmselfsup.models.heads.MixMIMPretrainHead(loss: dict, norm_pix: bool = False, patch_size: int = 16)[源代码]

MixMIM pretrain head.

参数
  • loss (dict) – Config of loss.

  • norm_pix (bool) – Whether or not to normalize the target. Defaults to False.

  • patch_size (int) – Patch size. Defaults to 16.

forward(x_rec: torch.Tensor, target: torch.Tensor, mask: torch.Tensor)torch.Tensor[源代码]

Forward function of MixMIM head.

参数
  • x_rec (torch.Tensor) – The reconstructed image.

  • target (torch.Tensor) – The target image.

  • mask (torch.Tensor) – The mask of the target image.

返回

The reconstruction loss.

返回类型

torch.Tensor

class mmselfsup.models.heads.MoCoV3Head(predictor: dict, loss: dict, temperature: float = 1.0)[源代码]

Head for MoCo v3 algorithms.

This head builds a predictor, which can be any registered neck component. It also implements latent contrastive loss between two forward features. Part of the code is modified from: https://github.com/facebookresearch/moco-v3/blob/main/moco/builder.py.

参数
  • predictor (dict) – Config dict for module of predictor.

  • loss (dict) – Config dict for module of loss functions.

  • temperature (float) – The temperature hyper-parameter that controls the concentration level of the distribution. Defaults to 1.0.

forward(base_out: torch.Tensor, momentum_out: torch.Tensor)torch.Tensor[源代码]

Forward head.

参数
  • base_out (torch.Tensor) – NxC features from base_encoder.

  • momentum_out (torch.Tensor) – NxC features from momentum_encoder.

返回

The loss tensor.

返回类型

torch.Tensor

class mmselfsup.models.heads.MultiClsHead(backbone: str = 'resnet50', in_indices: Sequence[int] = (0, 1, 2, 3, 4), pool_type: str = 'adaptive', num_classes: int = 1000, loss: dict = {'loss_weight': 1.0, 'type': 'mmcls.CrossEntropyLoss'}, with_last_layer_unpool: bool = False, cal_acc: bool = False, topk: Union[int, Tuple[int]] = (1), norm_cfg: dict = {'type': 'BN'}, init_cfg: Union[dict, List[dict]] = [{'type': 'Normal', 'std': 0.01, 'layer': 'Linear'}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[源代码]

Multiple classifier heads.

This head inputs feature maps from different stages of backbone, average pools each feature map to around 9000 dimensions, and then appends a linear classifier at each stage to predict corresponding class scores.

参数
  • backbone (str) – Specify which backbone to use, only supports ResNet50. Defaults to ‘resnet50’.

  • in_indices (Sequence[int]) – Input from which stages. Defaults to (0, 1, 2, 3, 4).

  • pool_type (str) – ‘adaptive’ or ‘specified’. If set to ‘adaptive’, use adaptive average pooling, otherwise use specified pooling params. Defaults to ‘adaptive’.

  • num_classes (int) – Number of classes. Defaults to 1000.

  • loss (dict) – The dict of loss information. Defaults to dict(type='mmcls.CrossEntropyLoss', loss_weight=1.0).

  • with_last_layer_unpool (bool) – Whether to unpool the features from the last layer. Defaults to False.

  • cal_acc (bool) – Whether to calculate accuracy during training. If you use batch augmentations like Mixup and CutMix during training, it is pointless to calculate accuracy. Defaults to False.

  • topk (int | Tuple[int]) – Top-k accuracy. Defaults to (1, ).

  • norm_cfg (dict) – Dict to construct and config norm layer. Defaults to dict(type='BN').

  • init_cfg (dict or List[dict]) – Initialization config dict. Defaults to [ dict(type='Normal', std=0.01, layer='Linear'), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm']) ]

forward(feats: Union[list, tuple])list[源代码]

Compute multi-head scores.

参数

feats (Sequence[torch.Tensor]) – Feature maps of backbone, each tensor has shape (N, C, H, W).

返回

A list of class scores.

返回类型

List[torch.Tensor]

loss(feats: Sequence[torch.Tensor], data_samples: List[mmcls.structures.cls_data_sample.ClsDataSample], **kwargs)dict[源代码]

Calculate losses from the extracted features.

参数
  • feats (Sequence[torch.Tensor]) – Feature maps of backbone, each tensor has shape (N, C, H, W).

  • data_samples (List[ClsDataSample]) – The annotation data of every sample.

返回

Dict of loss and accuracy.

返回类型

Dict[str, torch.Tensor]

predict(feats: Sequence[torch.Tensor], data_samples: List[mmcls.structures.cls_data_sample.ClsDataSample])List[mmcls.structures.cls_data_sample.ClsDataSample][源代码]

Inference without augmentation.

参数
  • feats (tuple[Tensor]) – The extracted features.

  • data_samples (List[BaseDataElement], optional) – The annotation data of every samples. If not None, set pred_label of the input data samples.

返回

The data samples containing annotation, prediction, etc.

返回类型

List[BaseDataElement]

class mmselfsup.models.heads.SimMIMHead(patch_size: int, loss: dict)[源代码]

Pretrain Head for SimMIM.

参数
  • patch_size (int) – Patch size of each token.

  • loss (dict) – The config for loss.

forward(pred: torch.Tensor, target: torch.Tensor, mask: torch.Tensor)torch.Tensor[源代码]

Forward function of SimMIM head.

This method will expand mask to the size of the original image.

参数
  • pred (torch.Tensor) – The reconstructed image.

  • target (torch.Tensor) – The target image.

  • mask (torch.Tensor) – The mask of the target image.

返回

The reconstruction loss.

返回类型

torch.Tensor

class mmselfsup.models.heads.SwAVHead(loss: dict)[源代码]

Head for SwAV.

参数

loss (dict) – Config dict for module of loss functions.

forward(pred: torch.Tensor)torch.Tensor[源代码]

Forward function of SwAV head.

参数

pred (torch.Tensor) – NxC input features.

返回

The SwAV loss.

返回类型

torch.Tensor

losses

class mmselfsup.models.losses.BEiTLoss[源代码]

Loss function for BEiT.

The BEiTLoss supports two different logits sharing one target, as in BEiT v2.

forward(logits: Union[Tuple[torch.Tensor], torch.Tensor], target: torch.Tensor)Tuple[torch.Tensor, torch.Tensor][源代码]

Forward function of BEiT Loss.

参数
  • logits (torch.Tensor) – The outputs from the decoder.

  • target (torch.Tensor) – The targets generated by dalle.

返回

The main loss.

返回类型

Tuple[torch.Tensor, torch.Tensor]

class mmselfsup.models.losses.CAELoss(lambd: float)[源代码]

Loss function for CAE.

Compute the align loss and the main loss.

参数

lambd (float) – The weight for the align loss.

forward(logits: torch.Tensor, target: torch.Tensor, latent_pred: torch.Tensor, latent_target: torch.Tensor)Tuple[torch.Tensor, torch.Tensor][源代码]

Forward function of CAE Loss.

参数
  • logits (torch.Tensor) – The outputs from the decoder.

  • target (torch.Tensor) – The targets generated by dalle.

  • latent_pred (torch.Tensor) – The latent prediction from the regressor.

  • latent_target (torch.Tensor) – The latent target from the teacher network.

返回

The main loss and align loss.

返回类型

Tuple[torch.Tensor, torch.Tensor]

class mmselfsup.models.losses.CosineSimilarityLoss(shift_factor: float = 0.0, scale_factor: float = 1.0)[源代码]

Cosine similarity loss function.

Compute the similarity between two features and optimize that similarity as loss.

参数
  • shift_factor (float) – The shift factor of cosine similarity. Defaults to 0.0.

  • scale_factor (float) – The scale factor of cosine similarity. Defaults to 1.0.

forward(pred: torch.Tensor, target: torch.Tensor, mask: Optional[torch.Tensor] = None)torch.Tensor[源代码]

Forward function of cosine similarity loss.

参数
  • pred (torch.Tensor) – The predicted features.

  • target (torch.Tensor) – The target features.

返回

The cosine similarity loss.

返回类型

torch.Tensor

class mmselfsup.models.losses.CrossCorrelationLoss(lambd: float = 0.0051)[源代码]

Cross correlation loss function.

Compute the on-diagonal and off-diagonal loss.

参数

lambd (float) – The weight for the off-diag loss.

forward(cross_correlation_matrix: torch.Tensor)torch.Tensor[源代码]

Forward function of cross correlation loss.

参数

cross_correlation_matrix (torch.Tensor) – The cross correlation matrix.

返回

cross correlation loss.

返回类型

torch.Tensor

off_diagonal(x: torch.Tensor)torch.Tensor[源代码]

Return a flattened view of the off-diagonal elements of a square matrix.
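A self-contained sketch of this loss, following the Barlow Twins reference code (the flattened view is the same trick used by off_diagonal above):

    import torch

    def off_diagonal(x: torch.Tensor) -> torch.Tensor:
        # Flattened view of the off-diagonal elements of a square matrix.
        n, m = x.shape
        assert n == m
        return x.flatten()[:-1].view(n - 1, n + 1)[:, 1:].flatten()

    def cross_correlation_loss(c: torch.Tensor, lambd: float = 0.0051) -> torch.Tensor:
        # Pull on-diagonal terms towards 1 and off-diagonal terms towards 0.
        on_diag = (torch.diagonal(c) - 1).pow(2).sum()
        off_diag = off_diagonal(c).pow(2).sum()
        return on_diag + lambd * off_diag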

class mmselfsup.models.losses.MAEReconstructionLoss[源代码]

Loss function for MAE.

Compute the loss in the masked region.

forward(pred: torch.Tensor, target: torch.Tensor, mask: torch.Tensor)torch.Tensor[源代码]

Forward function of MAE Loss.

参数
  • pred (torch.Tensor) – The reconstructed image.

  • target (torch.Tensor) – The target image.

  • mask (torch.Tensor) – The mask of the target image.

返回

The reconstruction loss.

返回类型

torch.Tensor

class mmselfsup.models.losses.PixelReconstructionLoss(criterion: str, channel: Optional[int] = None)[源代码]

Loss for the reconstruction of pixel in Masked Image Modeling.

This module measures the distance between the target image and the reconstructed image and computes the loss to optimize the model. Currently, this module only provides L1 and L2 loss to penalize the reconstruction error. In addition, a mask can be passed in the forward function to apply the loss only on the masked region, as in MAE.

参数
  • criterion (str) – The loss to penalize the reconstruction error. Currently, only L1 and L2 loss are supported.

  • channel (int, optional) – The number of channels to average the reconstruction loss. If not None, the reconstruction loss will be divided by the channel. Defaults to None.

forward(pred: torch.Tensor, target: torch.Tensor, mask: Optional[torch.Tensor] = None)torch.Tensor[源代码]

Forward function to compute the reconstruction loss.

参数
  • pred (torch.Tensor) – The reconstructed image.

  • target (torch.Tensor) – The target image.

  • mask (torch.Tensor) – The mask of the target image.

返回

The reconstruction loss.

返回类型

torch.Tensor
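A minimal sketch of a masked pixel reconstruction loss of this kind, using the L2 criterion (shapes follow the usual masked-image-modeling convention of (N, L, C) patch tokens; an illustration, not the exact implementation):

    import torch

    def masked_l2_loss(pred: torch.Tensor, target: torch.Tensor,
                       mask: torch.Tensor) -> torch.Tensor:
        """pred/target: (N, L, C); mask: (N, L) with 1 on masked tokens."""
        loss = (pred - target) ** 2
        loss = loss.mean(dim=-1)                 # mean over the channel dimension
        return (loss * mask).sum() / mask.sum()  # average over masked tokens only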

class mmselfsup.models.losses.SimMIMReconstructionLoss(encoder_in_channels: int)[源代码]

Loss function for SimMIM.

Compute the loss in the masked region.

参数

encoder_in_channels (int) – Number of input channels for encoder.

forward(pred: torch.Tensor, target: torch.Tensor, mask: torch.Tensor)torch.Tensor[源代码]

Forward function of SimMIM reconstruction loss.

参数
  • pred (torch.Tensor) – The reconstructed image.

  • target (torch.Tensor) – The target image.

  • mask (torch.Tensor) – The mask of the target image.

返回

The reconstruction loss.

返回类型

torch.Tensor

class mmselfsup.models.losses.SwAVLoss(feat_dim: int, sinkhorn_iterations: int = 3, epsilon: float = 0.05, temperature: float = 0.1, crops_for_assign: List[int] = [0, 1], num_crops: List[int] = [2], num_prototypes: int = 3000, init_cfg: Optional[Union[dict, List[dict]]] = None)[源代码]

The Loss for SwAV.

This loss contains the clustering and Sinkhorn algorithms to compute the Q codes. Part of the code is borrowed from the official SwAV implementation. The queue is built in engine/hooks/swav_hook.py.

参数
  • feat_dim (int) – Feature dimension of the prototypes.

  • sinkhorn_iterations (int) – Number of iterations in the Sinkhorn-Knopp algorithm. Defaults to 3.

  • epsilon (float) – Regularization parameter for the Sinkhorn-Knopp algorithm. Defaults to 0.05.

  • temperature (float) – Temperature parameter in the training loss. Defaults to 0.1.

  • crops_for_assign (List[int]) – List of crop ids used for computing assignments. Defaults to [0, 1].

  • num_crops (List[int]) – List of numbers of crops. Defaults to [2].

  • num_prototypes (int) – Number of prototypes. Defaults to 3000.

  • init_cfg (dict or List[dict], optional) – Initialization config dict. Defaults to None.

forward(x: torch.Tensor)torch.Tensor[源代码]

Forward function of SwAV loss.

参数

x (torch.Tensor) – NxC input features.

返回

The returned loss.

返回类型

torch.Tensor

memories

class mmselfsup.models.memories.ODCMemory(length: int, feat_dim: int, momentum: float, num_classes: int, min_cluster: int, **kwargs)[源代码]

Memory module for ODC.

This module includes the samples memory and the centroids memory in ODC. The samples memory stores features and pseudo-labels of all samples in the dataset; while the centroids memory stores features of cluster centroids.

参数
  • length (int) – Number of features stored in the samples memory.

  • feat_dim (int) – Dimension of stored features.

  • momentum (float) – Momentum coefficient for updating features.

  • num_classes (int) – Number of clusters.

  • min_cluster (int) – Minimal cluster size.

deal_with_small_clusters()None[源代码]

Deal with small clusters.

init_memory(feature: numpy.ndarray, label: numpy.ndarray)None[源代码]

Initialize memory modules.

update_centroids_memory(cinds: Optional[List] = None)None[源代码]

Update centroids memory.

update_samples_memory(idx: torch.Tensor, feature: torch.Tensor)torch.Tensor[源代码]

Update samples memory.

class mmselfsup.models.memories.SimpleMemory(length: int, feat_dim: int, momentum: float, **kwargs)[源代码]

Simple feature memory bank.

This module includes the memory bank that stores running average features of all samples in the dataset. It is used in algorithms like NPID.

参数
  • length (int) – Number of features stored in the memory bank.

  • feat_dim (int) – Dimension of stored features.

  • momentum (float) – Momentum coefficient for updating features.

update(idx: torch.Tensor, feature: torch.Tensor)None[源代码]

Update features in the memory bank.

参数
  • idx (torch.Tensor) – Indices for the batch of features.

  • feature (torch.Tensor) – Batch of features.
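One plausible form of this running-average update (a sketch under the assumption that the bank stores L2-normalized features; not the exact implementation):

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def momentum_update(bank: torch.Tensor, idx: torch.Tensor,
                        feature: torch.Tensor, momentum: float) -> None:
        """bank: (length, feat_dim) tensor of running-average features."""
        updated = (1 - momentum) * bank[idx] + momentum * feature
        bank[idx] = F.normalize(updated, dim=1)  # keep stored features unit-norm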

target_generators

class mmselfsup.models.target_generators.CLIPGenerator(tokenizer_path: str)[源代码]

Get the features and attention from the last layer of CLIP.

This module is used to generate target features in masked image modeling.

参数

tokenizer_path (str) – The path of the checkpoint of CLIP.

forward(x: torch.Tensor)Tuple[torch.Tensor, torch.Tensor][源代码]

Get the features and attention from the last layer of CLIP.

参数

x (torch.Tensor) – The input image, which is of shape (N, 3, H, W).

返回

The features and attention from the last layer of CLIP, which are of shape (N, L, C) and (N, L, L), respectively.

返回类型

Tuple[torch.Tensor, torch.Tensor]

class mmselfsup.models.target_generators.Encoder(n_hid: int = 256, n_blk_per_group: int = 2, input_channels: int = 3, vocab_size: int = 8192, device: torch.device = device(type='cpu'), requires_grad: bool = False, use_mixed_precision: bool = True, init_cfg: Optional[Union[dict, List[dict]]] = None)[源代码]
forward(x: torch.Tensor)torch.Tensor[源代码]

Defines the computation performed at every call.

Should be overridden by all subclasses.

注解

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmselfsup.models.target_generators.HOGGenerator(nbins: int = 9, pool: int = 8, gaussian_window: int = 16)[源代码]

Generate HOG feature for images.

This module is used in MaskFeat to generate HOG features. The code is modified from slowfast/models/operators.py; see the Wikipedia article on the histogram of oriented gradients (HOG) for background.

参数
  • nbins (int) – Number of bins. Defaults to 9.

  • pool (int) – Cell size for pooling. Defaults to 8.

  • gaussian_window (int) – Size of the Gaussian kernel. Defaults to 16.

forward(x: torch.Tensor)torch.Tensor[源代码]

Generate HOG features for a batch of images.

参数

x (torch.Tensor) – Input images of shape (N, 3, H, W).

返回

Hog features.

返回类型

torch.Tensor
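A minimal usage sketch (assuming mmselfsup is installed; the input resolution is illustrative):

    import torch
    from mmselfsup.models.target_generators import HOGGenerator

    hog_generator = HOGGenerator(nbins=9, pool=8, gaussian_window=16)
    imgs = torch.rand(2, 3, 224, 224)
    hog_feats = hog_generator(imgs)  # HOG targets for MaskFeat pre-training
    print(hog_feats.shape)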

generate_hog_image(hog_out: torch.Tensor)numpy.ndarray[源代码]

Generate HOG image according to HOG features.

get_gaussian_kernel(kernlen: int, std: int)torch.Tensor[源代码]

Returns a 2D Gaussian kernel array.

class mmselfsup.models.target_generators.LowFreqTargetGenerator(radius: int, img_size: Union[int, Tuple[int, int]])[源代码]

Generate low-frequency targets for images.

This module is used in PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling to remove high-frequency components from images.

参数
  • radius (int) – Radius of the low-pass filter.

  • img_size (Union[int, Tuple[int, int]]) – Size of input images.

forward(imgs: torch.Tensor)torch.Tensor[源代码]

Filter out high-frequency components from images.

参数

imgs (torch.Tensor) – Input images, which have shape (N, C, H, W).

返回

The low-frequency target, which has the same shape as the input images.

返回类型

torch.Tensor

class mmselfsup.models.target_generators.VQKD(encoder_config: dict, decoder_config: Optional[dict] = None, num_embed: int = 8192, embed_dims: int = 32, decay: float = 0.99, beta: float = 1.0, quantize_kmeans_init: bool = True, init_cfg: Optional[dict] = None)[源代码]

Vector-Quantized Knowledge Distillation.

The module only contains the encoder and the VectorQuantizer part. Modified from https://github.com/microsoft/unilm/blob/master/beit2/modeling_vqkd.py

参数
  • encoder_config (dict) – The config of encoder.

  • decoder_config (dict, optional) – The config of decoder. Currently, VQKD only supports building the encoder. Defaults to None.

  • num_embed (int) – Number of embedding vectors in the codebook. Defaults to 8192.

  • embed_dims (int) – The dimension of embedding vectors in the codebook. Defaults to 32.

  • decay (float) – The decay parameter of EMA. Defaults to 0.99.

  • beta (float) – The multiplier for the VectorQuantizer loss. Defaults to 1.

  • quantize_kmeans_init (bool) – Whether to use k-means to initialize the VectorQuantizer. Defaults to True.

  • init_cfg (dict or List[dict], optional) – Initialization config dict. Defaults to None.

encode(x: torch.Tensor)Tuple[torch.Tensor, torch.Tensor, torch.Tensor][源代码]

Encode the input images and get corresponding results.

forward(x: torch.Tensor)torch.Tensor[源代码]

The forward function.

Currently, only support to get tokens.

get_tokens(x: torch.Tensor)dict[源代码]

Get tokens for beit pre-training.

utils

class mmselfsup.models.utils.CAEDataPreprocessor(mean: Optional[Sequence[Union[int, float]]] = None, std: Optional[Sequence[Union[int, float]]] = None, pad_size_divisor: int = 1, pad_value: Union[float, int] = 0, bgr_to_rgb: bool = False, rgb_to_bgr: bool = False, non_blocking: Optional[bool] = False)[源代码]

Image pre-processor for CAE.

Compared with the mmselfsup.SelfSupDataPreprocessor, this module will normalize the prediction image and target image with different normalization parameters.

forward(data: dict, training: bool = False)Tuple[List[torch.Tensor], Optional[list]][源代码]

Performs normalization, padding and bgr2rgb conversion based on BaseDataPreprocessor.

参数
  • data (dict) – data sampled from dataloader.

  • training (bool) – Whether to enable training time augmentation. If subclasses override this method, they can perform different preprocessing strategies for training and testing based on the value of training.

返回

Data in the same format as the model input.

返回类型

Tuple[torch.Tensor, Optional[list]]

class mmselfsup.models.utils.CAETransformerRegressorLayer(embed_dims: int, num_heads: int, feedforward_channels: int, num_fcs: int = 2, qkv_bias: bool = False, qk_scale: Optional[float] = None, drop_rate: float = 0.0, attn_drop_rate: float = 0.0, drop_path_rate: float = 0.0, init_values: float = 0.0, act_cfg: dict = {'type': 'GELU'}, norm_cfg: dict = {'eps': 1e-06, 'type': 'LN'})[源代码]

Transformer layer for the regressor of CAE.

This module is different from conventional transformer encoder layer, for its queries are the masked tokens, but its keys and values are the concatenation of the masked and unmasked tokens.

参数
  • embed_dims (int) – The feature dimension.

  • num_heads (int) – The number of heads in multi-head attention.

  • feedforward_channels (int) – The hidden dimension of FFNs. Defaults to 1024.

  • num_fcs (int, optional) – The number of fully-connected layers in FFNs. Defaults to 2.

  • qkv_bias (bool) – If True, add a learnable bias to q, k, v. Defaults to False.

  • qk_scale (float, optional) – Override default qk scale of head_dim ** -0.5 if set. Defaults to None.

  • drop_rate (float) – The dropout rate. Defaults to 0.0.

  • attn_drop_rate (float) – The drop out rate for attention output weights. Defaults to 0.

  • drop_path_rate (float) – Stochastic depth rate. Defaults to 0.

  • init_values (float) – The init values of gamma. Defaults to 0.0.

  • act_cfg (dict) – The activation config for FFNs. Defaults to dict(type='GELU').

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='LN').

forward(x_q: torch.Tensor, x_kv: torch.Tensor, pos_q: torch.Tensor, pos_k: torch.Tensor)torch.Tensor[源代码]

Forward function.

class mmselfsup.models.utils.CosineEMA(model: torch.nn.modules.module.Module, momentum: float = 0.996, end_momentum: float = 1.0, interval: int = 1, device: Optional[torch.device] = None, update_buffers: bool = False)[源代码]

CosineEMA is implemented for updating the momentum parameter, and is used in BYOL, MoCoV3, etc.

The momentum parameter is updated with cosine annealing, following:

\[m = m_1 - (m_1 - m_0) \cdot \frac{\cos(\pi k / K) + 1}{2}\]

where \(k\) is the current step, \(K\) is the total steps.
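The schedule can be written directly in code; a minimal sketch of the annealing above (\(m_0\) is the start momentum, \(m_1\) the end momentum):

    import math

    def cosine_momentum(m_0: float, m_1: float, k: int, K: int) -> float:
        # m = m_1 - (m_1 - m_0) * (cos(pi * k / K) + 1) / 2
        return m_1 - (m_1 - m_0) * (math.cos(math.pi * k / K) + 1) / 2

    # e.g. a BYOL-style schedule from 0.996 towards 1.0:
    # cosine_momentum(0.996, 1.0, 0, 100)   -> 0.996
    # cosine_momentum(0.996, 1.0, 100, 100) -> 1.0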

参数
  • model (nn.Module) – The model to be averaged.

  • momentum (float) – The momentum used for updating the EMA parameters. The EMA parameters are updated with the formula: averaged_param = momentum * averaged_param + (1 - momentum) * source_param. Defaults to 0.996.

  • end_momentum (float) – The end momentum value for cosine annealing. Defaults to 1.

  • interval (int) – Interval between two updates. Defaults to 1.

  • device (torch.device, optional) – If provided, the averaged model will be stored on the device. Defaults to None.

  • update_buffers (bool) – if True, it will compute running averages for both the parameters and the buffers of the model. Defaults to False.

avg_func(averaged_param: torch.Tensor, source_param: torch.Tensor, steps: int)None[源代码]

Compute the moving average of the parameters using the cosine momentum strategy.

参数
  • averaged_param (Tensor) – The averaged parameters.

  • source_param (Tensor) – The source parameters.

  • steps (int) – The number of times the parameters have been updated.

返回

The averaged parameters.

返回类型

Tensor

class mmselfsup.models.utils.Extractor(extract_dataloader: Union[torch.utils.data.dataloader.DataLoader, dict], seed: Optional[int] = None, dist_mode: bool = False, pool_cfg: Optional[dict] = None, **kwargs)[源代码]

Feature extractor.

The extractor supports building its own DataLoader, customized models and pooling types. It also supports distributed and non-distributed modes.

参数
  • extract_dataloader (dict) – A dict to build DataLoader object.

  • seed (int, optional) – Random seed. Defaults to None.

  • dist_mode (bool) – Use distributed extraction or not. Defaults to False.

  • pool_cfg (dict, optional) – The configs of pooling. Defaults to dict(type=’AvgPool2d’, output_size=1).

class mmselfsup.models.utils.GatherLayer(*args, **kwargs)[源代码]

Gather tensors from all processes, supporting backward propagation.

static backward(ctx: Any, *grads: torch.Tensor)torch.Tensor[源代码]

Defines a formula for differentiating the operation with backward mode automatic differentiation (alias to the vjp function).

This function is to be overridden by all subclasses.

It must accept a context ctx as the first argument, followed by as many outputs as the forward() returned (None will be passed in for non tensor outputs of the forward function), and it should return as many tensors, as there were inputs to forward(). Each argument is the gradient w.r.t the given output, and each returned value should be the gradient w.r.t. the corresponding input. If an input is not a Tensor or is a Tensor not requiring grads, you can just pass None as a gradient for that input.

The context can be used to retrieve tensors saved during the forward pass. It also has an attribute ctx.needs_input_grad as a tuple of booleans representing whether each input needs gradient. E.g., backward() will have ctx.needs_input_grad[0] = True if the first input to forward() needs gradient computed w.r.t. the output.

static forward(ctx: Any, input: torch.Tensor)Tuple[List][源代码]

Performs the operation.

This function is to be overridden by all subclasses.

It must accept a context ctx as the first argument, followed by any number of arguments (tensors or other types).

The context can be used to store arbitrary data that can be then retrieved during the backward pass. Tensors should not be stored directly on ctx (though this is not currently enforced for backward compatibility). Instead, tensors should be saved either with ctx.save_for_backward() if they are intended to be used in backward (equivalently, vjp) or ctx.save_for_forward() if they are intended to be used in jvp.

class mmselfsup.models.utils.MultiPooling(pool_type: str = 'adaptive', in_indices: tuple = (0), backbone: str = 'resnet50')[源代码]

Pooling layers for features from multiple depth.

参数
  • pool_type (str) – Pooling type for the feature map. Options are ‘adaptive’ and ‘specified’. Defaults to ‘adaptive’.

  • in_indices (Sequence[int]) – Output from which backbone stages. Defaults to (0, ).

  • backbone (str) – The selected backbone. Defaults to ‘resnet50’.

forward(x: Union[List, Tuple])None[源代码]

Forward function.

class mmselfsup.models.utils.MultiPrototypes(output_dim: int, num_prototypes: List[int])[源代码]

Multi-prototypes for SwAV head.

参数
  • output_dim (int) – The output dim from SwAV neck.

  • num_prototypes (List[int]) – The number of prototypes needed.

forward(x: torch.Tensor)List[torch.Tensor][源代码]

Run forward for every prototype.

class mmselfsup.models.utils.MultiheadAttention(embed_dims: int, num_heads: int, input_dims: Optional[int] = None, attn_drop: float = 0.0, proj_drop: float = 0.0, qkv_bias: bool = True, qk_scale: Optional[float] = None, proj_bias: bool = True, init_cfg: Optional[dict] = None)[源代码]

Multi-head Attention Module.

This module rewrites MultiheadAttention by replacing the qkv bias with a customized qkv bias, in addition to removing the drop path layer.

参数
  • embed_dims (int) – The embedding dimension.

  • num_heads (int) – Parallel attention heads.

  • input_dims (int, optional) – The input dimension, and if None, use embed_dims. Defaults to None.

  • attn_drop (float) – Dropout rate of the dropout layer after the attention calculation of query and key. Defaults to 0.

  • proj_drop (float) – Dropout rate of the dropout layer after the output projection. Defaults to 0.

  • dropout_layer (dict) – The dropout config before adding the shortcut. Defaults to dict(type='Dropout', drop_prob=0.).

  • qkv_bias (bool) – If True, add a learnable bias to q, k, v. Defaults to True.

  • qk_scale (float, optional) – Override default qk scale of head_dim ** -0.5 if set. Defaults to None.

  • proj_bias (bool) – Defaults to True.

  • init_cfg (dict, optional) – The Config for initialization. Defaults to None.

forward(x: torch.Tensor)torch.Tensor[源代码]

Forward function.

class mmselfsup.models.utils.NormEMAVectorQuantizer(num_embed: int, embed_dims: int, beta: float, decay: float = 0.99, statistic_code_usage: bool = True, kmeans_init: bool = True, codebook_init_path: Optional[str] = None)[源代码]

Normed EMA vector quantizer module.

参数
  • num_embed (int) – Number of embedding vectors in the codebook. Defaults to 8192.

  • embed_dims (int) – The dimension of embedding vectors in the codebook. Defaults to 32.

  • beta (float) – The multiplier for the VectorQuantizer embedding loss. Defaults to 1.

  • decay (float) – The decay parameter of EMA. Defaults to 0.99.

  • statistic_code_usage (bool) – Whether to use cluster_size to record statistic. Defaults to True.

  • kmeans_init (bool) – Whether to use k-means to initialize the VectorQuantizer. Defaults to True.

  • codebook_init_path (str) – The initialization checkpoint for codebook. Defaults to None.

forward(z)[源代码]

Forward function.

class mmselfsup.models.utils.PromptTransformerEncoderLayer(embed_dims: int, num_heads: int, feedforward_channels=<class 'int'>, drop_rate: float = 0.0, attn_drop_rate: float = 0.0, drop_path_rate: float = 0.0, num_fcs: int = 2, qkv_bias: bool = True, act_cfg: dict = {'type': 'GELU'}, norm_cfg: dict = {'type': 'LN'}, init_cfg: Optional[Union[dict, List[dict]]] = None)[源代码]

Prompt Transformer Encoder Layer for MILAN.

This module is specific for the prompt encoder in MILAN. It will not update the visible tokens from the encoder.

参数
  • embed_dims (int) – The feature dimension.

  • num_heads (int) – Parallel attention heads.

  • feedforward_channels (int) – The hidden dimension for FFNs.

  • drop_rate (float) – Probability of an element to be zeroed after the feed forward layer. Defaults to 0.0.

  • attn_drop_rate (float) – The drop out rate for attention layer. Defaults to 0.0.

  • drop_path_rate (float) – Stochastic depth rate. Defaults to 0.0.

  • num_fcs (int) – The number of fully-connected layers for FFNs. Defaults to 2.

  • qkv_bias (bool) – Enable bias for qkv if True. Defaults to True.

  • act_cfg (dict) – The activation config for FFNs. Defaults to dict(type='GELU').

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='LN').

  • batch_first (bool) – Key, Query and Value are shape of (batch, n, embed_dim) or (n, batch, embed_dim). Defaults to False.

  • init_cfg (dict, optional) – The Config for initialization. Defaults to None.

forward(x: torch.Tensor, visible_tokens: torch.Tensor, ids_restore: torch.Tensor)torch.Tensor[源代码]

Forward function for PromptMultiheadAttention.

参数
  • x (torch.Tensor) – Mask token features with shape N x L_m x C.

  • visible_tokens (torch.Tensor) – The visible tokens features from encoder with shape N x L_v x C.

  • ids_restore (torch.Tensor) – The ids of all tokens in the original image with shape N x L.

返回

Output features with shape N x L x C.

返回类型

torch.Tensor

class mmselfsup.models.utils.RelativeLocDataPreprocessor(mean: Optional[Sequence[Union[int, float]]] = None, std: Optional[Sequence[Union[int, float]]] = None, pad_size_divisor: int = 1, pad_value: Union[float, int] = 0, bgr_to_rgb: bool = False, rgb_to_bgr: bool = False, non_blocking: Optional[bool] = False)[源代码]

Image pre-processor for Relative Location.

forward(data: dict, training: bool = False)Tuple[List[torch.Tensor], Optional[list]][源代码]

Performs normalization, padding and bgr2rgb conversion based on BaseDataPreprocessor.

参数
  • data (dict) – data sampled from dataloader.

  • training (bool) – Whether to enable training time augmentation. If subclasses override this method, they can perform different preprocessing strategies for training and testing based on the value of training.

返回

Data in the same format as the model input.

返回类型

Tuple[torch.Tensor, Optional[list]]

class mmselfsup.models.utils.RotationPredDataPreprocessor(mean: Optional[Sequence[Union[int, float]]] = None, std: Optional[Sequence[Union[int, float]]] = None, pad_size_divisor: int = 1, pad_value: Union[float, int] = 0, bgr_to_rgb: bool = False, rgb_to_bgr: bool = False, non_blocking: Optional[bool] = False)[源代码]

Image pre-processor for Rotation Prediction.

forward(data: dict, training: bool = False)Tuple[List[torch.Tensor], Optional[list]][源代码]

Performs normalization, padding and bgr2rgb conversion based on BaseDataPreprocessor.

参数
  • data (dict) – data sampled from dataloader.

  • training (bool) – Whether to enable training time augmentation. If subclasses override this method, they can perform different preprocessing strategies for training and testing based on the value of training.

返回

Data in the same format as the model input.

返回类型

Tuple[torch.Tensor, Optional[list]]

class mmselfsup.models.utils.SelfSupDataPreprocessor(mean: Optional[Sequence[Union[int, float]]] = None, std: Optional[Sequence[Union[int, float]]] = None, pad_size_divisor: int = 1, pad_value: Union[float, int] = 0, bgr_to_rgb: bool = False, rgb_to_bgr: bool = False, non_blocking: Optional[bool] = False)[源代码]

Image pre-processor for operations, like normalization and bgr to rgb.

Compared with the mmengine.ImgDataPreprocessor, this module treats each item in inputs of input data as a list, instead of torch.Tensor.

forward(data: dict, training: bool = False)Tuple[List[torch.Tensor], Optional[list]][源代码]

Performs normalization, padding and bgr2rgb conversion based on BaseDataPreprocessor.

参数
  • data (dict) – data sampled from dataloader.

  • training (bool) – Whether to enable training time augmentation. If subclasses override this method, they can perform different preprocessing strategies for training and testing based on the value of training.

返回

Data in the same format as the model input.

返回类型

Tuple[torch.Tensor, Optional[list]]

class mmselfsup.models.utils.Sobel[源代码]

Sobel layer.

The layer reduces channels from 3 to 2.

forward(x: torch.Tensor)torch.Tensor[源代码]

Run the Sobel layer.
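One way to build a Sobel layer of this kind: convert RGB to grayscale, then apply fixed horizontal and vertical Sobel kernels, yielding two gradient channels (a sketch using the standard Sobel weights, not necessarily the exact weights used here):

    import torch
    import torch.nn as nn

    class SobelSketch(nn.Module):
        def __init__(self) -> None:
            super().__init__()
            self.gray = nn.Conv2d(3, 1, kernel_size=1, bias=False)
            self.gray.weight.data.fill_(1.0 / 3.0)  # simple RGB -> gray average
            self.sobel = nn.Conv2d(1, 2, kernel_size=3, padding=1, bias=False)
            kx = torch.tensor([[1., 0., -1.], [2., 0., -2.], [1., 0., -1.]])
            self.sobel.weight.data = torch.stack([kx, kx.t()]).unsqueeze(1)
            for p in self.parameters():
                p.requires_grad = False  # fixed filters, not trained

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.sobel(self.gray(x))  # (N, 3, H, W) -> (N, 2, H, W)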

class mmselfsup.models.utils.TransformerEncoderLayer(embed_dims: int, num_heads: int, feedforward_channels: int, window_size: Optional[int] = None, drop_rate: float = 0.0, attn_drop_rate: float = 0.0, drop_path_rate: float = 0.0, num_fcs: int = 2, qkv_bias: bool = True, act_cfg: dict = {'type': 'GELU'}, norm_cfg: dict = {'type': 'LN'}, init_values: float = 0.0, init_cfg: Optional[dict] = None)[源代码]

Implements one encoder layer in Vision Transformer.

This module is the rewritten version of the TransformerEncoderLayer in MMClassification by adding the gamma and relative position bias in Attention module.

参数
  • embed_dims (int) – The feature dimension.

  • num_heads (int) – Parallel attention heads

  • feedforward_channels (int) – The hidden dimension for FFNs

  • drop_rate (float) – Probability of an element to be zeroed after the feed forward layer. Defaults to 0.

  • attn_drop_rate (float) – The drop out rate for attention output weights. Defaults to 0.

  • drop_path_rate (float) – Stochastic depth rate. Defaults to 0.

  • num_fcs (int) – The number of fully-connected layers for FFNs. Defaults to 2.

  • qkv_bias (bool) – Enable bias for qkv if True. Defaults to True.

  • act_cfg (dict) – The activation config for FFNs. Defaults to dict(type='GELU').

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='LN').

  • init_values (float) – The init values of gamma. Defaults to 0.0.

  • init_cfg (dict, optional) – Initialization config dict. Defaults to None.

forward(x: torch.Tensor)torch.Tensor[源代码]

Forward function.

class mmselfsup.models.utils.TwoNormDataPreprocessor(mean: Optional[Sequence[Union[int, float]]] = None, std: Optional[Sequence[Union[int, float]]] = None, second_mean: Optional[Sequence[Union[float, int]]] = None, second_std: Optional[Sequence[Union[float, int]]] = None, pad_size_divisor: int = 1, pad_value: Union[float, int] = 0, bgr_to_rgb: bool = False, rgb_to_bgr: bool = False, non_blocking: Optional[bool] = False)[源代码]

Image pre-processor for CAE, BEiT v1/v2, etc.

Compared with the mmselfsup.SelfSupDataPreprocessor, this module will normalize the prediction image and target image with different normalization parameters.

参数
  • mean (Sequence[float or int], optional) – The pixel mean of image channels. If bgr_to_rgb=True it means the mean value of the R, G, B channels. If the length of mean is 1, it means all channels have the same mean value, or the input is a gray image. If it is not specified, images will not be normalized. Defaults to None.

  • std (Sequence[float or int], optional) – The pixel standard deviation of image channels. If bgr_to_rgb=True it means the standard deviation of the R, G, B channels. If the length of std is 1, it means all channels have the same standard deviation, or the input is a gray image. If it is not specified, images will not be normalized. Defaults to None.

  • second_mean (Sequence[float or int], optional) – The description is like mean, but it can be customized for the target image. Defaults to None.

  • second_std (Sequence[float or int], optional) – The description is like std, but it can be customized for the target image. Defaults to None.

  • pad_size_divisor (int) – The size of padded image should be divisible by pad_size_divisor. Defaults to 1.

  • pad_value (float or int) – The padded pixel value. Defaults to 0.

  • bgr_to_rgb (bool) – Whether to convert image from BGR to RGB. Defaults to False.

  • rgb_to_bgr (bool) – Whether to convert image from RGB to BGR. Defaults to False.

  • non_blocking (bool) – Whether to transfer data to the device in non-blocking mode. Defaults to False.

forward(data: dict, training: bool = False)Tuple[List[torch.Tensor], Optional[list]][源代码]

Performs normalization, padding and bgr2rgb conversion based on BaseDataPreprocessor.

参数
  • data (dict) – data sampled from dataloader.

  • training (bool) – Whether to enable training time augmentation. If subclasses override this method, they can perform different preprocessing strategies for training and testing based on the value of training.

返回

Data in the same format as the model input.

返回类型

Tuple[torch.Tensor, Optional[list]]

class mmselfsup.models.utils.VideoDataPreprocessor(mean: Optional[Sequence[Union[int, float]]] = None, std: Optional[Sequence[Union[int, float]]] = None, pad_size_divisor: int = 1, pad_value: Union[float, int] = 0, bgr_to_rgb: bool = False, format_shape: str = 'NCHW')[源代码]

Video pre-processor for operations, like normalization and bgr to rgb conversion.

Compared with the mmaction.ActionDataPreprocessor, this module treats each item in inputs of input data as a list, instead of torch.Tensor.

参数
  • mean (Sequence[float or int], optional) – The pixel mean of channels of images or stacked optical flow. Defaults to None.

  • std (Sequence[float or int], optional) – The pixel standard deviation of channels of images or stacked optical flow. Defaults to None.

  • pad_size_divisor (int) – The size of padded image should be divisible by pad_size_divisor. Defaults to 1.

  • pad_value (float or int) – The padded pixel value. Defaults to 0.

  • bgr_to_rgb (bool) – Whether to convert image from BGR to RGB. Defaults to False.

  • format_shape (str) – Format shape of input data. Defaults to 'NCHW'.

forward(data: dict, training: bool = False)Tuple[List[torch.Tensor], Optional[list]][源代码]

Performs normalization, padding and bgr2rgb conversion based on BaseDataPreprocessor.

参数
  • data (dict) – data sampled from dataloader.

  • training (bool) – Whether to enable training time augmentation. If subclasses override this method, they can perform different preprocessing strategies for training and testing based on the value of training.

返回

Data in the same format as the model input.

返回类型

Tuple[List[torch.Tensor], Optional[list]]

mmselfsup.models.utils.build_2d_sincos_position_embedding(patches_resolution: Union[int, Sequence[int]], embed_dims: int, temperature: Optional[int] = 10000.0, cls_token: Optional[bool] = False)torch.Tensor[源代码]

This function builds a position embedding so that the model can obtain the position information of the image patches.

参数
  • patches_resolution (Union[int, Sequence[int]]) – The resolution of each patch.

  • embed_dims (int) – The dimension of the embedding vector.

  • temperature (int, optional) – The temperature parameter. Defaults to 10000.

  • cls_token (bool, optional) – Whether to concatenate class token. Defaults to False.

返回

The position embedding vector.

返回类型

torch.Tensor
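A simplified re-implementation of the 2D sine-cosine embedding, in the style of the MoCo v3 reference code (not the exact function above):

    import torch

    def sincos_pos_embed_2d(h: int, w: int, embed_dims: int,
                            temperature: float = 10000.0) -> torch.Tensor:
        assert embed_dims % 4 == 0, 'embed_dims must be divisible by 4'
        grid_w, grid_h = torch.meshgrid(
            torch.arange(w, dtype=torch.float32),
            torch.arange(h, dtype=torch.float32), indexing='ij')
        pos_dim = embed_dims // 4
        omega = 1.0 / temperature ** (
            torch.arange(pos_dim, dtype=torch.float32) / pos_dim)
        out_w = grid_w.flatten()[:, None] * omega[None, :]  # (h*w, pos_dim)
        out_h = grid_h.flatten()[:, None] * omega[None, :]
        return torch.cat([torch.sin(out_w), torch.cos(out_w),
                          torch.sin(out_h), torch.cos(out_h)], dim=1)[None, :, :]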

mmselfsup.models.utils.build_clip_model(state_dict: dict, finetune: bool = False, average_targets: int = 1)torch.nn.modules.module.Module[源代码]

Build the CLIP model.

参数
  • state_dict (dict) – The pretrained state dict.

  • finetune (bool) – Whether to finetune the model.

  • average_targets (int) – The number of targets to average. Defaults to 1.

返回

The CLIP model.

返回类型

nn.Module

mmselfsup.structures

class mmselfsup.structures.SelfSupDataSample(*, metainfo: Optional[dict] = None, **kwargs)[源代码]

A data structure interface of MMSelfSup, used as an interface between different components.

Meta field:

  • img_shape (Tuple): The shape of the corresponding input image. Used for visualization.

  • ori_shape (Tuple): The original shape of the corresponding image. Used for visualization.

  • img_path (str): The path of original image.

Data field:

  • gt_label (LabelData): The ground truth label of an image.

  • sample_idx (InstanceData): The idx of an image in the dataset.

  • mask (BaseDataElement): Mask used in masked image modeling.

  • pred_label (LabelData): The predicted label.

  • pseudo_label (InstanceData): Label used in pretext task, e.g. Relative Location.

实际案例

>>> import torch
>>> import numpy as np
>>> from mmengine.structures import InstanceData, LabelData, PixelData
>>> from mmselfsup.structures import SelfSupDataSample
>>> data_sample = SelfSupDataSample()
>>> gt_label = LabelData()
>>> gt_label.value = [1]
>>> data_sample.gt_label = gt_label
>>> len(data_sample.gt_label)
1
>>> print(data_sample)
<SelfSupDataSample(
    META INFORMATION
    DATA FIELDS
    gt_label: <InstanceData(
            META INFORMATION
            DATA FIELDS
            value: [1]
        ) at 0x7f15c08f9d10>
    _gt_label: <InstanceData(
            META INFORMATION
            DATA FIELDS
            value: [1]
        ) at 0x7f15c08f9d10>
 ) at 0x7f15c077ef10>
>>> idx = InstanceData()
>>> idx.value = [0]
>>> data_sample = SelfSupDataSample(idx=idx)
>>> assert 'idx' in data_sample
>>> data_sample = SelfSupDataSample()
>>> mask = dict(value=np.random.rand(48, 48))
>>> mask = PixelData(**mask)
>>> data_sample.mask = mask
>>> assert 'mask' in data_sample
>>> assert 'value' in data_sample.mask
>>> data_sample = SelfSupDataSample()
>>> pred_label = dict(pred_label=[3])
>>> pred_label = LabelData(**pred_label)
>>> data_sample.pred_label = pred_label
>>> print(data_sample)
<SelfSupDataSample(
    META INFORMATION
    DATA FIELDS
    _pred_label: <InstanceData(
            META INFORMATION
            DATA FIELDS
            pred_label: [3]
        ) at 0x7f15c06a3990>
    pred_label: <InstanceData(
            META INFORMATION
            DATA FIELDS
            pred_label: [3]
        ) at 0x7f15c06a3990>
) at 0x7f15c07b8bd0>

mmselfsup.visualization

class mmselfsup.visualization.SelfSupVisualizer(name: str = 'visualizer', image: Optional[numpy.ndarray] = None, vis_backends: Optional[List[Dict]] = None, save_dir: Optional[str] = None, line_width: Union[int, float] = 3, alpha: Union[int, float] = 0.8)[源代码]

MMSelfSup Visualizer.

参数
  • name (str) – Name of the instance. Defaults to ‘visualizer’.

  • image (np.ndarray, optional) – the original image to draw. The format should be RGB. Defaults to None.

  • vis_backends (list, optional) – Visual backend config list. Defaults to None.

  • save_dir (str, optional) – Save file dir for all storage backends. If it is None, the backend storage will not save any data.

  • line_width (int, float) – The linewidth of lines. Defaults to 3.

  • alpha (int, float) – The transparency of boxes or mask. Defaults to 0.8.

实际案例

>>> import numpy as np
>>> import torch
>>> from mmengine.structures import InstanceData
>>> from mmselfsup.structures import SelfSupDataSample
>>> from mmselfsup.visualization import SelfSupVisualizer
>>> selfsup_visualizer = SelfSupVisualizer()
>>> image = np.random.randint(0, 256,
...                     size=(10, 12, 3)).astype('uint8')
>>> pseudo_label = InstanceData()
>>> pseudo_label.patch_box = torch.Tensor([[1, 2, 2, 5]])
>>> gt_selfsup_data_sample = SelfSupDataSample()
>>> gt_selfsup_data_sample.pseudo_label = pseudo_label
>>> selfsup_visualizer.add_datasample('image', image,
...                         gt_selfsup_data_sample)
>>> selfsup_visualizer.add_datasample(
...                       'image', image, gt_selfsup_data_sample,
...                        out_file='out_file.jpg')
>>> selfsup_visualizer.add_datasample(
...                        'image', image, gt_selfsup_data_sample,
...                         show=True)
>>> pseudo_label = InstanceData()
>>> pseudo_label.patch_box = torch.Tensor([[1, 2, 2, 5]])
>>> pred_selfsup_data_sample = SelfSupDataSample()
>>> pred_selfsup_data_sample.pseudo_label = pseudo_label
>>> selfsup_visualizer.add_datasample('image', image,
...                         gt_selfsup_data_sample,
...                         pred_selfsup_data_sample)
add_datasample(name: str, image: numpy.ndarray, gt_sample: Optional[mmselfsup.structures.selfsup_data_sample.SelfSupDataSample] = None, pred_sample: Optional[mmselfsup.structures.selfsup_data_sample.SelfSupDataSample] = None, draw_gt: bool = True, draw_pred: bool = True, show: bool = False, wait_time: float = 0, out_file: Optional[str] = None, step: int = 0)None[源代码]

Draw datasample and save to all backends.

  • If GT and prediction are plotted at the same time, they are displayed in a stitched image where the left image is the ground truth and the right image is the prediction.

  • If show is True, all storage backends are ignored, and the images will be displayed in a local window.

  • If out_file is specified, the drawn image will be saved to out_file. It is usually used when the display is not available.

Parameters
  • name (str) – The image identifier.

  • image (np.ndarray) – The image to draw.

  • gt_sample (SelfSupDataSample, optional) – GT SelfSupDataSample. Defaults to None.

  • pred_sample (SelfSupDataSample, optional) – Prediction SelfSupDataSample. Defaults to None.

  • draw_gt (bool) – Whether to draw GT SelfSupDataSample. Defaults to True.

  • draw_pred (bool) – Whether to draw Prediction SelfSupDataSample. Defaults to True.

  • show (bool) – Whether to display the drawn image. Defaults to False.

  • wait_time (float) – The interval of show in seconds. Defaults to 0.

  • out_file (str) – Path to the output file. Defaults to None.

  • step (int) – Global step value to record. Defaults to 0.

mmselfsup.utils

class mmselfsup.utils.AliasMethod(probs: torch.Tensor)[source]

The alias method for sampling.

From: https://hips.seas.harvard.edu/blog/2013/03/03/the-alias-method-efficient-sampling-with-many-discrete-outcomes/

Parameters

probs (torch.Tensor) – Sampling probabilities.

draw(N: int) → torch.Tensor[source]

Draw N samples from multinomial.

Parameters

N (int) – Number of samples.

Returns

Samples.

Return type

torch.Tensor
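
A minimal usage sketch; the probability values here are illustrative, not part of the API:

>>> import torch
>>> from mmselfsup.utils import AliasMethod
>>> probs = torch.tensor([0.1, 0.2, 0.3, 0.4])  # sampling weights over 4 outcomes
>>> sampler = AliasMethod(probs)
>>> samples = sampler.draw(8)  # 8 indices, each drawn in O(1) time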

mmselfsup.utils.batch_shuffle_ddp(x: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor][source]

Batch shuffle, for making use of BatchNorm.

Parameters

x (torch.Tensor) – Data in each GPU.

Returns

Output of the shuffle operation.
  • x_gather[idx_this]: Shuffled data.

  • idx_unshuffle: Index for restoring.

Return type

Tuple[torch.Tensor, torch.Tensor]

mmselfsup.utils.batch_unshuffle_ddp(x: torch.Tensor, idx_unshuffle: torch.Tensor) → torch.Tensor[source]

Undo batch shuffle.

Parameters
  • x (torch.Tensor) – Data in each GPU.

  • idx_unshuffle (torch.Tensor) – Index for restoring.

Returns

Output of the unshuffle operation.

Return type

torch.Tensor
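
A sketch of the typical shuffle/unshuffle pattern for MoCo-style momentum encoders; it assumes torch.distributed is already initialized, and encoder_k and img are hypothetical names:

>>> from mmselfsup.utils import batch_shuffle_ddp, batch_unshuffle_ddp
>>> # Shuffle samples across GPUs so that per-GPU BatchNorm statistics
>>> # are not computed on the same subset of samples every step.
>>> img, idx_unshuffle = batch_shuffle_ddp(img)
>>> k = encoder_k(img)
>>> # Restore the original order so keys stay aligned with queries.
>>> k = batch_unshuffle_ddp(k, idx_unshuffle)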

mmselfsup.utils.collect_env()[source]

Collect information about the running environment.

mmselfsup.utils.concat_all_gather(tensor: torch.Tensor) → torch.Tensor[source]

Perform the all_gather operation on the provided tensor.

Parameters

tensor (torch.Tensor) – Tensor to be broadcast from the current process.

Returns

The concatenated tensor.

Return type

torch.Tensor
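
A sketch assuming an initialized process group; the feature shape is illustrative:

>>> import torch
>>> from mmselfsup.utils import concat_all_gather
>>> feat = torch.randn(32, 128).cuda()  # per-GPU features
>>> all_feat = concat_all_gather(feat)  # shape: (32 * world_size, 128)

Note that torch.distributed.all_gather does not propagate gradients across processes, which is why MoCo-style code applies it to momentum-encoder outputs.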

mmselfsup.utils.dist_forward_collect(func: object, data_loader: torch.utils.data.dataloader.DataLoader, length: int) → dict[source]

Forward and collect network outputs in a distributed manner.

This function performs forward propagation and collects outputs. It can be used to collect results, features, losses, etc.

Parameters
  • func (function) – The function to process data.

  • data_loader (DataLoader) – The torch DataLoader to yield data.

  • length (int) – Expected length of output arrays.

Returns

The collected outputs.

Return type

Dict[str, torch.Tensor]
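
A sketch of the expected contract for func, with hypothetical names (model, data_loader, dataset); it assumes each batch is a dict holding an 'img' tensor and that func returns a dict of batched tensors:

>>> import torch
>>> def extract_feats(data_batch):
...     with torch.no_grad():
...         feats = model(data_batch['img'])  # hypothetical model in scope
...     return dict(feats=feats.cpu())
>>> results = dist_forward_collect(extract_feats, data_loader, len(dataset))
>>> # results['feats'] then holds `length` rows collected across all ranks.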

mmselfsup.utils.distributed_sinkhorn(out: torch.Tensor, sinkhorn_iterations: int, world_size: int, epsilon: float) → torch.Tensor[source]

Apply the distributed Sinkhorn-Knopp optimization on the scores matrix to find the assignments.

Parameters
  • out (torch.Tensor) – The scores matrix.

  • sinkhorn_iterations (int) – Number of iterations in the Sinkhorn-Knopp algorithm.

  • world_size (int) – The world size of the process group.

  • epsilon (float) – Regularization parameter for the Sinkhorn-Knopp algorithm.

Returns

Output of the Sinkhorn-Knopp algorithm.

Return type

torch.Tensor
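
For intuition, a single-process sketch of the Sinkhorn-Knopp iteration this function performs across GPUs (SwAV-style soft assignments); this is illustrative, not the library implementation:

>>> import torch
>>> def sinkhorn(scores, n_iters=3, epsilon=0.05):
...     # scores: (num_samples, num_prototypes) similarity matrix
...     Q = torch.exp(scores / epsilon).t()
...     Q /= Q.sum()  # normalize the whole matrix
...     K, B = Q.shape
...     for _ in range(n_iters):
...         Q /= Q.sum(dim=1, keepdim=True)  # rows: equal mass per prototype
...         Q /= K
...         Q /= Q.sum(dim=0, keepdim=True)  # columns: one unit per sample
...         Q /= B
...     return (Q * B).t()  # each row is a soft assignment summing to 1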

mmselfsup.utils.get_model(model: torch.nn.modules.module.Module) → mmengine.model.base_model.base_model.BaseModel[source]

Get the underlying model if the input is a model wrapper.

Parameters

model (nn.Module) – A model that may be a model wrapper.

Returns

The model without the model wrapper.

Return type

BaseModel
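
A sketch assuming a DistributedDataParallel wrapper (which requires an initialized process group); model is hypothetical:

>>> from torch.nn.parallel import DistributedDataParallel
>>> from mmselfsup.utils import get_model
>>> wrapped = DistributedDataParallel(model)
>>> bare = get_model(wrapped)  # returns the wrapped model itself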

mmselfsup.utils.nondist_forward_collect(func: object, data_loader: torch.utils.data.dataloader.DataLoader, length: int) → dict[source]

Forward and collect network outputs.

This function performs forward propagation and collects outputs. It can be used to collect results, features, losses, etc.

Parameters
  • func (function) – The function to process data.

  • data_loader (DataLoader) – The torch DataLoader to yield data.

  • length (int) – Expected length of output arrays.

Returns

The concatenated outputs.

Return type

Dict[str, torch.Tensor]
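
The single-process counterpart of dist_forward_collect; reusing the hypothetical extract_feats wrapper sketched above:

>>> from mmselfsup.utils import nondist_forward_collect
>>> results = nondist_forward_collect(extract_feats, data_loader, len(dataset))
>>> # Same contract as dist_forward_collect, but run in a single process.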

mmselfsup.utils.register_all_modules(init_default_scope: bool = True) → None[source]

Register all modules in mmselfsup into the registries.

Parameters

init_default_scope (bool) – Whether to initialize the mmselfsup default scope. When init_default_scope=True, the global default scope will be set to mmselfsup, and all registries will build modules from mmselfsup's registry node. To learn more about the registry, please refer to https://github.com/open-mmlab/mmengine/blob/main/docs/en/tutorials/registry.md. Defaults to True.
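
Typical usage before building any component from a config:

>>> from mmselfsup.utils import register_all_modules
>>> # Registers mmselfsup's datasets, transforms, models, etc., and sets
>>> # 'mmselfsup' as the default registry scope.
>>> register_all_modules(init_default_scope=True)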
