
mmselfsup.apis

mmselfsup.core

hooks

class mmselfsup.core.hooks.DeepClusterHook(extractor, clustering, unif_sampling, reweight, reweight_pow, init_memory=False, initial=True, interval=1, dist_mode=True, data_loaders=None)[source]

Hook for DeepCluster.

This hook includes the global clustering process in DC.

Parameters
  • extractor (dict) – Config dict for feature extraction.

  • clustering (dict) – Config dict that specifies the clustering algorithm.

  • unif_sampling (bool) – Whether to apply uniform sampling.

  • reweight (bool) – Whether to apply loss re-weighting.

  • reweight_pow (float) – The power of re-weighting.

  • init_memory (bool) – Whether to initialize memory banks used in ODC. Defaults to False.

  • initial (bool) – Whether to call the hook initially. Defaults to True.

  • interval (int) – Frequency of epochs to call the hook. Defaults to 1.

  • dist_mode (bool) – Use distributed training or not. Defaults to True.

  • data_loaders (DataLoader) – A PyTorch dataloader. Defaults to None.

class mmselfsup.core.hooks.DenseCLHook(start_iters=1000, **kwargs)[source]

Hook for DenseCL.

This hook includes loss_lambda warmup in DenseCL. Borrowed from the authors’ code: https://github.com/WXinlong/DenseCL.

Parameters

start_iters (int, optional) – The number of warmup iterations to set loss_lambda=0. Defaults to 1000.
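The warmup can be pictured as a step function of the iteration counter. The sketch below is only an illustration of that idea, not the hook's own code: the helper name `loss_lambda_at` and the target value 0.5 are assumptions made for the example.

```python
def loss_lambda_at(cur_iter: int, target_lambda: float, start_iters: int = 1000) -> float:
    """Hypothetical helper: loss_lambda to use at iteration `cur_iter`.

    loss_lambda is held at 0 for the first `start_iters` iterations and
    switched to `target_lambda` afterwards.
    """
    return target_lambda if cur_iter >= start_iters else 0.0


# Probe the schedule just before and after the warmup boundary.
schedule = [loss_lambda_at(i, target_lambda=0.5) for i in (0, 999, 1000, 5000)]
print(schedule)  # [0.0, 0.0, 0.5, 0.5]
```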

class mmselfsup.core.hooks.DistOptimizerHook(update_interval=1, grad_clip=None, coalesce=True, bucket_size_mb=-1, frozen_layers_cfg={})[source]

Optimizer hook for distributed training.

This hook can accumulate gradients every n intervals and freeze some layers for some iters at the beginning.

Parameters
  • update_interval (int, optional) – The update interval of the weights, set > 1 to accumulate the grad. Defaults to 1.

  • grad_clip (dict, optional) – Dict to config the value of grad clip. E.g., grad_clip = dict(max_norm=10). Defaults to None.

  • coalesce (bool, optional) – Whether to allreduce parameters as a whole. Defaults to True.

  • bucket_size_mb (int, optional) – Size of bucket, the unit is MB. Defaults to -1.

  • frozen_layers_cfg (dict, optional) – Dict to config frozen layers. The key-value pair is layer name and its frozen iters. If frozen, the layer gradient would be set to None. Defaults to dict().
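The two behaviours described above, stepping only every update_interval iterations and dropping gradients of frozen layers, can be sketched with two small predicates. This is a simplified illustration under assumptions: the helper names are hypothetical, and matching a layer by substring of the parameter name is an assumed convention.

```python
def should_step(cur_iter: int, update_interval: int) -> bool:
    """True on iterations where the accumulated gradients are applied."""
    return (cur_iter + 1) % update_interval == 0


def is_frozen(param_name: str, cur_iter: int, frozen_layers_cfg: dict) -> bool:
    """True while `param_name` is still inside the frozen window of its layer.

    Assumption: a layer key matches every parameter whose name contains it.
    """
    for layer, frozen_iters in frozen_layers_cfg.items():
        if layer in param_name and cur_iter < frozen_iters:
            return True
    return False


# With update_interval=4, the optimizer steps on iterations 3 and 7.
steps = [i for i in range(8) if should_step(i, update_interval=4)]
# A layer frozen for 2 iterations has its gradient dropped on iterations 0 and 1.
frozen = [is_frozen("backbone.conv1.weight", i, {"backbone": 2}) for i in range(4)]
```

When gradients are accumulated over n iterations, the loss is usually divided by n before the backward pass so that the summed gradient matches a single large-batch step.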

class mmselfsup.core.hooks.GradAccumFp16OptimizerHook(update_interval=1, frozen_layers_cfg={}, **kwargs)[source]

Fp16 optimizer hook (using PyTorch’s implementation).

This hook can accumulate gradients every n intervals and freeze some layers for some iters at the beginning. If you are using PyTorch >= 1.6, torch.cuda.amp is used as the backend, to take care of the optimization procedure.

Parameters
  • update_interval (int, optional) – The update interval of the weights, set > 1 to accumulate the grad. Defaults to 1.

  • frozen_layers_cfg (dict, optional) – Dict to config frozen layers. The key-value pair is layer name and its frozen iters. If frozen, the layer gradient would be set to None. Defaults to dict().

after_train_iter(runner)[source]

Backward optimization steps for Mixed Precision Training. For dynamic loss scaling, please refer to https://pytorch.org/docs/stable/amp.html#torch.cuda.amp.GradScaler.

  1. Scale the loss by a scale factor.

  2. Backward the loss to obtain the gradients.

  3. Unscale the optimizer’s gradient tensors.

  4. Call optimizer.step() and update scale factor.

  5. Save loss_scaler state_dict for resume purpose.
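The five steps above can be mimicked without a GPU. The class below is a toy stand-in written for illustration only; it is not torch.cuda.amp.GradScaler, and the names `ToyLossScaler`, `step` and `sgd` are assumptions. The default constants mirror the documented GradScaler defaults.

```python
import math


class ToyLossScaler:
    """Toy model of dynamic loss scaling (illustrative, not the PyTorch class)."""

    def __init__(self, init_scale=65536.0, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def step(self, scaled_grads, apply_update):
        """Unscale the gradients, then apply or skip the update."""
        grads = [g / self.scale for g in scaled_grads]      # step 3: unscale
        if all(math.isfinite(g) for g in grads):
            apply_update(grads)                              # step 4: optimizer step
            self._good_steps += 1
            if self._good_steps == self.growth_interval:     # grow after a clean streak
                self.scale *= self.growth_factor
                self._good_steps = 0
            return True
        self.scale *= self.backoff_factor                    # overflow: skip and back off
        self._good_steps = 0
        return False

    def state_dict(self):
        """Step 5: the state that would be saved for resuming."""
        return {"scale": self.scale, "good_steps": self._good_steps}


weights = [1.0]


def sgd(grads, lr=0.1):
    """Plain SGD update on the single toy weight."""
    weights[0] -= lr * grads[0]


scaler = ToyLossScaler(init_scale=8.0, growth_interval=2)
ok_first = scaler.step([0.5 * scaler.scale], sgd)   # finite gradient: applied
ok_second = scaler.step([float("inf")], sgd)        # overflow: skipped, scale halved
```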

class mmselfsup.core.hooks.MomentumUpdateHook(end_momentum=1.0, update_interval=1, **kwargs)[source]

Hook for updating momentum parameter, used by BYOL, MoCoV3, etc.

This hook includes momentum adjustment following:

\[m = 1 - (1 - m_0) \cdot \frac{\cos(\pi k / K) + 1}{2}\]

where \(k\) is the current step, \(K\) is the total number of steps, and \(m_0\) is the base momentum.

Parameters
  • end_momentum (float) – The final momentum coefficient for the target network. Defaults to 1.

  • update_interval (int, optional) – The momentum update interval of the weights. Defaults to 1.
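The schedule can be evaluated directly. The sketch below assumes that end_momentum takes the place of the constant 1 in the formula above, which reduces to the documented expression when end_momentum is 1; the function name `momentum_at` is hypothetical.

```python
import math


def momentum_at(cur_step, total_steps, base_momentum=0.996, end_momentum=1.0):
    """Cosine schedule that moves the momentum from base_momentum to end_momentum."""
    cos_term = (math.cos(math.pi * cur_step / total_steps) + 1) / 2
    return end_momentum - (end_momentum - base_momentum) * cos_term


# Momentum at the start, the midpoint and the end of a 100-step run.
start, middle, end = (momentum_at(k, 100) for k in (0, 50, 100))
```

The momentum starts at base_momentum, reaches the midpoint value halfway through training, and ends at end_momentum.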

class mmselfsup.core.hooks.ODCHook(centroids_update_interval, deal_with_small_clusters_interval, evaluate_interval, reweight, reweight_pow, dist_mode=True)[source]

Hook for ODC.

This hook includes the online clustering process in ODC.

Parameters
  • centroids_update_interval (int) – Frequency of iterations to update centroids.

  • deal_with_small_clusters_interval (int) – Frequency of iterations to deal with small clusters.

  • evaluate_interval (int) – Frequency of iterations to evaluate clusters.

  • reweight (bool) – Whether to perform loss re-weighting.

  • reweight_pow (float) – The power of re-weighting.

  • dist_mode (bool) – Use distributed training or not. Defaults to True.

class mmselfsup.core.hooks.SimSiamHook(fix_pred_lr, lr, adjust_by_epoch=True, **kwargs)[source]

Hook for SimSiam.

This hook is for SimSiam to fix learning rate of predictor.

Parameters
  • fix_pred_lr (bool) – Whether to fix the lr of the predictor.

  • lr (float) – The value of the fixed lr.

  • adjust_by_epoch (bool, optional) – Whether to set the lr by epoch or by iteration. Defaults to True.

before_train_epoch(runner)[source]

Fix the learning rate of the predictor.
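One way to picture this: after the regular scheduler has changed every group's learning rate, the hook writes the fixed value back into the predictor's group. The sketch below uses plain dicts in place of optimizer parameter groups; the `fix_lr` flag used to tag the predictor group is an assumption of this example.

```python
def fix_predictor_lr(param_groups, lr):
    """Reset the lr of every group tagged with fix_lr=True (hypothetical tag)."""
    for group in param_groups:
        if group.get("fix_lr", False):
            group["lr"] = lr


# Both groups were decayed to 0.01 by a scheduler; only the predictor is reset.
groups = [
    {"name": "backbone", "lr": 0.01, "fix_lr": False},
    {"name": "predictor", "lr": 0.01, "fix_lr": True},
]
fix_predictor_lr(groups, lr=0.05)
```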

class mmselfsup.core.hooks.StepFixCosineAnnealingLrUpdaterHook(min_lr: Optional[float] = None, min_lr_ratio: Optional[float] = None, **kwargs)[source]

class mmselfsup.core.hooks.SwAVHook(batch_size, epoch_queue_starts=15, crops_for_assign=[0, 1], feat_dim=128, queue_length=0, interval=1, **kwargs)[source]

Hook for SwAV.

This hook builds the queue in SwAV according to epoch_queue_starts. The queue will be saved in runner.work_dir or loaded at start epoch if the path folder has queues saved before.

Parameters
  • batch_size (int) – the batch size per GPU for computing.

  • epoch_queue_starts (int, optional) – from this epoch, starts to use the queue. Defaults to 15.

  • crops_for_assign (list[int], optional) – list of crops id used for computing assignments. Defaults to [0, 1].

  • feat_dim (int, optional) – feature dimension of output vector. Defaults to 128.

  • queue_length (int, optional) – length of the queue (0 for no queue). Defaults to 0.

  • interval (int, optional) – the interval to save the queue. Defaults to 1.

optimizer

class mmselfsup.core.optimizer.DefaultOptimizerConstructor(optimizer_cfg, paramwise_cfg=None)[source]

Rewritten default constructor for optimizers.

By default, each parameter shares the same optimizer settings, and we provide an argument paramwise_cfg to specify parameter-wise settings.

Parameters
  • model (nn.Module) – The model with parameters to be optimized.

  • optimizer_cfg (dict) –

    The config dict of the optimizer. Positional fields are

    • type: class name of the optimizer.

    Optional fields are
    • any arguments of the corresponding optimizer type, e.g., lr, weight_decay, momentum, etc.

  • paramwise_cfg (dict, optional) – Parameter-wise options. Defaults to None.

Example 1:
>>> import torch
>>> from mmselfsup.core.optimizer import DefaultOptimizerConstructor
>>> model = torch.nn.modules.Conv1d(1, 1, 1)
>>> optimizer_cfg = dict(type='SGD', lr=0.01, momentum=0.9,
...                      weight_decay=0.0001)
>>> paramwise_cfg = {'bias': dict(weight_decay=0., lars_exclude=True)}
>>> optim_builder = DefaultOptimizerConstructor(
...     optimizer_cfg, paramwise_cfg)
>>> optimizer = optim_builder(model)

class mmselfsup.core.optimizer.LARS(params, lr=<required parameter>, momentum=0, weight_decay=0, dampening=0, eta=0.001, nesterov=False, eps=1e-08)[source]

Implements layer-wise adaptive rate scaling for SGD.

Parameters
  • params (iterable) – Iterable of parameters to optimize or dicts defining parameter groups.

  • lr (float) – Base learning rate.

  • momentum (float, optional) – Momentum factor. Defaults to 0 (‘m’)

  • weight_decay (float, optional) – Weight decay (L2 penalty). Defaults to 0. (‘beta’)

  • dampening (float, optional) – Dampening for momentum. Defaults to 0.

  • eta (float, optional) – LARS coefficient. Defaults to 0.001.

  • nesterov (bool, optional) – Enables Nesterov momentum. Defaults to False.

  • eps (float, optional) – A small number to avoid division by zero. Defaults to 1e-8.

Based on Algorithm 1 of the paper "Large Batch Training of Convolutional Networks" by You, Gitman, and Ginsburg.

Example

>>> optimizer = LARS(model.parameters(), lr=0.1, momentum=0.9,
...                  weight_decay=1e-4, eta=1e-3)
>>> optimizer.zero_grad()
>>> loss_fn(model(input), target).backward()
>>> optimizer.step()

step(closure=None)[source]

Performs a single optimization step.

Parameters

closure (callable, optional) – A closure that reevaluates the model and returns the loss.
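For intuition, one LARS step on a single layer can be written out with plain Python lists. This sketch follows Algorithm 1 of the paper rather than the class's source code: dampening and Nesterov momentum are left out, the placement of eps in the denominator is an assumption, and the function name `lars_update` is hypothetical.

```python
import math


def lars_update(weights, grads, velocity, lr, momentum=0.0, weight_decay=0.0,
                eta=0.001, eps=1e-8):
    """One LARS step for a single layer; returns (new_weights, new_velocity)."""
    w_norm = math.sqrt(sum(w * w for w in weights))
    g_norm = math.sqrt(sum(g * g for g in grads))
    if w_norm > 0 and g_norm > 0:
        # Layer-wise trust ratio: large weights with small gradients move further.
        local_lr = eta * w_norm / (g_norm + weight_decay * w_norm + eps)
    else:
        local_lr = 1.0
    new_velocity = [
        momentum * v + lr * local_lr * (g + weight_decay * w)
        for v, g, w in zip(velocity, grads, weights)
    ]
    new_weights = [w - v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


# ||w|| = 5 and ||g|| = 1, so the local lr is about 0.001 * 5 = 0.005.
new_w, new_v = lars_update([3.0, 4.0], [0.6, 0.8], [0.0, 0.0], lr=0.1)
```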

class mmselfsup.core.optimizer.TransformerFinetuneConstructor(optimizer_cfg, paramwise_cfg=None)[source]

Rewritten default constructor for optimizers.

By default, each parameter shares the same optimizer settings, and we provide an argument paramwise_cfg to specify parameter-wise settings. In addition, we provide two optional parameters, model_type and layer_decay, to set the commonly used layer-wise learning rate decay schedule. Currently, the layer-wise learning rate schedule is only supported for swin and vit.

Parameters
  • optimizer_cfg (dict) –

    The config dict of the optimizer. Positional fields are

    • type: class name of the optimizer.

    Optional fields are
    • any arguments of the corresponding optimizer type, e.g., lr, weight_decay, momentum, model_type, layer_decay, etc.

  • paramwise_cfg (dict, optional) – Parameter-wise options. Defaults to None.
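Layer-wise learning rate decay, in its common form, multiplies the base learning rate by a power of layer_decay that shrinks toward the input layers. The sketch below shows that general technique only; how parameters are mapped to layer ids for swin or vit is not covered, and the function name `layer_lr_scales` is hypothetical.

```python
def layer_lr_scales(num_layers: int, layer_decay: float) -> list:
    """Scale per layer id: the last layer keeps scale 1, earlier layers decay."""
    return [layer_decay ** (num_layers - 1 - layer_id)
            for layer_id in range(num_layers)]


# Four layers with a decay of 0.5 applied to a base lr of 0.01.
scales = layer_lr_scales(num_layers=4, layer_decay=0.5)
lrs = [0.01 * s for s in scales]
```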

Example 1:
>>> import torch
>>> from mmselfsup.core.optimizer import TransformerFinetuneConstructor
>>> model = torch.nn.modules.Conv1d(1, 1, 1)
>>> optimizer_cfg = dict(type='SGD', lr=0.01, momentum=0.9,
...                      weight_decay=0.0001, model_type='vit')
>>> paramwise_cfg = {'bias': dict(weight_decay=0., lars_exclude=True)}
>>> optim_builder = TransformerFinetuneConstructor(
...     optimizer_cfg, paramwise_cfg)
>>> optimizer = optim_builder(model)

mmselfsup.core.optimizer.build_optimizer(model, optimizer_cfg)[source]

Build optimizer from configs.

Parameters
  • model (nn.Module) – The model with parameters to be optimized.

  • optimizer_cfg (dict) –

    The config dict of the optimizer. Positional fields are:

    • type: class name of the optimizer.

    • lr: base learning rate.

    Optional fields are:
    • any arguments of the corresponding optimizer type, e.g., weight_decay, momentum, etc.

    • paramwise_options: a dict with regular expression as keys to match parameter names and a dict containing options as values. Options include 6 fields: lr, lr_mult, momentum, momentum_mult, weight_decay, weight_decay_mult.

Returns

The initialized optimizer.

Return type

torch.optim.Optimizer

Example

>>> import torch
>>> from mmselfsup.core.optimizer import build_optimizer
>>> model = torch.nn.modules.Conv1d(1, 1, 1)
>>> paramwise_options = {
...     r'(bn|gn)(\d+)?.(weight|bias)': dict(weight_decay_mult=0.1),
...     r'\Ahead.': dict(lr_mult=10, momentum=0)}
>>> optimizer_cfg = dict(type='SGD', lr=0.01, momentum=0.9,
...                      weight_decay=0.0001,
...                      paramwise_options=paramwise_options)
>>> optimizer = build_optimizer(model, optimizer_cfg)
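How paramwise_options might be resolved for one parameter name can be sketched with the standard re module. This is an illustration under assumptions: the helper `resolve_options` is hypothetical, patterns are matched with re.search, and a `*_mult` option multiplies the corresponding base value.

```python
import re


def resolve_options(param_name, base, paramwise_options):
    """Return the optimizer options that apply to `param_name`."""
    opts = dict(base)
    for pattern, overrides in paramwise_options.items():
        if re.search(pattern, param_name) is None:
            continue
        for key, value in overrides.items():
            if key.endswith("_mult"):
                target = key[: -len("_mult")]
                opts[target] = base[target] * value   # multiplier on the base value
            else:
                opts[key] = value                     # absolute override
    return opts


base = dict(lr=0.01, momentum=0.9, weight_decay=0.0001)
options = {
    r"(bn|gn)(\d+)?.(weight|bias)": dict(weight_decay_mult=0.1),
    r"\Ahead.": dict(lr_mult=10, momentum=0),
}
head_opts = resolve_options("head.fc.weight", base, options)
bn_opts = resolve_options("backbone.bn1.weight", base, options)
conv_opts = resolve_options("backbone.conv1.weight", base, options)
```

Parameters that match no pattern, such as the convolution weight here, keep the base settings unchanged.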

mmselfsup.datasets

data_sources

class mmselfsup.datasets.data_sources.BaseDataSource(data_prefix, classes=None, ann_file=None, test_mode=False, color_type='color', channel_order='rgb', file_client_args={'backend': 'disk'})[source]

Datasource base class to load dataset information.

Parameters
  • data_prefix (str) – the prefix of data path.

  • classes (str | Sequence[str], optional) – Specify classes to load.

  • ann_file (str | None) – the annotation file. When ann_file is str, the subclass is expected to read from the ann_file. When ann_file is None, the subclass is expected to read according to data_prefix.

  • test_mode (bool) – in train mode or test mode. Defaults to False.

  • color_type (str) – The flag argument for mmcv.imfrombytes(). Defaults to color.

  • channel_order (str) – The channel order of images when loaded. Defaults to rgb.

  • file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').

get_cat_ids(idx)[source]

Get category id by index.

Parameters

idx (int) – Index of data.

Returns

Image category of specified index.

Return type

int

classmethod get_classes(classes=None)[source]

Get class names of current dataset.

Parameters

classes (Sequence[str] | str | None) – If classes is None, use default CLASSES defined by builtin dataset. If classes is a string, take it as a file name. The file contains the name of classes where each line contains one class name. If classes is a tuple or list, override the CLASSES defined by the dataset.

Returns

Names of categories of the dataset.

Return type

tuple[str] or list[str]
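The three cases for the classes argument can be sketched as a standalone function. The name `resolve_classes` and the explicit `default_classes` argument are assumptions of this example; in the class itself the default comes from the dataset's CLASSES attribute.

```python
def resolve_classes(classes, default_classes):
    """Resolve class names from None, a file name, or an explicit sequence."""
    if classes is None:
        return default_classes                  # fall back to the built-in classes
    if isinstance(classes, str):
        with open(classes) as f:                # one class name per line
            return [line.strip() for line in f if line.strip()]
    if isinstance(classes, (tuple, list)):
        return classes                          # explicit override
    raise ValueError(f"Unsupported type {type(classes)} of classes.")


defaults = ("cat", "dog")
from_default = resolve_classes(None, defaults)
from_list = resolve_classes(["bird", "fish"], defaults)
```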

get_gt_labels()[source]

Get all ground-truth labels (categories).

Returns

Categories for all images.

Return type

list[int]

get_img(idx)[source]

Get image by index.

Parameters

idx (int) – Index of data.

Returns

PIL Image format.

Return type

Image

class mmselfsup.datasets.data_sources.CIFAR10(data_prefix, classes=None, ann_file=None, test_mode=False, color_type='color', channel_order='rgb', file_client_args={'backend': 'disk'})[source]

CIFAR10 Dataset.

This implementation is modified from https://github.com/pytorch/vision/blob/master/torchvision/datasets/cifar.py

class mmselfsup.datasets.data_sources.CIFAR100(data_prefix, classes=None, ann_file=None, test_mode=False, color_type='color', channel_order='rgb', file_client_args={'backend': 'disk'})[source]

CIFAR100 Dataset.

class mmselfsup.datasets.data_sources.ImageList(data_prefix, classes=None, ann_file=None, test_mode=False, color_type='color', channel_order='rgb', file_client_args={'backend': 'disk'})[source]

The implementation for loading any image list file.

The ImageList can load an annotation file or a list of files and merge all data records into one list. If the data is unlabeled, the gt_label will be set to -1.

class mmselfsup.datasets.data_sources.ImageNet(data_prefix, classes=None, ann_file=None, test_mode=False, color_type='color', channel_order='rgb', file_client_args={'backend': 'disk'})[source]

ImageNet Dataset.

This implementation is modified from https://github.com/pytorch/vision/blob/master/torchvision/datasets/imagenet.py

class mmselfsup.datasets.data_sources.ImageNet21k(data_prefix, classes=None, ann_file=None, multi_label=False, recursion_subdir=False, test_mode=False)[source]

ImageNet21k Dataset. The ImageNet21k dataset is extremely big, containing 21k+ classes and about 14M files. To save memory and time, this class improves on the ImageNet class in the following ways:

  • Delete the samples attribute.

  • Use __slots__ to create a Data_item that replaces dict.

  • Move the setting of the info dict from load_annotations to prepare_data.

  • Use int instead of np.array(…, np.int64).

Parameters
  • data_prefix (str) – the prefix of data path

  • ann_file (str | None) – the annotation file. When ann_file is str, the subclass is expected to read from the ann_file. When ann_file is None, the subclass is expected to read according to data_prefix

  • test_mode (bool) – in train mode or test mode

  • multi_label (bool) – use multi label or not.

  • recursion_subdir (bool) – Whether to also use pictures in sub-directories of each category directory that meet the conditions.

load_annotations()[source]

Load dataset annotations.

pipelines

class mmselfsup.datasets.pipelines.BEiTMaskGenerator(input_size: int, num_masking_patches: int, min_num_patches: int = 4, max_num_patches: Optional[int] = None, min_aspect: float = 0.3, max_aspect: Optional[float] = None)[source]

Generate mask for image.

This module is borrowed from https://github.com/microsoft/unilm/tree/master/beit

Parameters
  • input_size (int) – The size of input image.

  • num_masking_patches (int) – The number of patches to be masked.

  • min_num_patches (int) – The minimum number of patches to be masked in the process of generating mask. Defaults to 4.

  • max_num_patches (int, optional) – The maximum number of patches to be masked in the process of generating mask. Defaults to None.

  • min_aspect (float, optional) – The minimum aspect ratio of mask blocks. Defaults to 0.3.

  • max_aspect (float, optional) – The maximum aspect ratio of mask blocks. Defaults to None.
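The block-wise masking that these parameters control can be approximated as follows: rectangular blocks with a random area and a log-uniform aspect ratio are placed until the masking budget is used up. This is a simplified sketch written for illustration, not the borrowed implementation; `blockwise_mask`, the `grid` argument (assumed to be the number of patches per side) and `max_tries` are assumptions.

```python
import math
import random


def blockwise_mask(grid=14, num_masking_patches=75, min_num_patches=4,
                   max_num_patches=None, min_aspect=0.3, max_aspect=None,
                   rng=None, max_tries=10):
    """Return a grid x grid list of 0/1 flags with at most num_masking_patches ones."""
    rng = rng or random.Random()
    if max_num_patches is None:
        max_num_patches = num_masking_patches
    max_aspect = max_aspect or 1 / min_aspect
    log_lo, log_hi = math.log(min_aspect), math.log(max_aspect)
    mask = [[0] * grid for _ in range(grid)]
    count = 0
    while count < num_masking_patches:
        budget = min(num_masking_patches - count, max_num_patches)
        added = 0
        for _ in range(max_tries):
            area = rng.uniform(min_num_patches, budget)
            aspect = math.exp(rng.uniform(log_lo, log_hi))   # log-uniform aspect ratio
            h = int(round(math.sqrt(area * aspect)))
            w = int(round(math.sqrt(area / aspect)))
            if not (0 < h < grid and 0 < w < grid):
                continue
            top = rng.randint(0, grid - h)
            left = rng.randint(0, grid - w)
            # Only cells that are not masked yet count against the budget.
            cells = [(r, c) for r in range(top, top + h)
                     for c in range(left, left + w) if mask[r][c] == 0]
            if 0 < len(cells) <= budget:
                for r, c in cells:
                    mask[r][c] = 1
                added = len(cells)
                break
        if added == 0:   # no block fitted within max_tries: stop early
            break
        count += added
    return mask


mask = blockwise_mask(rng=random.Random(0))
num_masked = sum(map(sum, mask))
```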

class mmselfsup.datasets.pipelines.GaussianBlur(sigma_min, sigma_max, p=0.5)[source]

GaussianBlur augmentation, following SimCLR (https://arxiv.org/abs/2002.05709).

Parameters
  • sigma_min (float) – The minimum parameter of Gaussian kernel std.

  • sigma_max (float) – The maximum parameter of Gaussian kernel std.

  • p (float, optional) – Probability. Defaults to 0.5.

class mmselfsup.datasets.pipelines.Lighting(alphastd=0.1)[source]

Lighting noise (AlexNet-style PCA-based noise).

Parameters

alphastd (float, optional) – The parameter for Lighting. Defaults to 0.1.

class mmselfsup.datasets.pipelines.MaskFeatMaskGenerator(mask_window_size: int = 14, mask_ratio: float = 0.4, min_num_patches: int = 15, max_num_patches: Optional[int] = None, min_aspect: float = 0.3, max_aspect: Optional[float] = None)[source]

Generate random block mask for each image.

This module is borrowed from https://github.com/facebookresearch/SlowFast/blob/main/slowfast/datasets/transform.py

Parameters
  • mask_window_size (int) – Size of input image. Defaults to 14.

  • mask_ratio (float) – The mask ratio of image. Defaults to 0.4.

  • min_num_patches (int) – Minimum number of patches that require masking. Defaults to 15.

  • max_num_patches (int, optional) – Maximum number of patches that require masking. Defaults to None.

  • min_aspect (float) – Minimum aspect of patches. Defaults to 0.3.

  • max_aspect (float, optional) – Maximum aspect of patches. Defaults to None.

class mmselfsup.datasets.pipelines.RandomAppliedTrans(transforms, p=0.5)[source]

Randomly applied transformations.

Parameters
  • transforms (list[dict]) – List of transformations in dictionaries.

  • p (float, optional) – Probability. Defaults to 0.5.

class mmselfsup.datasets.pipelines.RandomAug(input_size=None, color_jitter=None, auto_augment=None, interpolation=None, re_prob=None, re_mode=None, re_count=None, mean=None, std=None)[source]

RandAugment data augmentation method based on “RandAugment: Practical automated data augmentation with a reduced search space”.

This code is borrowed from https://github.com/pengzhiliang/MAE-pytorch

class mmselfsup.datasets.pipelines.SimMIMMaskGenerator(input_size: int = 192, mask_patch_size: int = 32, model_patch_size: int = 4, mask_ratio: float = 0.6)[source]

Generate random block mask for each image.

This module is used in SimMIM to generate masks.

Parameters
  • input_size (int) – Size of input image. Defaults to 192.

  • mask_patch_size (int) – Size of each block mask. Defaults to 32.

  • model_patch_size (int) – Patch size of each token. Defaults to 4.

  • mask_ratio (float) – The mask ratio of image. Defaults to 0.6.
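How the four parameters interact can be shown with a pure-Python sketch: a coarse grid of blocks is sampled first and then upsampled to token resolution. The function name `simmim_mask` and the use of nested lists instead of arrays are choices made for this illustration.

```python
import math
import random


def simmim_mask(input_size=192, mask_patch_size=32, model_patch_size=4,
                mask_ratio=0.6, rng=None):
    """Return a token-level 0/1 mask built from randomly chosen coarse blocks."""
    rng = rng or random.Random()
    rand_size = input_size // mask_patch_size        # coarse blocks per side
    scale = mask_patch_size // model_patch_size      # tokens per block side
    block_count = rand_size ** 2
    mask_count = int(math.ceil(block_count * mask_ratio))
    masked_blocks = set(rng.sample(range(block_count), mask_count))
    size = rand_size * scale
    # A token is masked when the coarse block that contains it was selected.
    return [[1 if (r // scale) * rand_size + (c // scale) in masked_blocks else 0
             for c in range(size)] for r in range(size)]


token_mask = simmim_mask(rng=random.Random(0))
masked_tokens = sum(map(sum, token_mask))
```

With the default values the coarse grid is 6 x 6, each block covers 8 x 8 tokens, and ceil(36 * 0.6) = 22 blocks are masked, giving a 48 x 48 mask with 1408 masked tokens.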

class mmselfsup.datasets.pipelines.Solarization(threshold=128, p=0.5)[source]

Solarization augmentation, following BYOL (https://arxiv.org/abs/2006.07733).

Parameters
  • threshold (float, optional) – The solarization threshold. Defaults to 128.

  • p (float, optional) – Probability. Defaults to 0.5.
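On 8-bit pixel values, solarization inverts every value at or above the threshold. The sketch below works on a flat list of integers and is meant only to show the arithmetic; the function name `solarize` and the list-based input are assumptions.

```python
import random


def solarize(pixels, threshold=128, p=0.5, rng=None):
    """With probability p, invert every 8-bit value that is >= threshold."""
    rng = rng or random.Random()
    if rng.random() >= p:
        return list(pixels)                       # augmentation not applied
    return [v if v < threshold else 255 - v for v in pixels]


always = solarize([0, 127, 128, 255], p=1.0)
never = solarize([0, 127, 128, 255], p=0.0)
```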

class mmselfsup.datasets.pipelines.ToTensor[source]

Convert image or a sequence of images to tensor.

This module can not only convert a single image to tensor, but also a sequence of images.

samplers

class mmselfsup.datasets.samplers.DistributedGivenIterationSampler(dataset, total_iter, batch_size, num_replicas=None, rank=None, last_iter=-1)[source]

gen_new_list()[source]

Each process shuffles the whole index list with the same seed and picks one piece according to its rank.
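Because every process shuffles with the same seed, the slices taken by different ranks never overlap. The sketch below illustrates this with Python's random module; the function name `indices_for_rank` and the exact repetition and truncation of the index list are assumptions of this example.

```python
import random


def indices_for_rank(dataset_size, total_iter, batch_size, world_size, rank, seed=0):
    """Indices consumed by `rank` over `total_iter` iterations."""
    per_rank = total_iter * batch_size
    all_size = per_rank * world_size
    repeats = (all_size - 1) // dataset_size + 1      # repeat if the dataset is too small
    indices = (list(range(dataset_size)) * repeats)[:all_size]
    random.Random(seed).shuffle(indices)              # same seed on every process
    start = per_rank * rank
    return indices[start:start + per_rank]


rank0 = indices_for_rank(10, total_iter=2, batch_size=2, world_size=2, rank=0)
rank1 = indices_for_rank(10, total_iter=2, batch_size=2, world_size=2, rank=1)
```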

class mmselfsup.datasets.samplers.DistributedGroupSampler(dataset, samples_per_gpu=1, num_replicas=None, rank=None)[source]

Sampler that restricts data loading to a subset of the dataset.

It is especially useful in conjunction with torch.nn.parallel.DistributedDataParallel. In such case, each process can pass a DistributedSampler instance as a DataLoader sampler, and load a subset of the original dataset that is exclusive to it.

Note

Dataset is assumed to be of constant size.

Parameters
  • dataset – Dataset used for sampling.

  • num_replicas (optional) – Number of processes participating in distributed training.

  • rank (optional) – Rank of the current process within num_replicas.

class mmselfsup.datasets.samplers.DistributedSampler(dataset, num_replicas=None, rank=None, shuffle=True, replace=False, seed=0)[source]

class mmselfsup.datasets.samplers.GroupSampler(dataset, samples_per_gpu=1)[source]

datasets

class mmselfsup.datasets.BaseDataset(data_source, pipeline, prefetch=False)[source]

Base dataset class.

The base dataset can be inherited by different algorithm’s datasets. After __init__, the data source and pipeline will be built. Besides, the algorithm specific dataset implements different operations after obtaining images from data sources.

Parameters
  • data_source (dict) – Data source defined in mmselfsup.datasets.data_sources.

  • pipeline (list[dict]) – A list of dict, where each element represents an operation defined in mmselfsup.datasets.pipelines.

  • prefetch (bool, optional) – Whether to prefetch data. Defaults to False.

class mmselfsup.datasets.ConcatDataset(datasets)[source]

A wrapper of concatenated dataset.

Same as torch.utils.data.dataset.ConcatDataset, but concat the group flag for image aspect ratio.

Parameters

datasets (list[Dataset]) – A list of datasets.

class mmselfsup.datasets.DeepClusterDataset(data_source, pipeline, prefetch=False)[source]

Dataset for DC and ODC.

The dataset initializes clustering labels and assigns it during training.

Parameters
  • data_source (dict) – Data source defined in mmselfsup.datasets.data_sources.

  • pipeline (list[dict]) – A list of dict, where each element represents an operation defined in mmselfsup.datasets.pipelines.

  • prefetch (bool, optional) – Whether to prefetch data. Defaults to False.

class mmselfsup.datasets.MultiViewDataset(data_source, num_views, pipelines, prefetch=False)[source]

The dataset outputs multiple views of an image.

The number of views in the output dict depends on num_views. The image can be processed by one pipeline or multiple pipelines.

Parameters
  • data_source (dict) – Data source defined in mmselfsup.datasets.data_sources.

  • num_views (list) – The number of different views.

  • pipelines (list[list[dict]]) – A list of pipelines, where each pipeline contains elements that represent operations defined in mmselfsup.datasets.pipelines.

  • prefetch (bool, optional) – Whether to prefetch data. Defaults to False.
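The pairing of num_views with pipelines can be pictured with plain callables standing in for pipelines. This is a toy sketch: `make_views` is a hypothetical helper, and numbers stand in for images.

```python
def make_views(img, num_views, pipelines):
    """Apply pipelines[i] to `img` num_views[i] times and collect all results."""
    assert len(num_views) == len(pipelines)
    views = []
    for n, pipeline in zip(num_views, pipelines):
        views.extend(pipeline(img) for _ in range(n))
    return views


# Two views from the first toy pipeline, six from the second.
views = make_views(3, [2, 6], [lambda x: x + 1, lambda x: x * 10])
```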

Examples

>>> dataset = MultiViewDataset(data_source, [2], [pipeline])
>>> output = dataset[idx]
>>> # The output contains 2 views processed by one pipeline.
>>> dataset = MultiViewDataset(
...     data_source, [2, 6], [pipeline1, pipeline2])
>>> output = dataset[idx]
>>> # The output contains 8 views processed by two pipelines: the first two
>>> # views are processed by pipeline1 and the remaining six by pipeline2.

class mmselfsup.datasets.RelativeLocDataset(data_source, pipeline, format_pipeline, prefetch=False)[source]

Dataset for relative patch location.

The dataset crops the image into several patches and concatenates every surrounding patch with the center one. Finally, it also outputs the corresponding labels 0, 1, 2, 3, 4, 5, 6, 7.

Parameters
  • data_source (dict) – Data source defined in mmselfsup.datasets.data_sources.

  • pipeline (list[dict]) – A list of dict, where each element represents an operation defined in mmselfsup.datasets.pipelines.

  • format_pipeline (list[dict]) – A list of dict, it converts input format from PIL.Image to Tensor. The operation is defined in mmselfsup.datasets.pipelines.

  • prefetch (bool, optional) – Whether to prefetch data. Defaults to False.
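The labelling scheme can be illustrated on a 3 x 3 grid of patches. The sketch below assumes the nine patches are given in row-major order with the center at index 4, and it pairs patches as tuples instead of concatenating tensors; `relative_loc_pairs` is a hypothetical helper.

```python
def relative_loc_pairs(patches):
    """Pair each of the 8 surrounding patches with the center and label it 0..7."""
    assert len(patches) == 9
    center = patches[4]
    surrounding = patches[:4] + patches[5:]
    return [((neighbor, center), label)
            for label, neighbor in enumerate(surrounding)]


# Letters stand in for image patches; "e" is the center patch.
pairs = relative_loc_pairs(list("abcdefghi"))
```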

class mmselfsup.datasets.RepeatDataset(dataset, times)[source]

A wrapper of repeated dataset.

The length of repeated dataset will be times larger than the original dataset. This is useful when the data loading time is long but the dataset is small. Using RepeatDataset can reduce the data loading time between epochs.

Parameters
  • dataset (Dataset) – The dataset to be repeated.

  • times (int) – Repeat times.
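The wrapper idea fits in a few lines: indices wrap around the original length. The class below is a toy version for illustration, not the library class, and its name is an assumption.

```python
class ToyRepeatDataset:
    """Toy wrapper that makes a dataset appear `times` times longer."""

    def __init__(self, dataset, times):
        self.dataset = dataset
        self.times = times
        self._ori_len = len(dataset)

    def __getitem__(self, idx):
        return self.dataset[idx % self._ori_len]   # wrap around the original data

    def __len__(self):
        return self.times * self._ori_len


repeated = ToyRepeatDataset(["a", "b", "c"], times=2)
```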

class mmselfsup.datasets.RotationPredDataset(data_source, pipeline, prefetch=False)[source]

Dataset for rotation prediction.

The dataset rotates the image by 0, 90, 180, and 270 degrees and outputs labels 0, 1, 2, 3 correspondingly.

Parameters
  • data_source (dict) – Data source defined in mmselfsup.datasets.data_sources.

  • pipeline (list[dict]) – A list of dict, where each element represents an operation defined in mmselfsup.datasets.pipelines.

  • prefetch (bool, optional) – Whether to prefetch data. Defaults to False.
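The mapping from rotation to label can be sketched on a tiny nested list. The counter-clockwise direction and the helper names are assumptions of this example; only the pairing of the four rotations with labels 0 to 3 comes from the description above.

```python
def rot90_ccw(img):
    """Rotate a row-major 2D list by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*img)][::-1]


def rotation_samples(img):
    """Return [(rotated_image, label)] for rotations of 0, 90, 180 and 270 degrees."""
    samples = []
    current = [list(row) for row in img]
    for label in range(4):
        samples.append((current, label))
        current = rot90_ccw(current)
    return samples


samples = rotation_samples([[1, 2], [3, 4]])
```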

class mmselfsup.datasets.SingleViewDataset(data_source, pipeline, prefetch=False)[source]

The dataset outputs one view of an image, containing some other information such as label, idx, etc.

Parameters
  • data_source (dict) – Data source defined in mmselfsup.datasets.data_sources.

  • pipeline (list[dict]) – A list of dict, where each element represents an operation defined in mmselfsup.datasets.pipelines.

  • prefetch (bool, optional) – Whether to prefetch data. Defaults to False.

evaluate(results, logger=None, topk=(1, 5))[source]

The evaluation function to output accuracy.

Parameters
  • results (dict) – The key-value pair is the output head name and corresponding prediction values.

  • logger (logging.Logger | str | None, optional) – The defined logger to be used. Defaults to None.

  • topk (tuple(int)) – The output includes topk accuracy.
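Top-k accuracy counts a sample as correct when its target is among the k highest-scoring classes. The sketch below computes it on nested lists; reporting the result as a percentage and the key format `top{k}` are assumptions, and `topk_accuracy` is a hypothetical helper.

```python
def topk_accuracy(scores, targets, topk=(1, 5)):
    """Return {"top{k}": accuracy in percent} for every k in `topk`."""
    results = {}
    for k in topk:
        correct = 0
        for row, target in zip(scores, targets):
            # Class indices sorted from highest to lowest score.
            ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
            correct += target in ranked[:k]
        results[f"top{k}"] = 100.0 * correct / len(targets)
    return results


scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
targets = [1, 1, 0]
acc = topk_accuracy(scores, targets, topk=(1, 2))
```

Here one of three samples is correct at top-1 and two of three at top-2.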

mmselfsup.datasets.build_dataloader(dataset, imgs_per_gpu=None, samples_per_gpu=None, workers_per_gpu=1, num_gpus=1, dist=True, shuffle=True, replace=False, seed=None, pin_memory=True, persistent_workers=True, **kwargs)[source]

Build PyTorch DataLoader.

In distributed training, each GPU/process has a dataloader. In non-distributed training, there is only one dataloader for all GPUs.

Parameters
  • dataset (Dataset) – A PyTorch dataset.

  • imgs_per_gpu (int) – (Deprecated, please use samples_per_gpu) Number of images on each GPU, i.e., batch size of each GPU. Defaults to None.

  • samples_per_gpu (int) – Number of images on each GPU, i.e., batch size of each GPU. Defaults to None.

  • workers_per_gpu (int) – How many subprocesses to use for data loading for each GPU. persistent_workers option needs num_workers > 0. Defaults to 1.

  • num_gpus (int) – Number of GPUs. Only used in non-distributed training.

  • dist (bool) – Distributed training/test or not. Defaults to True.

  • shuffle (bool) – Whether to shuffle the data at every epoch. Defaults to True.

  • replace (bool) – Whether to sample with replacement in random shuffle. It only takes effect when shuffle is True. Defaults to False.

  • seed (int) – Seed for the dataloader. Defaults to None.

  • pin_memory (bool, optional) – If True, the data loader will copy Tensors into CUDA pinned memory before returning them. Defaults to True.

  • persistent_workers (bool) – If True, the data loader will not shut down the worker processes after a dataset has been consumed once. This keeps the workers' Dataset instances alive. The argument only takes effect in PyTorch>=1.7.0. Defaults to True.

  • kwargs – any keyword argument to be used to initialize DataLoader

Returns

A PyTorch dataloader.

Return type

DataLoader
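The difference between the two modes shows up in how the per-loader batch size and worker count are derived. The sketch below is an assumption-based illustration of that rule, with a hypothetical helper name; it does not build a real DataLoader.

```python
def loader_sizes(samples_per_gpu, workers_per_gpu, num_gpus=1, dist=True):
    """Return (batch_size, num_workers) for one dataloader."""
    if dist:
        # One dataloader per GPU/process: sizes are per GPU.
        return samples_per_gpu, workers_per_gpu
    # A single dataloader feeds all GPUs: sizes are multiplied by num_gpus.
    return num_gpus * samples_per_gpu, num_gpus * workers_per_gpu


dist_sizes = loader_sizes(32, 4, num_gpus=8, dist=True)
non_dist_sizes = loader_sizes(32, 4, num_gpus=8, dist=False)
```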

mmselfsup.models

algorithms

class mmselfsup.models.algorithms.BYOL(backbone, neck=None, head=None, base_momentum=0.996, init_cfg=None, **kwargs)[source]

BYOL.

Implementation of Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning. The momentum adjustment is in core/hooks/byol_hook.py.

Parameters
  • backbone (dict) – Config dict for module of backbone.

  • neck (dict) – Config dict for module of deep features to compact feature vectors. Defaults to None.

  • head (dict) – Config dict for module of loss functions. Defaults to None.

  • base_momentum (float) – The base momentum coefficient for the target network. Defaults to 0.996.

extract_feat(img)[source]

Function to extract features from backbone.

Parameters

img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

Returns

backbone outputs.

Return type

tuple[Tensor]

forward_train(img, **kwargs)[source]

Forward computation during training.

Parameters

img (list[Tensor]) – A list of input images with shape (N, C, H, W). Typically these should be mean centered and std scaled.

Returns

A dictionary of loss components.

Return type

dict[str, Tensor]

momentum_update()[source]

Momentum update of the target network.
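A momentum update is an exponential moving average of the online parameters. The sketch below shows the arithmetic on plain lists of floats; it is an illustration rather than the method's code, and the function signature is an assumption.

```python
def momentum_update(target, online, momentum):
    """Move each target parameter toward its online counterpart."""
    return [momentum * t + (1.0 - momentum) * o
            for t, o in zip(target, online)]


# With momentum 0.5 the target lands halfway between the two networks.
new_target = momentum_update([1.0, 2.0], [0.0, 4.0], momentum=0.5)
```

With a momentum close to 1, such as the default base_momentum of 0.996, the target network changes slowly.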

class mmselfsup.models.algorithms.BarlowTwins(backbone: Optional[dict] = None, neck: Optional[dict] = None, head: Optional[dict] = None, init_cfg: Optional[dict] = None, **kwargs)[source]

BarlowTwins.

Implementation of Barlow Twins: Self-Supervised Learning via Redundancy Reduction. Part of the code is borrowed from: https://github.com/facebookresearch/barlowtwins/blob/main/main.py.

Parameters
  • backbone (dict) – Config dict for module of backbone. Defaults to None.

  • neck (dict) – Config dict for module of deep features to compact feature vectors. Defaults to None.

  • head (dict) – Config dict for module of loss functions. Defaults to None.

  • init_cfg (dict) – Config dict for weight initialization. Defaults to None.

extract_feat(img: torch.Tensor) → torch.Tensor[source]

Function to extract features from backbone.

Parameters

img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

Returns

backbone outputs.

Return type

tuple[Tensor]

forward_train(img: List[torch.Tensor]) → dict[source]

Forward computation during training.

Parameters

img (List[Tensor]) – A list of input images with shape (N, C, H, W). Typically these should be mean centered and std scaled.

Returns

A dictionary of loss components.

Return type

dict[str, Tensor]

class mmselfsup.models.algorithms.BaseModel(init_cfg=None)[source]

Base model class for self-supervised learning.

abstract extract_feat(imgs)[source]

Function to extract features from backbone.

Parameters

imgs (Tensor) – Input images. Typically these should be mean centered and std scaled.

forward(img, mode='train', **kwargs)[source]

Forward function of model.

Calls either forward_train, forward_test or extract_feat function according to the mode.
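The dispatch can be sketched with a toy class. Only the 'train' mode name appears in the signature above; the 'test' and 'extract' mode names, the class name and the return values are assumptions of this example.

```python
class ToyModel:
    """Toy model that dispatches forward() on a mode string."""

    def forward_train(self, img, **kwargs):
        return {"loss": 0.0}

    def forward_test(self, img, **kwargs):
        return {"pred": img}

    def extract_feat(self, img):
        return (img,)

    def forward(self, img, mode="train", **kwargs):
        if mode == "train":
            return self.forward_train(img, **kwargs)
        if mode == "test":        # assumed mode name
            return self.forward_test(img, **kwargs)
        if mode == "extract":     # assumed mode name
            return self.extract_feat(img)
        raise ValueError(f"No such mode: {mode}")


model = ToyModel()
train_out = model.forward("x")
feat_out = model.forward("x", mode="extract")
```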

forward_test(imgs, **kwargs)[source]

Parameters
  • imgs (list[Tensor]) – List of tensors. Typically these should be mean centered and std scaled.

  • kwargs (keyword arguments) – Specific to concrete implementation.

abstract forward_train(imgs, **kwargs)[source]

Parameters
  • imgs (list[Tensor]) – List of tensors. Typically these should be mean centered and std scaled.

  • kwargs (keyword arguments) – Specific to concrete implementation.

train_step(data, optimizer)[source]

The iteration step during training.

This method defines an iteration step during training, except for the back propagation and optimizer updating, which are done in an optimizer hook. Note that in some complicated cases or models, such as GAN, the whole process including back propagation and optimizer updating is also defined in this method.

Parameters
  • data (dict) – The output of dataloader.

  • optimizer (torch.optim.Optimizer | dict) – The optimizer of runner is passed to train_step(). This argument is unused and reserved.

Returns

Dict of outputs. The following fields are contained.
  • loss (torch.Tensor): A tensor for back propagation, which can be a weighted sum of multiple losses.

  • log_vars (dict): Dict contains all the variables to be sent to the logger.

  • num_samples (int): Indicates the batch size (when the model is DDP, it means the batch size on each GPU), which is used for averaging the logs.

Return type

dict
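
As a rough sketch of how the three output fields could be assembled, the following uses plain floats in place of tensors. The helper parse_losses and the rule that every entry whose key contains 'loss' is summed into the total are assumptions for illustration, not a statement of the library's behavior.

```python
def parse_losses(losses):
    """Reduce a dict of named values to (total_loss, log_vars).

    Assumption: every entry whose key contains 'loss' contributes to the total.
    """
    log_vars = {name: float(value) for name, value in losses.items()}
    total = sum(value for name, value in log_vars.items() if "loss" in name)
    log_vars["loss"] = total
    return total, log_vars


def train_step(batch, forward_train):
    """Build the outputs dict (loss, log_vars, num_samples) for one batch."""
    losses = forward_train(batch["img"])
    loss, log_vars = parse_losses(losses)
    # num_samples is the per-process batch size, used to average the logs.
    return {"loss": loss, "log_vars": log_vars, "num_samples": len(batch["img"])}
```

Back propagation and the optimizer update are deliberately absent here, matching the description that they happen in an optimizer hook.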

val_step(data, optimizer)[source]

The iteration step during validation.

This method shares the same signature as train_step(), but is used during validation epochs. Note that the evaluation after training epochs is not implemented with this method, but with an evaluation hook.

class mmselfsup.models.algorithms.CAE(backbone: Optional[dict] = None, neck: Optional[dict] = None, head: Optional[dict] = None, base_momentum: float = 0.0, init_cfg: Optional[dict] = None, **kwargs)[source]

CAE.

Implementation of Context Autoencoder for Self-Supervised Representation Learning.

Parameters
  • backbone (dict, optional) – Config dict for module of backbone.

  • neck (dict, optional) – Config dict for module of deep features to compact feature vectors. Defaults to None.

  • head (dict, optional) – Config dict for module of loss functions. Defaults to None.

  • base_momentum (float) – The base momentum coefficient for the target network. Defaults to 0.0.

  • init_cfg (dict, optional) – the config to control the initialization.

extract_feat(img: torch.Tensor, mask: torch.Tensor) → torch.Tensor[source]

Function to extract features from backbone.

Parameters

img (Tensor) – Input images. Typically these should be mean centered and std scaled.

forward_train(samples: Sequence, **kwargs) → dict[source]

Parameters
  • img (list[Tensor]) – List of tensors. Typically these should be mean centered and std scaled.

  • kwargs (keyword arguments) – Specific to concrete implementation.

init_weights() → None[source]

Initialize the weights.

momentum_update() → None[source]

Momentum update of the teacher network.

class mmselfsup.models.algorithms.Classification(backbone, with_sobel=False, head=None, train_cfg=None, init_cfg=None)[source]

Simple image classification.

Parameters
  • backbone (dict) – Config dict for module of backbone.

  • with_sobel (bool) – Whether to apply a Sobel filter. Defaults to False.

  • head (dict) – Config dict for module of loss functions. Defaults to None.

extract_feat(img)[source]

Function to extract features from backbone.

Parameters

img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

Returns

backbone outputs.

Return type

tuple[Tensor]

forward_test(img, **kwargs)[source]

Forward computation during test.

Parameters

img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

Returns

A dictionary of output features.

Return type

dict[str, Tensor]

forward_train(img, label, **kwargs)[source]

Forward computation during training.

Parameters
  • img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

  • label (Tensor) – Ground-truth labels.

  • kwargs – Any keyword arguments to be used to forward.

Returns

A dictionary of loss components.

Return type

dict[str, Tensor]

class mmselfsup.models.algorithms.DeepCluster(backbone, with_sobel=True, neck=None, head=None, init_cfg=None)[source]

DeepCluster.

Implementation of Deep Clustering for Unsupervised Learning of Visual Features. The clustering operation is in core/hooks/deepcluster_hook.py.

Parameters
  • backbone (dict) – Config dict for module of backbone.

  • with_sobel (bool) – Whether to apply a Sobel filter on images. Defaults to True.

  • neck (dict) – Config dict for module of deep features to compact feature vectors. Defaults to None.

  • head (dict) – Config dict for module of loss functions. Defaults to None.

extract_feat(img)[source]

Function to extract features from backbone.

Parameters

img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

Returns

backbone outputs.

Return type

tuple[Tensor]

forward_test(img, **kwargs)[source]

Forward computation during test.

Parameters

img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

Returns

A dictionary of output features.

Return type

dict[str, Tensor]

forward_train(img, pseudo_label, **kwargs)[source]

Forward computation during training.

Parameters
  • img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

  • pseudo_label (Tensor) – Label assignments.

  • kwargs – Any keyword arguments to be used to forward.

Returns

A dictionary of loss components.

Return type

dict[str, Tensor]

set_reweight(labels, reweight_pow=0.5)[source]

Loss re-weighting.

Re-weighting the loss according to the number of samples in each class.

Parameters
  • labels (numpy.ndarray) – Label assignments.

  • reweight_pow (float) – The power of re-weighting. Defaults to 0.5.
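
One plausible reading of re-weighting "according to the number of samples in each class" is an inverse-frequency weight raised to reweight_pow and normalized to sum to 1. The sketch below assumes exactly that; the function name class_weights, the num_classes argument, and the eps term are illustrative and not taken from the source.

```python
from collections import Counter


def class_weights(labels, num_classes, reweight_pow=0.5, eps=1e-10):
    """Inverse-frequency class weights, normalized to sum to 1 (assumed scheme)."""
    counts = Counter(labels)
    # Rare classes get larger weights; eps only guards against division by zero,
    # so an empty class would still receive a very large weight.
    inv = [(1.0 / (counts.get(c, 0) + eps)) ** reweight_pow
           for c in range(num_classes)]
    total = sum(inv)
    return [w / total for w in inv]
```

With reweight_pow=0.5, a class that is four times as frequent receives half the weight, so the power softens the correction relative to plain inverse frequency.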

class mmselfsup.models.algorithms.DenseCL(backbone, neck=None, head=None, queue_len=65536, feat_dim=128, momentum=0.999, loss_lambda=0.5, init_cfg=None, **kwargs)[source]

DenseCL.

Implementation of Dense Contrastive Learning for Self-Supervised Visual Pre-Training. Borrowed from the authors’ code: https://github.com/WXinlong/DenseCL. The loss_lambda warmup is in core/hooks/densecl_hook.py.

Parameters
  • backbone (dict) – Config dict for module of backbone.

  • neck (dict) – Config dict for module of deep features to compact feature vectors. Defaults to None.

  • head (dict) – Config dict for module of loss functions. Defaults to None.

  • queue_len (int) – Number of negative keys maintained in the queue. Defaults to 65536.

  • feat_dim (int) – Dimension of compact feature vectors. Defaults to 128.

  • momentum (float) – Momentum coefficient for the momentum-updated encoder. Defaults to 0.999.

  • loss_lambda (float) – Loss weight for the single and dense contrastive loss. Defaults to 0.5.
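
To make the role of loss_lambda concrete, here is a small sketch that assumes the two terms are mixed as (1 - loss_lambda) * single + loss_lambda * dense, and that the warmup holds loss_lambda at 0 for the first start_iters iterations. Both function names and the exact boundary condition are assumptions.

```python
def warmup_lambda(cur_iter, start_iters=1000, loss_lambda=0.5):
    """Return 0 during warmup, then the configured loss_lambda (assumed rule)."""
    return 0.0 if cur_iter < start_iters else loss_lambda


def densecl_total_loss(loss_single, loss_dense, loss_lambda=0.5):
    """Weighted mix of the single (global) and dense contrastive losses."""
    return (1.0 - loss_lambda) * loss_single + loss_lambda * loss_dense
```

Under these assumptions only the single contrastive loss drives training during warmup, and both terms contribute equally afterwards with the default of 0.5.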

extract_feat(img)[source]

Function to extract features from backbone.

Parameters

img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

Returns

backbone outputs.

Return type

tuple[Tensor]

forward_test(img, **kwargs)[source]

Forward computation during test.

Parameters

img (Tensor) – Input of two concatenated images of shape (N, 2, C, H, W). Typically these should be mean centered and std scaled.

Returns

A dictionary of normalized output features.

Return type

dict[str, Tensor]

forward_train(img, **kwargs)[source]

Forward computation during training.

Parameters

img (list[Tensor]) – A list of input images with shape (N, C, H, W). Typically these should be mean centered and std scaled.

Returns

A dictionary of loss components.

Return type

dict[str, Tensor]

init_weights()[source]

Init weights and copy query encoder init weights to key encoder.

class mmselfsup.models.algorithms.MAE(backbone: dict, neck: dict, head: dict, init_cfg: Optional[dict] = None)[source]

MAE.

Implementation of Masked Autoencoders Are Scalable Vision Learners.

Parameters
  • backbone (dict) – Config dict for encoder.

  • neck (dict) – Config dict for decoder.

  • head (dict) – Config dict for loss functions.

  • init_cfg (dict, optional) – Config dict for weight initialization. Defaults to None.

extract_feat(img: torch.Tensor) → Tuple[torch.Tensor][source]

Function to extract features from backbone.

Parameters

img (torch.Tensor) – Input images of shape (N, C, H, W).

Returns

backbone outputs.

Return type

Tuple[torch.Tensor]

forward_test(img: torch.Tensor, **kwargs) → Tuple[torch.Tensor, torch.Tensor][source]

Forward computation during testing.

Parameters
  • img (torch.Tensor) – Input images of shape (N, C, H, W).

  • kwargs – Any keyword arguments to be used to forward.

Returns

Output of model test.
  • mask: Mask used to mask image.

  • pred: The output of neck.

Return type

Tuple[torch.Tensor, torch.Tensor]

forward_train(img: torch.Tensor, **kwargs) → Dict[str, torch.Tensor][source]

Forward computation during training.

Parameters
  • img (torch.Tensor) – Input images of shape (N, C, H, W).

  • kwargs – Any keyword arguments to be used to forward.

Returns

A dictionary of loss components.

Return type

Dict[str, torch.Tensor]

init_weights()[source]

Initialize the weights.

class mmselfsup.models.algorithms.MMClsImageClassifierWrapper(backbone: dict, neck: Optional[dict] = None, head: Optional[dict] = None, pretrained: Optional[str] = None, train_cfg: Optional[dict] = None, init_cfg: Optional[dict] = None)[source]

Workaround to use models from mmclassification.

Since the output of classifiers from mmclassification is not compatible with mmselfsup’s evaluation function, we rewrite some key components from mmclassification.

Parameters
  • backbone (dict) – Config dict for module of backbone.

  • neck (dict, optional) – Config dict for module of neck. Defaults to None.

  • head (dict, optional) – Config dict for module of loss functions. Defaults to None.

  • pretrained (str, optional) – The path of pre-trained checkpoint. Defaults to None.

  • train_cfg (dict, optional) – Config dict for pre-processing utils, e.g. mixup. Defaults to None.

  • init_cfg (dict, optional) – Config dict for initialization. Defaults to None.

forward(img, mode='train', **kwargs)[source]

Forward function of model.

Calls either forward_train, forward_test or extract_feat function according to the mode.

forward_test(imgs, **kwargs)[source]
Parameters

imgs (List[Tensor]) – The outer list indicates test-time augmentations, and each inner Tensor should have shape (N, C, H, W) and contain all images in the batch.

forward_train(img, label, **kwargs)[source]

Forward computation during training.

Parameters
  • img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

  • label (Tensor) – Ground-truth labels of the input images. It should be of shape (N, 1) for a single-label task and of shape (N, C) for a multi-label task.

Returns

A dictionary of loss components.

Return type

dict[str, Tensor]

class mmselfsup.models.algorithms.MaskFeat(backbone: dict, head: dict, hog_para: dict, init_cfg: Optional[dict] = None)[source]

MaskFeat.

Implementation of Masked Feature Prediction for Self-Supervised Visual Pre-Training.

Parameters
  • backbone (dict) – Config dict for encoder.

  • head (dict) – Config dict for loss functions.

  • hog_para (dict) – Config dict for the HOG layer, with the following keys: ‘nbins’ (int): Number of bins. Defaults to 9. ‘pool’ (float): Number of cells. Defaults to 8. ‘gaussian_window’ (int): Size of the Gaussian kernel. Defaults to 16.

  • init_cfg (dict) – Config dict for weight initialization. Defaults to None.

extract_feat(input: List[torch.Tensor]) → torch.Tensor[source]

Function to extract features from backbone.

Parameters

input (List[torch.Tensor, torch.Tensor]) – Input images and masks.

Returns

backbone outputs.

Return type

tuple[Tensor]

forward_train(input: List[torch.Tensor], **kwargs) → dict[source]

Forward computation during training.

Parameters
  • input (List[torch.Tensor, torch.Tensor]) – Input images and masks.

  • kwargs – Any keyword arguments to be used to forward.

Returns

A dictionary of loss components.

Return type

dict[str, Tensor]

class mmselfsup.models.algorithms.MoCo(backbone, neck=None, head=None, queue_len=65536, feat_dim=128, momentum=0.999, init_cfg=None, **kwargs)[source]

MoCo.

Implementation of Momentum Contrast for Unsupervised Visual Representation Learning. Part of the code is borrowed from: https://github.com/facebookresearch/moco/blob/master/moco/builder.py.

Parameters
  • backbone (dict) – Config dict for module of backbone.

  • neck (dict) – Config dict for module of deep features to compact feature vectors. Defaults to None.

  • head (dict) – Config dict for module of loss functions. Defaults to None.

  • queue_len (int) – Number of negative keys maintained in the queue. Defaults to 65536.

  • feat_dim (int) – Dimension of compact feature vectors. Defaults to 128.

  • momentum (float) – Momentum coefficient for the momentum-updated encoder. Defaults to 0.999.
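
The queue of negative keys governed by queue_len can be pictured as a fixed-size ring buffer in which each new batch of keys overwrites the oldest entries. The sketch below is an illustration under assumptions: KeyQueue is a hypothetical helper, keys are arbitrary Python objects rather than feature tensors, and the batch size is assumed constant and a divisor of queue_len.

```python
class KeyQueue:
    """Fixed-size FIFO of key features (hypothetical helper)."""

    def __init__(self, queue_len):
        self.queue_len = queue_len
        self.slots = [None] * queue_len  # storage for the negative keys
        self.ptr = 0                     # index of the oldest entry

    def dequeue_and_enqueue(self, keys):
        batch = len(keys)
        # Assumption: queue_len is a multiple of the (constant) batch size,
        # so a batch never wraps around the end of the buffer.
        assert self.queue_len % batch == 0
        self.slots[self.ptr:self.ptr + batch] = keys
        self.ptr = (self.ptr + batch) % self.queue_len
```

Because the oldest keys are replaced first, the queue always holds the most recent queue_len keys once it has filled.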

extract_feat(img)[source]

Function to extract features from backbone.

Parameters

img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

Returns

backbone outputs.

Return type

tuple[Tensor]

forward_train(img, **kwargs)[source]

Forward computation during training.

Parameters

img (list[Tensor]) – A list of input images with shape (N, C, H, W). Typically these should be mean centered and std scaled.

Returns

A dictionary of loss components.

Return type

dict[str, Tensor]

class mmselfsup.models.algorithms.MoCoV3(backbone, neck, head, base_momentum=0.99, init_cfg=None, **kwargs)[source]

MoCo v3.

Implementation of An Empirical Study of Training Self-Supervised Vision Transformers.

Parameters
  • backbone (dict) – Config dict for module of backbone.

  • neck (dict) – Config dict for module of deep features to compact feature vectors. Defaults to None.

  • head (dict) – Config dict for module of loss functions. Defaults to None.

  • base_momentum (float) – Momentum coefficient for the momentum-updated encoder. Defaults to 0.99.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None

extract_feat(img)[source]

Function to extract features from backbone.

Parameters

img (Tensor) – Input images. Typically these should be mean centered and std scaled.

Returns

backbone outputs.

Return type

tuple[Tensor]

forward_train(img, **kwargs)[source]

Forward computation during training.

Parameters

img (list[Tensor]) – A list of input images. Typically these should be mean centered and std scaled.

Returns

A dictionary of loss components.

Return type

dict[str, Tensor]

init_weights()[source]

Initialize base_encoder with init_cfg defined in backbone.

momentum_update()[source]

Momentum update of the momentum encoder.
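
A momentum update of this kind is commonly an exponential moving average of the online encoder's parameters. The sketch below shows that rule on plain lists of floats; it is a generic illustration, and any scheduling of the coefficient starting from base_momentum is omitted.

```python
def momentum_update(online, target, momentum=0.99):
    """In-place EMA: target <- momentum * target + (1 - momentum) * online."""
    for i, (p_online, p_target) in enumerate(zip(online, target)):
        target[i] = momentum * p_target + (1.0 - momentum) * p_online
    return target
```

A momentum close to 1 makes the momentum encoder change slowly, which keeps its outputs consistent across iterations.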

class mmselfsup.models.algorithms.NPID(backbone, neck=None, head=None, memory_bank=None, neg_num=65536, ensure_neg=False, init_cfg=None)[source]

NPID.

Implementation of Unsupervised Feature Learning via Non-parametric Instance Discrimination.

Parameters
  • backbone (dict) – Config dict for module of backbone.

  • neck (dict) – Config dict for module of deep features to compact feature vectors. Defaults to None.

  • head (dict) – Config dict for module of loss functions. Defaults to None.

  • memory_bank (dict) – Config dict for module of memory banks. Defaults to None.

  • neg_num (int) – Number of negative samples for each image. Defaults to 65536.

  • ensure_neg (bool) – If False, there is a small probability that negative samples contain positive ones. Defaults to False.
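
The effect of ensure_neg can be illustrated with a small sampling sketch. It assumes negatives are drawn uniformly from memory-bank indices and that, with ensure_neg enabled, any draw equal to the positive index is redrawn. The function name, the uniform sampling, and the redraw strategy are all assumptions, not the library's implementation.

```python
import random


def sample_negatives(idx, bank_size, neg_num, ensure_neg=False, rng=random):
    """Draw neg_num memory-bank indices to serve as negatives for sample idx."""
    negs = [rng.randrange(bank_size) for _ in range(neg_num)]
    if ensure_neg:
        # Redraw every index that collides with the positive sample.
        # Requires bank_size > 1, otherwise no valid negative exists.
        for i, n in enumerate(negs):
            while n == idx:
                n = rng.randrange(bank_size)
            negs[i] = n
    return negs
```

With a large memory bank a collision is rare, which is why skipping the check (ensure_neg=False) costs little accuracy while avoiding the extra pass.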

extract_feat(img)[source]

Function to extract features from backbone.

Parameters

img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

Returns

backbone outputs.

Return type

tuple[Tensor]

forward_train(img, idx, **kwargs)[source]

Forward computation during training.

Parameters
  • img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

  • idx (Tensor) – Index corresponding to each image.

  • kwargs – Any keyword arguments to be used to forward.

Returns

A dictionary of loss components.

Return type

dict[str, Tensor]

class mmselfsup.models.algorithms.ODC(backbone, with_sobel=False, neck=None, head=None, memory_bank=None, init_cfg=None)[source]

ODC.

Official implementation of Online Deep Clustering for Unsupervised Representation Learning. The operations on the memory bank and loss re-weighting are in core/hooks/odc_hook.py.

Parameters
  • backbone (dict) – Config dict for module of backbone.

  • with_sobel (bool) – Whether to apply a Sobel filter on images. Defaults to False.

  • neck (dict) – Config dict for module of deep features to compact feature vectors. Defaults to None.

  • head (dict) – Config dict for module of loss functions. Defaults to None.

  • memory_bank (dict) – Module of memory banks. Defaults to None.

extract_feat(img)[source]

Function to extract features from backbone.

Parameters

img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

Returns

backbone outputs.

Return type

tuple[Tensor]

forward_test(img, **kwargs)[source]

Forward computation during test.

Parameters

img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

Returns

A dictionary of output features.

Return type

dict[str, Tensor]

forward_train(img, idx, **kwargs)[source]

Forward computation during training.

Parameters
  • img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

  • idx (Tensor) – Index corresponding to each image.

  • kwargs – Any keyword arguments to be used to forward.

Returns

A dictionary of loss components.

Return type

dict[str, Tensor]

set_reweight(labels=None, reweight_pow=0.5)[source]

Loss re-weighting.

Re-weighting the loss according to the number of samples in each class.

Parameters
  • labels (numpy.ndarray) – Label assignments. Defaults to None.

  • reweight_pow (float) – The power of re-weighting. Defaults to 0.5.

class mmselfsup.models.algorithms.RelativeLoc(backbone, neck=None, head=None, init_cfg=None)[source]

Relative patch location.

Implementation of Unsupervised Visual Representation Learning by Context Prediction.

Parameters
  • backbone (dict) – Config dict for module of backbone.

  • neck (dict) – Config dict for module of deep features to compact feature vectors. Defaults to None.

  • head (dict) – Config dict for module of loss functions. Defaults to None.

extract_feat(img)[source]

Function to extract features from backbone.

Parameters

img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

Returns

backbone outputs.

Return type

tuple[Tensor]

forward(img, patch_label=None, mode='train', **kwargs)[source]

Forward function to select mode and modify the input image shape.

Parameters

img (Tensor) – Input images, the shape depends on mode. Typically these should be mean centered and std scaled.

forward_test(img, **kwargs)[source]

Forward computation during test.

Parameters

img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

Returns

A dictionary of output features.

Return type

dict[str, Tensor]

forward_train(img, patch_label, **kwargs)[source]

Forward computation during training.

Parameters
  • img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

  • patch_label (Tensor) – Labels for the relative patch locations.

  • kwargs – Any keyword arguments to be used to forward.

Returns

A dictionary of loss components.

Return type

dict[str, Tensor]

class mmselfsup.models.algorithms.RotationPred(backbone, head=None, init_cfg=None)[source]

Rotation prediction.

Implementation of Unsupervised Representation Learning by Predicting Image Rotations.

Parameters
  • backbone (dict) – Config dict for module of backbone.

  • head (dict) – Config dict for module of loss functions. Defaults to None.

extract_feat(img)[source]

Function to extract features from backbone.

Parameters

img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

Returns

backbone outputs.

Return type

tuple[Tensor]

forward(img, rot_label=None, mode='train', **kwargs)[source]

Forward function to select mode and modify the input image shape.

Parameters

img (Tensor) – Input images, the shape depends on mode. Typically these should be mean centered and std scaled.

forward_test(img, **kwargs)[source]

Forward computation during test.

Parameters

img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

Returns

A dictionary of output features.

Return type

dict[str, Tensor]

forward_train(img, rot_label, **kwargs)[source]

Forward computation during training.

Parameters
  • img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

  • rot_label (Tensor) – Labels for the rotations.

  • kwargs – Any keyword arguments to be used to forward.

Returns

A dictionary of loss components.

Return type

dict[str, Tensor]
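
For intuition about rot_label, the following sketch builds four rotated views per image and labels each view with its rotation index. It assumes label k means a rotation of k × 90 degrees and that the views are flattened into one batch; images are plain 2-D lists here, and both helper names are hypothetical.

```python
def rot90(image):
    """Rotate a 2-D grid (list of rows) by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*image)][::-1]


def make_rotation_batch(images):
    """Return (flat_images, rot_labels) with four rotated views per image."""
    flat, labels = [], []
    for image in images:
        view = [list(row) for row in image]
        for k in range(4):  # assumed convention: label k = k * 90 degrees
            flat.append(view)
            labels.append(k)
            view = rot90(view)
    return flat, labels
```

Flattening N images into 4N views with 4N labels turns rotation prediction into an ordinary four-way classification problem.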

class mmselfsup.models.algorithms.SimCLR(backbone, neck=None, head=None, init_cfg=None)[source]

SimCLR.

Implementation of A Simple Framework for Contrastive Learning of Visual Representations.

Parameters
  • backbone (dict) – Config dict for module of backbone.

  • neck (dict) – Config dict for module of deep features to compact feature vectors. Defaults to None.

  • head (dict) – Config dict for module of loss functions. Defaults to None.

extract_feat(img)[source]

Function to extract features from backbone.

Parameters

img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

Returns

backbone outputs.

Return type

tuple[Tensor]

forward_train(img, **kwargs)[source]

Forward computation during training.

Parameters

img (list[Tensor]) – A list of input images with shape (N, C, H, W). Typically these should be mean centered and std scaled.

Returns

A dictionary of loss components.

Return type

dict[str, Tensor]

class mmselfsup.models.algorithms.SimMIM(backbone: dict, neck: dict, head: dict, init_cfg: Optional[dict] = None)[source]

SimMIM.

Implementation of SimMIM: A Simple Framework for Masked Image Modeling.

Parameters
  • backbone (dict) – Config dict for encoder.

  • neck (dict) – Config dict for neck.

  • head (dict) – Config dict for loss functions.

  • init_cfg (dict, optional) – Config dict for weight initialization. Defaults to None.

extract_feat(img: torch.Tensor) → tuple[source]

Function to extract features from backbone.

Parameters

img (torch.Tensor) – Input images of shape (N, C, H, W).

Returns

Latent representations of images.

Return type

tuple[Tensor]

forward_train(x: List[torch.Tensor], **kwargs) → dict[source]

Forward the masked image and get the reconstruction loss.

Parameters

x (List[torch.Tensor, torch.Tensor]) – Images and masks.

Returns

Reconstruction loss.

Return type

dict

class mmselfsup.models.algorithms.SimSiam(backbone, neck=None, head=None, init_cfg=None, **kwargs)[source]

SimSiam.

Implementation of Exploring Simple Siamese Representation Learning. The operation of fixing learning rate of predictor is in core/hooks/simsiam_hook.py.

Parameters
  • backbone (dict) – Config dict for module of backbone.

  • neck (dict) – Config dict for module of deep features to compact feature vectors. Defaults to None.

  • head (dict) – Config dict for module of loss functions. Defaults to None.

extract_feat(img)[source]

Function to extract features from backbone.

Parameters

img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

Returns

backbone outputs.

Return type

tuple[Tensor]

forward_train(img)[source]

Forward computation during training.

Parameters

img (list[Tensor]) – A list of input images with shape (N, C, H, W). Typically these should be mean centered and std scaled.

Returns

A dictionary of loss components.

Return type

dict[str, Tensor]

class mmselfsup.models.algorithms.SwAV(backbone, neck=None, head=None, init_cfg=None, **kwargs)[source]

SwAV.

Implementation of Unsupervised Learning of Visual Features by Contrasting Cluster Assignments. The queue is built in core/hooks/swav_hook.py.

Parameters
  • backbone (dict) – Config dict for module of backbone.

  • neck (dict) – Config dict for module of deep features to compact feature vectors. Defaults to None.

  • head (dict) – Config dict for module of loss functions. Defaults to None.

extract_feat(img)[source]

Function to extract features from backbone.

Parameters

img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

Returns

Backbone outputs.

Return type

tuple[Tensor]

forward_train(img, **kwargs)[source]

Forward computation during training.

Parameters

img (list[Tensor]) – A list of input images with shape (N, C, H, W). Typically these should be mean centered and std scaled.

Returns

A dictionary of loss components.

Return type

dict[str, Tensor]

backbones

class mmselfsup.models.backbones.CAEViT(arch: str = 'b', img_size: int = 224, patch_size: int = 16, out_indices: int = - 1, drop_rate: float = 0, drop_path_rate: float = 0, qkv_bias: bool = True, norm_cfg: dict = {'eps': 1e-06, 'type': 'LN'}, final_norm: bool = True, output_cls_token: bool = True, interpolate_mode: str = 'bicubic', init_values: Optional[float] = None, patch_cfg: dict = {}, layer_cfgs: dict = {}, init_cfg: Optional[dict] = None)[source]

Vision Transformer for CAE pre-training.

Rewritten version of: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Parameters
  • arch (str | dict) – Vision Transformer architecture. Default: ‘b’

  • img_size (int | tuple) – Input image size

  • patch_size (int | tuple) – The patch size

  • out_indices (Sequence | int) – Output from which stages. Defaults to -1, meaning the last stage.

  • drop_rate (float) – Probability of an element to be zeroed. Defaults to 0.

  • drop_path_rate (float) – Stochastic depth rate. Defaults to 0.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='LN', eps=1e-6).

  • final_norm (bool) – Whether to add an additional layer to normalize the final feature map. Defaults to True.

  • output_cls_token (bool) – Whether to output the cls_token. If set to True, with_cls_token must be True. Defaults to True.

  • interpolate_mode (str) – Interpolation mode used to resize the position embedding vector. Defaults to “bicubic”.

  • init_values (float, optional) – The init value of gamma in TransformerEncoderLayer.

  • patch_cfg (dict) – Configs of patch embedding. Defaults to an empty dict.

  • layer_cfgs (Sequence | dict) – Configs of each transformer layer in encoder. Defaults to an empty dict.

  • init_cfg (dict, optional) – Initialization config dict. Defaults to None.

forward(img: torch.Tensor, mask: torch.Tensor) → torch.Tensor[source]

Forward computation.

Parameters

x (tensor | tuple[tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.

init_weights() → None[source]

Initialize the weights.

class mmselfsup.models.backbones.MAEViT(arch='b', img_size=224, patch_size=16, out_indices=- 1, drop_rate=0, drop_path_rate=0, norm_cfg={'eps': 1e-06, 'type': 'LN'}, final_norm=True, output_cls_token=True, interpolate_mode='bicubic', patch_cfg={}, layer_cfgs={}, mask_ratio=0.75, init_cfg=None)[source]

Vision Transformer for MAE pre-training.

A PyTorch implementation of: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Parameters
  • arch (str | dict) – Vision Transformer architecture. Default: ‘b’

  • img_size (int | tuple) – Input image size

  • patch_size (int | tuple) – The patch size

  • out_indices (Sequence | int) – Output from which stages. Defaults to -1, meaning the last stage.

  • drop_rate (float) – Probability of an element to be zeroed. Defaults to 0.

  • drop_path_rate (float) – Stochastic depth rate. Defaults to 0.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='LN', eps=1e-6).

  • final_norm (bool) – Whether to add an additional layer to normalize the final feature map. Defaults to True.

  • output_cls_token (bool) – Whether to output the cls_token. If set to True, with_cls_token must be True. Defaults to True.

  • interpolate_mode (str) – Interpolation mode used to resize the position embedding vector. Defaults to “bicubic”.

  • patch_cfg (dict) – Configs of patch embedding. Defaults to an empty dict.

  • layer_cfgs (Sequence | dict) – Configs of each transformer layer in encoder. Defaults to an empty dict.

  • mask_ratio (float) – The ratio of total number of patches to be masked. Defaults to 0.75.

  • init_cfg (dict, optional) – Initialization config dict. Defaults to None.

forward(x)[source]

Forward computation.

Parameters

x (tensor | tuple[tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.

init_weights()[source]

Initialize the weights.

random_masking(x, mask_ratio=0.75)[source]

Generate the mask for MAE Pre-training.

Parameters
  • x (torch.tensor) – Image with data augmentation applied.

  • mask_ratio (float) – The mask ratio of total patches. Defaults to 0.75.

Returns

Masked image, mask, and the ids to restore the original image.

  • x_masked (Tensor): masked image.

  • mask (Tensor): mask used to mask image.

  • ids_restore (Tensor): ids to restore original image.

Return type

tuple[Tensor, Tensor, Tensor]
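
The masking step can be sketched on patch indices alone. The version below assumes the common approach of drawing one random score per patch, keeping the patches with the lowest scores, and recording the permutation needed to restore the original order. It works on a single sample and returns the kept patch ids in place of a masked tensor; the signature is illustrative and differs from the method documented above.

```python
import random


def random_masking(num_patches, mask_ratio=0.75, rng=random):
    """Return (ids_keep, mask, ids_restore) for one sample.

    mask uses 0 for kept patches and 1 for masked patches (assumed convention).
    """
    len_keep = int(num_patches * (1 - mask_ratio))
    noise = [rng.random() for _ in range(num_patches)]
    # argsort of the noise: patches with the smallest scores come first
    ids_shuffle = sorted(range(num_patches), key=noise.__getitem__)
    # argsort of ids_shuffle: position of each patch in the shuffled order
    ids_restore = sorted(range(num_patches), key=ids_shuffle.__getitem__)
    ids_keep = ids_shuffle[:len_keep]
    # Build the mask in shuffled order, then map it back to patch order.
    mask_shuffled = [0] * len_keep + [1] * (num_patches - len_keep)
    mask = [mask_shuffled[pos] for pos in ids_restore]
    return ids_keep, mask, ids_restore
```

For a 224 × 224 image with 16 × 16 patches there are 196 patches, so a mask ratio of 0.75 keeps 49 of them and masks 147.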

class mmselfsup.models.backbones.MIMVisionTransformer(arch='b', img_size=224, patch_size=16, out_indices=- 1, use_window=False, drop_rate=0, drop_path_rate=0, qkv_bias=True, norm_cfg={'eps': 1e-06, 'type': 'LN'}, final_norm=True, output_cls_token=True, interpolate_mode='bicubic', init_values=0.0, patch_cfg={}, layer_cfgs={}, finetune=True, init_cfg=None)[source]

Vision Transformer for classification with MIM-style (Masked Image Modeling) models (fine-tuning or linear probing).

A PyTorch implementation of: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Parameters
  • arch (str | dict) – Vision Transformer architecture. Default: ‘b’

  • img_size (int | tuple) – Input image size

  • patch_size (int | tuple) – The patch size

  • out_indices (Sequence | int) – Output from which stages. Defaults to -1, meaning the last stage.

  • drop_rate (float) – Probability of an element to be zeroed. Defaults to 0.

  • drop_path_rate (float) – Stochastic depth rate. Defaults to 0.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='LN', eps=1e-6).

  • final_norm (bool) – Whether to add an additional layer to normalize the final feature map. Defaults to True.

  • output_cls_token (bool) – Whether to output the cls_token. If set to True, with_cls_token must be True. Defaults to True.

  • interpolate_mode (str) – Interpolation mode used to resize the position embedding vector. Defaults to “bicubic”.

  • patch_cfg (dict) – Configs of patch embedding. Defaults to an empty dict.

  • layer_cfgs (Sequence | dict) – Configs of each transformer layer in encoder. Defaults to an empty dict.

  • finetune (bool) – Whether or not do fine-tuning. Defaults to True.

  • init_cfg (dict, optional) – Initialization config dict. Defaults to None.

forward(x)[source]

Forward computation.

Parameters

x (tensor | tuple[tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.

train(mode=True)[source]

Set module status before forward computation.

Parameters

mode (bool) – Whether it is train_mode or test_mode

class mmselfsup.models.backbones.MaskFeatViT(arch: Union[str, dict] = 'b', img_size: Union[Tuple[int, int], int] = 224, patch_size: int = 16, out_indices: int = - 1, drop_rate: float = 0.0, drop_path_rate: float = 0.0, norm_cfg: dict = {'eps': 1e-06, 'type': 'LN'}, final_norm: bool = True, output_cls_token: bool = True, interpolate_mode: str = 'bicubic', patch_cfg: dict = {}, layer_cfgs: dict = {}, init_cfg: Optional[dict] = None)[source]

Vision Transformer for MaskFeat pre-training.

A PyTorch implementation of: Masked Feature Prediction for Self-Supervised Visual Pre-Training.

Parameters
  • arch (str | dict) – Vision Transformer architecture. Default: ‘b’

  • img_size (int | tuple) – Input image size

  • patch_size (int | tuple) – The patch size

  • out_indices (Sequence | int) – Output from which stages. Defaults to -1, means the last stage.

  • drop_rate (float) – Probability of an element to be zeroed. Defaults to 0.

  • drop_path_rate (float) – stochastic depth rate. Defaults to 0.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='LN', eps=1e-6).

  • final_norm (bool) – Whether to add an additional layer to normalize the final feature map. Defaults to True.

  • output_cls_token (bool) – Whether to output the cls_token. If set to True, with_cls_token must be True. Defaults to True.

  • interpolate_mode (str) – Select the interpolation mode for resizing the position embedding vector. Defaults to “bicubic”.

  • patch_cfg (dict) – Configs of patch embedding. Defaults to an empty dict.

  • layer_cfgs (Sequence | dict) – Configs of each transformer layer in encoder. Defaults to an empty dict.

  • init_cfg (dict, optional) – Initialization config dict. Defaults to None.

forward(x: torch.Tensor, mask: torch.Tensor) → torch.Tensor[source]

Generate features for masked images.

Parameters
  • x (torch.Tensor) – Input images.

  • mask (torch.Tensor) – Input masks.

Returns

Features with cls_tokens.

Return type

torch.Tensor

init_weights() → None[source]

Initialize the weights.

class mmselfsup.models.backbones.ResNeXt(depth, groups=32, width_per_group=4, **kwargs)[source]

ResNeXt backbone.

Please refer to the paper for details.

As the behavior of the forward function in MMSelfSup is different from MMCls, we register our own ResNeXt, inheriting from mmselfsup.models.backbones.ResNet.

Parameters
  • depth (int) – Network depth, from {50, 101, 152}.

  • groups (int) – Groups of conv2 in Bottleneck. Defaults to 32.

  • width_per_group (int) – Width per group of conv2 in Bottleneck. Defaults to 4.

  • in_channels (int) – Number of input image channels. Defaults to 3.

  • stem_channels (int) – Output channels of the stem layer. Defaults to 64.

  • num_stages (int) – Stages of the network. Defaults to 4.

  • strides (Sequence[int]) – Strides of the first block of each stage. Defaults to (1, 2, 2, 2).

  • dilations (Sequence[int]) – Dilation of each stage. Defaults to (1, 1, 1, 1).

  • out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned; if multiple stages are specified, a tuple of tensors is returned. Defaults to (3, ).

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Defaults to False.

  • avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Defaults to False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.

  • conv_cfg (dict | None) – The config dict for conv layers. Defaults to None.

  • norm_cfg (dict) – The config dict for norm layers.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Defaults to False.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Defaults to False.

Example

>>> from mmselfsup.models import ResNeXt
>>> import torch
>>> self = ResNeXt(depth=50)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 256, 8, 8)
(1, 512, 4, 4)
(1, 1024, 2, 2)
(1, 2048, 1, 1)

make_res_layer(**kwargs)[source]

Redefine the function for ResNeXt related args.

class mmselfsup.models.backbones.ResNet(depth, in_channels=3, stem_channels=64, base_channels=64, expansion=None, num_stages=4, strides=(1, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(4), style='pytorch', deep_stem=False, avg_down=False, frozen_stages=- 1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=False, with_cp=False, zero_init_residual=False, init_cfg=[{'type': 'Kaiming', 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}], drop_path_rate=0.0, **kwargs)[source]

ResNet backbone.

Please refer to the paper for details.

Parameters
  • depth (int) – Network depth, from {18, 34, 50, 101, 152}.

  • in_channels (int) – Number of input image channels. Defaults to 3.

  • stem_channels (int) – Output channels of the stem layer. Defaults to 64.

  • base_channels (int) – Middle channels of the first stage. Defaults to 64.

  • num_stages (int) – Stages of the network. Defaults to 4.

  • strides (Sequence[int]) – Strides of the first block of each stage. Defaults to (1, 2, 2, 2).

  • dilations (Sequence[int]) – Dilation of each stage. Defaults to (1, 1, 1, 1).

  • out_indices (Sequence[int]) – Output from which stages. Defaults to (4, ).

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Defaults to False.

  • avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Defaults to False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.

  • conv_cfg (dict | None) – The config dict for conv layers. Defaults to None.

  • norm_cfg (dict) – The config dict for norm layers.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Defaults to False.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Defaults to False.

  • drop_path_rate (float) – Probability of the path to be zeroed. Defaults to 0.0.

Example

>>> from mmselfsup.models import ResNet
>>> import torch
>>> self = ResNet(depth=18)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 64, 8, 8)
(1, 128, 4, 4)
(1, 256, 2, 2)
(1, 512, 1, 1)

forward(x)[source]

Forward function.

As the behavior of forward function in MMSelfSup is different from MMCls, we rewrite the forward function. MMCls does not output the feature map from the ‘stem’ layer, which we will use for downstream evaluation.
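As an illustration of how the constructor arguments above are usually supplied, the following is a hypothetical config fragment in the dict style used by OpenMMLab configs. The variable names `backbone` and `SUPPORTED_DEPTHS` are assumptions made for this sketch; the keys mirror the documented parameters, and `out_indices=(4,)` follows the convention noted elsewhere in this reference that index 0 is the stem and 1 to 4 are the residual stages.

```python
# Hypothetical config fragment; the keys mirror the documented constructor arguments.
backbone = dict(
    type='ResNet',
    depth=50,
    in_channels=3,
    out_indices=(4,),          # assumed: 0 is the stem, 4 is the last residual stage
    norm_cfg=dict(type='BN'),
    frozen_stages=-1,          # -1 means no stage is frozen
    zero_init_residual=False,
)

# Depths listed in the parameter description above.
SUPPORTED_DEPTHS = {18, 34, 50, 101, 152}
assert backbone['depth'] in SUPPORTED_DEPTHS
```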

class mmselfsup.models.backbones.ResNetV1d(**kwargs)[source]

ResNetV1d variant described in Bag of Tricks.

Compared with default ResNet(ResNetV1b), ResNetV1d replaces the 7x7 conv in the input stem with three 3x3 convs. And in the downsampling block, a 2x2 avg_pool with stride 2 is added before conv, whose stride is changed to 1.

class mmselfsup.models.backbones.SimMIMSwinTransformer(arch: Union[str, dict] = 'T', img_size: Union[Tuple[int, int], int] = 224, in_channels: int = 3, drop_rate: float = 0.0, drop_path_rate: float = 0.1, out_indices: tuple = (3), use_abs_pos_embed: bool = False, with_cp: bool = False, frozen_stages: bool = - 1, norm_eval: bool = False, norm_cfg: dict = {'type': 'LN'}, stage_cfgs: Union[Sequence, dict] = {}, patch_cfg: dict = {}, init_cfg: Optional[dict] = None)[source]

Swin Transformer for SimMIM.

Parameters
  • arch (str | dict) – Swin Transformer architecture. Defaults to ‘T’.

  • img_size (int | tuple) – The size of input image. Defaults to 224.

  • in_channels (int) – The num of input channels. Defaults to 3.

  • drop_rate (float) – Dropout rate after embedding. Defaults to 0.

  • drop_path_rate (float) – Stochastic depth rate. Defaults to 0.1.

  • out_indices (tuple) – Layers to be outputted. Defaults to (3, ).

  • use_abs_pos_embed (bool) – If True, add absolute position embedding to the patch embedding. Defaults to False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Defaults to False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.

  • norm_cfg (dict) – Config dict for normalization layer at end of backbone. Defaults to dict(type='LN').

  • stage_cfgs (Sequence | dict) – Extra config dict for each stage. Defaults to empty dict.

  • patch_cfg (dict) – Extra config dict for patch embedding. Defaults to empty dict.

  • init_cfg (dict, optional) – The Config for initialization. Defaults to None.

forward(x: torch.Tensor, mask: torch.Tensor) → Sequence[torch.Tensor][source]

Generate features for masked images.

This function generates masked images and gets the hidden features for them.

Parameters
  • x (torch.Tensor) – Input images.

  • mask (torch.Tensor) – Masks used to construct masked images.

Returns

A tuple containing features from multi-stages.

Return type

tuple
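The idea of constructing masked inputs can be sketched in plain Python, assuming that masked positions are replaced by a shared mask token after patch embedding. The helper name `apply_mask_token` and the list-based layout are hypothetical and chosen only so the sketch runs without PyTorch; the library operates on tensors.

```python
def apply_mask_token(tokens, mask, mask_token):
    """Replace every token whose mask entry is truthy with a copy of mask_token.

    tokens: list of L token vectors, mask: list of L flags (1 = masked),
    mask_token: one vector with the same length as a token.
    """
    return [list(mask_token) if flag else list(token)
            for token, flag in zip(tokens, mask)]


tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
masked = apply_mask_token(tokens, mask=[0, 1, 0], mask_token=[0.0, 0.0])
```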

init_weights() → None[source]

Initialize weights.

class mmselfsup.models.backbones.VisionTransformer(stop_grad_conv1=False, frozen_stages=- 1, norm_eval=False, init_cfg=None, **kwargs)[source]

Vision Transformer.

A PyTorch implementation of: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.

Part of the code is modified from: https://github.com/facebookresearch/moco-v3/blob/main/vits.py.

Parameters
  • stop_grad_conv1 (bool, optional) – whether to stop the gradient of convolution layer in PatchEmbed. Defaults to False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

init_weights()[source]

Initialize the weights.

train(mode=True)[source]

Set module status before forward computation.

Parameters

mode (bool) – Whether it is train_mode or test_mode

heads

class mmselfsup.models.heads.CAEHead(tokenizer_path: str, lambd: float, init_cfg: Optional[dict] = None)[source]

Pretrain Head for CAE.

Compute the align loss and the main loss. In addition, this head generates the prediction target using the DALL-E tokenizer.

Parameters
  • tokenizer_path (str) – The path of the tokenizer.

  • lambd (float) – The weight for the align loss.

  • init_cfg (dict, optional) – Initialization config dict. Defaults to None.

forward(img_target: torch.Tensor, outputs: torch.Tensor, latent_pred: torch.Tensor, latent_target: torch.Tensor, mask: torch.Tensor) → dict[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmselfsup.models.heads.ClsHead(with_avg_pool=False, in_channels=2048, num_classes=1000, vit_backbone=False, init_cfg=[{'type': 'Normal', 'std': 0.01, 'layer': 'Linear'}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[source]

Simplest classifier head, with only one fc layer.

Parameters
  • with_avg_pool (bool) – Whether to apply the average pooling after neck. Defaults to False.

  • in_channels (int) – Number of input channels. Defaults to 2048.

  • num_classes (int) – Number of classes. Defaults to 1000.

  • init_cfg (dict or list[dict], optional) – Initialization config dict.

forward(x)[source]

Forward head.

Parameters

x (list[Tensor] | tuple[Tensor]) – Feature maps of backbone, each tensor has shape (N, C, H, W).

Returns

A list of class scores.

Return type

list[Tensor]

loss(cls_score, labels)[source]

Compute the loss.

class mmselfsup.models.heads.ContrastiveHead(temperature=0.1)[source]

Head for contrastive learning.

The contrastive loss is implemented in this head and is used in SimCLR, MoCo, DenseCL, etc.

Parameters

temperature (float) – The temperature hyper-parameter that controls the concentration level of the distribution. Defaults to 0.1.

forward(pos, neg)[source]

Forward function to compute contrastive loss.

Parameters
  • pos (Tensor) – Nx1 positive similarity.

  • neg (Tensor) – Nxk negative similarity.

Returns

A dictionary of loss components.

Return type

dict[str, Tensor]
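A minimal sketch of the computation described above, assuming the usual InfoNCE form: the positive and negative similarities are concatenated, divided by the temperature, and scored with cross-entropy where the positive sits at index 0. The function name `contrastive_loss` and the nested-list inputs are assumptions for illustration; the head itself works on tensors.

```python
import math


def contrastive_loss(pos, neg, temperature=0.1):
    """pos: list of N similarities, neg: list of N lists with k similarities each."""
    total = 0.0
    for p, negs in zip(pos, neg):
        # The positive logit comes first, followed by the k negative logits.
        logits = [p / temperature] + [n / temperature for n in negs]
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
        # Cross-entropy with target class 0.
        total += log_sum_exp - logits[0]
    return total / len(pos)
```

With a single negative that is as similar as the positive, the loss equals log 2, and it shrinks as the positive similarity grows relative to the negatives.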

class mmselfsup.models.heads.LatentClsHead(in_channels: int, num_classes: int, init_cfg: dict = {'layer': 'Linear', 'std': 0.01, 'type': 'Normal'})[source]

Head for latent feature classification.

Parameters
  • in_channels (int) – Number of input channels.

  • num_classes (int) – Number of classes.

  • init_cfg (dict or list[dict], optional) – Initialization config dict.

forward(input: torch.Tensor, target: torch.Tensor) → dict[source]

Forward head.

Parameters
  • input (Tensor) – NxC input features.

  • target (Tensor) – NxC target features.

Returns

A dictionary of loss components.

Return type

dict[str, Tensor]

class mmselfsup.models.heads.LatentCrossCorrelationHead(in_channels: int, lambd: float = 0.0051)[source]

Head for latent feature cross correlation. Part of the code is borrowed from: https://github.com/facebookresearch/barlowtwins/blob/main/main.py.

Parameters
  • in_channels (int) – Number of input channels.

  • lambd (float) – Weight on off-diagonal terms. Defaults to 0.0051.

forward(input: torch.Tensor, target: torch.Tensor) → dict[source]

Forward head.

Parameters
  • input (Tensor) – NxC input features.

  • target (Tensor) – NxC target features.

Returns

A dictionary of loss components.

Return type

dict[str, Tensor]

off_diagonal(x: torch.Tensor) → torch.Tensor[source]

Return a flattened view of the off-diagonal elements of a square matrix.
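The role of the off-diagonal elements can be sketched as follows, assuming the Barlow Twins style objective in which diagonal entries of the cross-correlation matrix are pushed towards 1 and off-diagonal entries towards 0, weighted by lambd. Both functions operate on nested lists and are illustrative stand-ins, not the library code; `cross_correlation_loss` is a hypothetical name.

```python
def off_diagonal(matrix):
    """Return the off-diagonal elements of a square matrix in row-major order."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError('expected a square matrix')
    return [matrix[i][j] for i in range(n) for j in range(n) if i != j]


def cross_correlation_loss(corr, lambd=0.0051):
    """Sum of (diag - 1)^2 plus lambd times the sum of squared off-diagonal terms."""
    on_diag = sum((corr[i][i] - 1.0) ** 2 for i in range(len(corr)))
    off_diag = sum(value ** 2 for value in off_diagonal(corr))
    return on_diag + lambd * off_diag
```

An identity cross-correlation matrix gives a loss of zero under this formulation.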

class mmselfsup.models.heads.LatentPredictHead(predictor: dict)[source]

Head for latent feature prediction.

This head builds a predictor, which can be any registered neck component. For example, BYOL and SimSiam call this head and build NonLinearNeck. It also implements similarity loss between two forward features.

Parameters

predictor (dict) – Config dict for the predictor.

forward(input: torch.Tensor, target: torch.Tensor) → dict[source]

Forward head.

Parameters
  • input (Tensor) – NxC input features.

  • target (Tensor) – NxC target features.

Returns

A dictionary of loss components.

Return type

dict[str, Tensor]
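One common choice of similarity loss in BYOL and SimSiam style training is the negative cosine similarity between the predictor output and the target. The sketch below assumes that form; the exact scaling used by the library may differ, and `negative_cosine_similarity` is a hypothetical helper written with plain lists.

```python
import math


def negative_cosine_similarity(pred, target, eps=1e-12):
    """pred, target: lists of N feature vectors of equal length C."""
    total = 0.0
    for p, t in zip(pred, target):
        dot = sum(a * b for a, b in zip(p, t))
        norm_p = max(math.sqrt(sum(a * a for a in p)), eps)
        norm_t = max(math.sqrt(sum(b * b for b in t)), eps)
        total += dot / (norm_p * norm_t)
    # Average over the batch and negate so that aligned features minimize the loss.
    return -total / len(pred)
```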

class mmselfsup.models.heads.MAEFinetuneHead(embed_dim, num_classes=1000, label_smooth_val=0.1)[source]

Fine-tuning head for MAE.

Parameters
  • embed_dim (int) – The dim of the feature before the classifier head.

  • num_classes (int) – The total classes. Defaults to 1000.

forward(x)[source]

Get the logits.

init_weights()[source]

Initialize the weights.

loss(outputs, labels)[source]

Compute the loss.

class mmselfsup.models.heads.MAELinprobeHead(embed_dim, num_classes=1000)[source]

Linear probing head for MAE.

Parameters
  • embed_dim (int) – The dim of the feature before the classifier head.

  • num_classes (int) – The total classes. Defaults to 1000.

forward(x)[source]

Get the logits.

init_weights()[source]

Initialize the weights.

loss(outputs, labels)[source]

Compute the loss.

class mmselfsup.models.heads.MAEPretrainHead(norm_pix: bool = False, patch_size: int = 16)[source]

Pre-training head for MAE.

Parameters
  • norm_pix (bool) – Whether or not to normalize the target. Defaults to False.

  • patch_size (int) – Patch size. Defaults to 16.

forward(x: torch.Tensor, pred: torch.Tensor, mask: torch.Tensor) → dict[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

patchify(imgs: torch.Tensor) → torch.Tensor[source]

Parameters

imgs (torch.Tensor) – The shape is (N, 3, H, W).

Returns

x, with shape (N, L, patch_size**2 * 3).

Return type

torch.Tensor

unpatchify(x: torch.Tensor) → torch.Tensor[source]

Parameters

x (torch.Tensor) – The shape is (N, L, patch_size**2 * 3).

Returns

imgs, with shape (N, 3, H, W).

Return type

torch.Tensor
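The shape relationship between patchify and unpatchify can be sketched for a single image with nested lists. This is an illustration under two assumptions: values inside each patch are ordered by row, then column, then channel (as in the reference MAE implementation), and the batch dimension N is omitted. The signatures below are hypothetical and differ from the methods documented above.

```python
def patchify(img, patch_size):
    """img: nested list [C][H][W]. Returns [L][patch_size**2 * C] with L = (H/p) * (W/p)."""
    channels, height, width = len(img), len(img[0]), len(img[0][0])
    if height % patch_size or width % patch_size:
        raise ValueError('image size must be divisible by patch_size')
    patches = []
    for i in range(height // patch_size):
        for j in range(width // patch_size):
            patches.append([
                img[c][i * patch_size + a][j * patch_size + b]
                for a in range(patch_size)
                for b in range(patch_size)
                for c in range(channels)
            ])
    return patches


def unpatchify(patches, patch_size, channels, height, width):
    """Inverse of patchify: rebuild the [C][H][W] image from flattened patches."""
    cols = width // patch_size
    img = [[[0] * width for _ in range(height)] for _ in range(channels)]
    for index, patch in enumerate(patches):
        i, j = divmod(index, cols)
        values = iter(patch)
        for a in range(patch_size):
            for b in range(patch_size):
                for c in range(channels):
                    img[c][i * patch_size + a][j * patch_size + b] = next(values)
    return img


# A 3-channel 4x4 image whose values encode (channel, row, column).
image = [[[c * 100 + y * 4 + x for x in range(4)] for y in range(4)] for c in range(3)]
patches = patchify(image, patch_size=2)
restored = unpatchify(patches, patch_size=2, channels=3, height=4, width=4)
```

Here a 4x4 image with patch size 2 yields L = 4 patches of length 2**2 * 3 = 12, and unpatchify restores the original image.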

class mmselfsup.models.heads.MaskFeatFinetuneHead(embed_dim: int, num_classes: int = 1000, label_smooth_val: float = 0.1)[source]

Fine-tuning head for MaskFeat.

Parameters
  • embed_dim (int) – The dim of the feature before the classifier head.

  • num_classes (int) – The total classes. Defaults to 1000.

  • label_smooth_val (float) – The degree of label smoothing. Defaults to 0.1.

forward(x: torch.Tensor) → list[source]

Get the logits.

init_weights() → None[source]

Initialize the weights.

loss(outputs: torch.Tensor, labels: torch.Tensor) → dict[source]

Compute the loss.
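To illustrate what label_smooth_val controls, here is a sketch of cross-entropy with label smoothing, assuming the standard formulation in which the one-hot target is mixed with a uniform distribution over the classes. The function name `label_smooth_cross_entropy` is hypothetical, and the library's loss may differ in details such as the smoothing mode.

```python
import math


def label_smooth_cross_entropy(logits, label, smooth_val=0.1):
    """Cross-entropy for one sample against a smoothed one-hot target."""
    num_classes = len(logits)
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    log_probs = [z - log_sum_exp for z in logits]
    # Smoothed target: (1 - smooth_val) on the true class plus a uniform share.
    target = [
        smooth_val / num_classes + (1.0 - smooth_val) * (1.0 if i == label else 0.0)
        for i in range(num_classes)
    ]
    return -sum(t * lp for t, lp in zip(target, log_probs))
```

Smoothing penalizes over-confident predictions: for a sample that is already classified correctly with a large margin, the smoothed loss is higher than the unsmoothed one.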

class mmselfsup.models.heads.MaskFeatPretrainHead(embed_dim: int = 768, hog_dim: int = 108)[source]

Pre-training head for MaskFeat.

Parameters
  • embed_dim (int) – The dim of the feature before the classifier head. Defaults to 768.

  • hog_dim (int) – The dim of the hog feature. Defaults to 108.

forward(latent: torch.Tensor, hog: torch.Tensor, mask: torch.Tensor) → dict[source]

Pre-training head for MaskFeat.

Parameters
  • latent (torch.Tensor) – Input latent of shape (N, 1+L, C).

  • hog (torch.Tensor) – Input hog feature of shape (N, L, C).

  • mask (torch.Tensor) – Input mask of shape (N, H, W).

Returns

A dictionary of loss components.

Return type

Dict[str, torch.Tensor]

init_weights() → None[source]

Initialize the weights.

loss(pred: torch.Tensor, target: torch.Tensor, mask: torch.Tensor) → dict[source]

Compute the loss.

Parameters
  • pred (torch.Tensor) – Input prediction of shape (N, L, C).

  • target (torch.Tensor) – Input target of shape (N, L, C).

  • mask (torch.Tensor) – Input mask of shape (N, L, 1).

Returns

A dictionary of loss components.

Return type

dict[str, torch.Tensor]
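A sketch of a masked regression loss with the shapes listed above, for a single sample and assuming a mean squared error averaged over masked tokens only. Whether the library reduces in exactly this way is an assumption; `masked_mse` is a hypothetical helper operating on nested lists.

```python
def masked_mse(pred, target, mask):
    """pred, target: [L][C] nested lists; mask: [L] flags where 1 marks a masked token."""
    total, count = 0.0, 0
    for p, t, flag in zip(pred, target, mask):
        if flag:
            # Mean squared error over the C feature dimensions of this token.
            total += sum((a - b) ** 2 for a, b in zip(p, t)) / len(p)
            count += 1
    # Guard against an all-zero mask.
    return total / count if count else 0.0
```

Only masked tokens contribute, so errors on visible tokens do not affect the result.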

class mmselfsup.models.heads.MoCoV3Head(predictor, temperature=1.0)[source]

Head for MoCo v3 algorithms.

This head builds a predictor, which can be any registered neck component. It also implements latent contrastive loss between two forward features. Part of the code is modified from: https://github.com/facebookresearch/moco-v3/blob/main/moco/builder.py.

Parameters
  • predictor (dict) – Config dict for module of predictor.

  • temperature (float) – The temperature hyper-parameter that controls the concentration level of the distribution. Defaults to 1.0.

forward(base_out, momentum_out)[source]

Forward head.

Parameters
  • base_out (Tensor) – NxC features from base_encoder.

  • momentum_out (Tensor) – NxC features from momentum_encoder.

Returns

A dictionary of loss components.

Return type

dict[str, Tensor]
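The latent contrastive loss can be sketched for a single process as follows, assuming the formulation of the MoCo v3 reference code: features are L2-normalized, each query is scored against every key in the batch, the matching index is the positive, and the cross-entropy is scaled by 2 * temperature. The predictor step and the cross-GPU gathering are omitted, and the names `l2_normalize` and `mocov3_style_loss` are hypothetical.

```python
import math


def l2_normalize(vector, eps=1e-12):
    norm = max(math.sqrt(sum(v * v for v in vector)), eps)
    return [v / norm for v in vector]


def mocov3_style_loss(queries, keys, temperature=1.0):
    """queries, keys: lists of N feature vectors; key i is the positive for query i."""
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    total = 0.0
    for i, q_i in enumerate(q):
        logits = [sum(a * b for a, b in zip(q_i, k_j)) / temperature for k_j in k]
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_sum_exp - logits[i]
    return 2.0 * temperature * total / len(q)
```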

class mmselfsup.models.heads.MultiClsHead(pool_type='adaptive', in_indices=(0), with_last_layer_unpool=False, backbone='resnet50', norm_cfg={'type': 'BN'}, num_classes=1000, init_cfg=[{'type': 'Normal', 'std': 0.01, 'layer': 'Linear'}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[source]

Multiple classifier heads.

This head inputs feature maps from different stages of backbone, average pools each feature map to around 9000 dimensions, and then appends a linear classifier at each stage to predict corresponding class scores.

Parameters
  • pool_type (str) – ‘adaptive’ or ‘specified’. If set to ‘adaptive’, use adaptive average pooling, otherwise use specified pooling params.

  • in_indices (Sequence[int]) – Input from which stages.

  • with_last_layer_unpool (bool) – Whether to unpool the features from last layer. Defaults to False.

  • backbone (str) – Specify which backbone to use. Defaults to ‘resnet50’.

  • norm_cfg (dict) – dictionary to construct and config norm layer.

  • num_classes (int) – Number of classes. Defaults to 1000.

  • init_cfg (dict or list[dict], optional) – Initialization config dict.

forward(x)[source]

Forward head.

Parameters

x (list[Tensor] | tuple[Tensor]) – Feature maps of backbone, each tensor has shape (N, C, H, W).

Returns

A list of class scores.

Return type

list[Tensor]

loss(cls_score, labels)[source]

Compute the loss.

class mmselfsup.models.heads.SimMIMHead(patch_size: int, encoder_in_channels: int)[source]

Pretrain Head for SimMIM.

Parameters
  • patch_size (int) – Patch size of each token.

  • encoder_in_channels (int) – Number of input channels for encoder.

forward(x: torch.Tensor, x_rec: torch.Tensor, mask: torch.Tensor) → dict[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmselfsup.models.heads.SwAVHead(feat_dim, sinkhorn_iterations=3, epsilon=0.05, temperature=0.1, crops_for_assign=[0, 1], num_crops=[2], num_prototypes=3000, init_cfg=None)[source]

The head for SwAV.

This head contains clustering and sinkhorn algorithms to compute Q codes. Part of the code is borrowed from: https://github.com/facebookresearch/swav. The queue is built in core/hooks/swav_hook.py.

Parameters
  • feat_dim (int) – feature dimension of the prototypes.

  • sinkhorn_iterations (int) – number of iterations in Sinkhorn-Knopp algorithm. Defaults to 3.

  • epsilon (float) – regularization parameter for Sinkhorn-Knopp algorithm. Defaults to 0.05.

  • temperature (float) – temperature parameter in training loss. Defaults to 0.1.

  • crops_for_assign (list[int]) – list of crops id used for computing assignments. Defaults to [0, 1].

  • num_crops (list[int]) – list of number of crops. Defaults to [2].

  • num_prototypes (int) – number of prototypes. Defaults to 3000.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

forward(x)[source]

Forward head of swav to compute the loss.

Parameters

x (Tensor) – NxC input features.

Returns

A dictionary of loss components.

Return type

dict[str, Tensor]
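The Sinkhorn-Knopp step that turns prototype scores into Q codes can be sketched in plain Python. This follows the alternating normalization of the SwAV reference code for a single process; the distributed reductions are left out, and subtracting the maximum score before exponentiation is an addition made here for numerical stability. The function name `sinkhorn` is an assumption.

```python
import math


def sinkhorn(scores, epsilon=0.05, iterations=3):
    """scores: [B][K] prototype scores. Returns [B][K] soft assignments per sample."""
    batch, protos = len(scores), len(scores[0])
    peak = max(max(row) for row in scores)
    q = [[math.exp((s - peak) / epsilon) for s in row] for row in scores]
    total = sum(sum(row) for row in q)
    q = [[v / total for v in row] for row in q]
    for _ in range(iterations):
        # Give every prototype (column) the same total mass 1/K.
        col_sums = [sum(q[i][j] for i in range(batch)) for j in range(protos)]
        q = [[q[i][j] / col_sums[j] / protos for j in range(protos)] for i in range(batch)]
        # Give every sample (row) the same total mass 1/B.
        row_sums = [sum(row) for row in q]
        q = [[v / row_sums[i] / batch for v in row] for i, row in enumerate(q)]
    # Rescale so that each sample's assignment sums to 1.
    return [[v * batch for v in row] for row in q]
```

The balancing over prototypes is what discourages the trivial solution in which all samples are assigned to the same prototype.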

memories

necks

utils

class mmselfsup.models.utils.Accuracy(topk=(1))[source]

Implementation of accuracy computation.

forward(pred, target)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
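What a top-k accuracy computation returns can be sketched as follows, assuming results are reported as percentages, one value per entry in topk. The function `topk_accuracy` and its list-based inputs are hypothetical; the module itself works on tensors.

```python
def topk_accuracy(scores, targets, topk=(1,)):
    """scores: [N][num_classes] class scores, targets: [N] class indices."""
    results = []
    for k in topk:
        correct = 0
        for row, target in zip(scores, targets):
            # Indices of the classes sorted by descending score.
            ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
            if target in ranked[:k]:
                correct += 1
        results.append(100.0 * correct / len(targets))
    return results


accuracy = topk_accuracy(
    scores=[[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]],
    targets=[2, 0],
    topk=(1, 2),
)
```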

class mmselfsup.models.utils.CAETransformerRegressorLayer(embed_dims: int, num_heads: int, feedforward_channels: int, num_fcs: int = 2, qkv_bias: bool = False, qk_scale: Optional[float] = None, drop_rate: float = 0.0, attn_drop_rate: float = 0.0, drop_path_rate: float = 0.0, init_values: float = 0.0, act_cfg: dict = {'type': 'GELU'}, norm_cfg: dict = {'eps': 1e-06, 'type': 'LN'})[source]

Transformer layer for the regressor of CAE.

This module is different from conventional transformer encoder layer, for its queries are the masked tokens, but its keys and values are the concatenation of the masked and unmasked tokens.

Parameters
  • embed_dims (int) – The feature dimension.

  • num_heads (int) – The number of heads in multi-head attention.

  • feedforward_channels (int) – The hidden dimension of FFNs.

  • num_fcs (int, optional) – The number of fully-connected layers in FFNs. Defaults to 2.

  • qkv_bias (bool) – If True, add a learnable bias to q, k, v. Defaults to False.

  • qk_scale (float, optional) – Override default qk scale of head_dim ** -0.5 if set. Defaults to None.

  • drop_rate (float) – The dropout rate. Defaults to 0.0.

  • attn_drop_rate (float) – The drop out rate for attention output weights. Defaults to 0.

  • drop_path_rate (float) – Stochastic depth rate. Defaults to 0.

  • init_values (float) – The init values of gamma. Defaults to 0.0.

  • act_cfg (dict) – The activation config for FFNs. Defaults to dict(type='GELU').

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='LN').

forward(x_q: torch.Tensor, x_kv: torch.Tensor, pos_q: torch.Tensor, pos_k: torch.Tensor) → torch.Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmselfsup.models.utils.Encoder(n_hid: int = 256, n_blk_per_group: int = 2, input_channels: int = 3, vocab_size: int = 8192, device: torch.device = device(type='cpu'), requires_grad: bool = False, use_mixed_precision: bool = True)[source]

forward(x: torch.Tensor) → torch.Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmselfsup.models.utils.ExtractProcess[source]

Global average-pooled feature extraction process.

This process extracts the global average-pooled features from the last layer of resnet backbone.

extract(model, data_loader, distributed=False)[source]

The extract function to apply forward function and choose distributed or not.

class mmselfsup.models.utils.GatherLayer(*args, **kwargs)[source]

Gather tensors from all processes, supporting backward propagation.

static backward(ctx, *grads)[source]

Defines a formula for differentiating the operation with backward mode automatic differentiation (alias to the vjp function).

This function is to be overridden by all subclasses.

It must accept a context ctx as the first argument, followed by as many outputs as the forward() returned (None will be passed in for non tensor outputs of the forward function), and it should return as many tensors, as there were inputs to forward(). Each argument is the gradient w.r.t the given output, and each returned value should be the gradient w.r.t. the corresponding input. If an input is not a Tensor or is a Tensor not requiring grads, you can just pass None as a gradient for that input.

The context can be used to retrieve tensors saved during the forward pass. It also has an attribute ctx.needs_input_grad as a tuple of booleans representing whether each input needs gradient. E.g., backward() will have ctx.needs_input_grad[0] = True if the first input to forward() needs gradient computed w.r.t. the output.

static forward(ctx, input)[source]

Performs the operation.

This function is to be overridden by all subclasses.

It must accept a context ctx as the first argument, followed by any number of arguments (tensors or other types).

The context can be used to store arbitrary data that can then be retrieved during the backward pass. Tensors should not be stored directly on ctx (though this is not currently enforced for backward compatibility). Instead, tensors should be saved either with ctx.save_for_backward() if they are intended to be used in backward (equivalently, vjp) or ctx.save_for_forward() if they are intended to be used in jvp.

class mmselfsup.models.utils.MultiExtractProcess(pool_type='specified', backbone='resnet50', layer_indices=(0, 1, 2, 3, 4))[source]

Multi-stage intermediate feature extraction process for extract.py and tsne_visualization.py in tools.

This process extracts feature maps from different stages of backbone, and average pools each feature map to around 9000 dimensions.

Parameters
  • pool_type (str) – Pooling type in MultiPooling. Options are “adaptive” and “specified”. Defaults to “specified”.

  • backbone (str) – Backbone type, now only support “resnet50”. Defaults to “resnet50”.

  • layer_indices (Sequence[int]) – Output from which stages. 0 for stem, 1, 2, 3, 4 for res layers. Defaults to (0, 1, 2, 3, 4).

extract(model, data_loader, distributed=False)[source]

The extract function to apply forward function and choose distributed or not.

class mmselfsup.models.utils.MultiPooling(pool_type='adaptive', in_indices=(0), backbone='resnet50')[source]

Pooling layers for features from multiple depths.

Parameters
  • pool_type (str) – Pooling type for the feature map. Options are ‘adaptive’ and ‘specified’. Defaults to ‘adaptive’.

  • in_indices (Sequence[int]) – Output from which backbone stages. Defaults to (0, ).

  • backbone (str) – The selected backbone. Defaults to ‘resnet50’.

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmselfsup.models.utils.MultiPrototypes(output_dim, num_prototypes)[source]

Multi-prototypes for SwAV head.

Parameters
  • output_dim (int) – The output dim from SwAV neck.

  • num_prototypes (list[int]) – The number of prototypes needed.

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmselfsup.models.utils.MultiheadAttention(embed_dims: int, num_heads: int, input_dims: Optional[int] = None, attn_drop: float = 0.0, proj_drop: float = 0.0, qkv_bias: bool = True, qk_scale: Optional[float] = None, proj_bias: bool = True, init_cfg: Optional[dict] = None)[source]

Multi-head Attention Module.

This module rewrites the MultiheadAttention by replacing qkv bias with customized qkv bias, in addition to removing the drop path layer.

Parameters
  • embed_dims (int) – The embedding dimension.

  • num_heads (int) – Parallel attention heads.

  • input_dims (int, optional) – The input dimension, and if None, use embed_dims. Defaults to None.

  • attn_drop (float) – Dropout rate of the dropout layer after the attention calculation of query and key. Defaults to 0.

  • proj_drop (float) – Dropout rate of the dropout layer after the output projection. Defaults to 0.

  • dropout_layer (dict) – The dropout config before adding the shortcut. Defaults to dict(type='Dropout', drop_prob=0.).

  • qkv_bias (bool) – If True, add a learnable bias to q, k, v. Defaults to True.

  • qk_scale (float, optional) – Override default qk scale of head_dim ** -0.5 if set. Defaults to None.

  • proj_bias (bool) – Defaults to True.

  • init_cfg (dict, optional) – The Config for initialization. Defaults to None.

forward(x: torch.Tensor) → torch.Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmselfsup.models.utils.Sobel[source]

Sobel layer.

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
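For readers unfamiliar with the Sobel operator, the sketch below applies the standard 3x3 Sobel kernels to a small grayscale image in plain Python. The helper name filter3x3, the zero padding, and the sign convention of the kernels are assumptions made for this illustration. The Sobel layer documented above may preprocess its input or order its outputs differently.

```python
from typing import List

Image = List[List[float]]

# Standard Sobel kernels for horizontal (x) and vertical (y) gradients.
SOBEL_X = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
SOBEL_Y = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]


def filter3x3(img: Image, kernel: List[List[int]]) -> Image:
    """Cross-correlate img with a 3x3 kernel, treating out-of-range pixels as 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = acc
    return out


# A 4x4 image with a vertical edge between columns 1 and 2.
image = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
grad_x = filter3x3(image, SOBEL_X)
grad_y = filter3x3(image, SOBEL_Y)
```

In the interior of the image, grad_x responds to the vertical edge while grad_y stays at zero because every row is identical.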

class mmselfsup.models.utils.TransformerEncoderLayer(embed_dims: int, num_heads: int, feedforward_channels: int, window_size: Optional[int] = None, drop_rate: float = 0.0, attn_drop_rate: float = 0.0, drop_path_rate: float = 0.0, num_fcs: int = 2, qkv_bias: bool = True, act_cfg: dict = {'type': 'GELU'}, norm_cfg: dict = {'type': 'LN'}, init_values: float = 0.0, init_cfg: Optional[dict] = None)[source]

Implements one encoder layer in Vision Transformer.

This module is the rewritten version of the TransformerEncoderLayer in MMClassification by adding the gamma and relative position bias in Attention module.

Parameters
  • embed_dims (int) – The feature dimension.

  • num_heads (int) – Parallel attention heads.

  • feedforward_channels (int) – The hidden dimension for FFNs.

  • drop_rate (float) – Probability of an element to be zeroed after the feed forward layer. Defaults to 0.

  • attn_drop_rate (float) – The drop out rate for attention output weights. Defaults to 0.

  • drop_path_rate (float) – Stochastic depth rate. Defaults to 0.

  • num_fcs (int) – The number of fully-connected layers for FFNs. Defaults to 2.

  • qkv_bias (bool) – Enable bias for qkv if True. Defaults to True.

  • act_cfg (dict) – The activation config for FFNs. Defaults to dict(type='GELU').

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='LN').

  • init_values (float) – The init values of gamma. Defaults to 0.0.

  • init_cfg (dict, optional) – Initialization config dict. Defaults to None.

forward(x: torch.Tensor) → torch.Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

mmselfsup.models.utils.accuracy(pred, target, topk=1)[source]

Compute accuracy of predictions.

Parameters
  • pred (Tensor) – The output of the model.

  • target (Tensor) – The labels of data.

  • topk (int | list[int]) – Top-k metric selection. Defaults to 1.
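The top-k metric described above can be sketched in plain Python as follows. The function name topk_accuracy is hypothetical, nested lists stand in for tensors, and reporting the result as a percentage is an assumption of this sketch rather than something this page states.

```python
from typing import List, Sequence, Union


def topk_accuracy(
    scores: Sequence[Sequence[float]],
    targets: Sequence[int],
    topk: Union[int, Sequence[int]] = 1,
) -> Union[float, List[float]]:
    """Return top-k accuracy in percent for each requested k."""
    ks = [topk] if isinstance(topk, int) else list(topk)
    results = []
    for k in ks:
        correct = 0
        for row, label in zip(scores, targets):
            # Indices of the k highest-scoring classes for this sample.
            ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
            correct += int(label in ranked)
        results.append(100.0 * correct / len(targets))
    return results[0] if isinstance(topk, int) else results


scores = [
    [0.10, 0.70, 0.20],  # best class 1
    [0.50, 0.30, 0.20],  # best class 0
    [0.20, 0.30, 0.50],  # best class 2, target 0 is ranked last
    [0.40, 0.35, 0.25],  # best class 0, runner-up class 1
]
targets = [1, 0, 0, 1]
top1, top2 = topk_accuracy(scores, targets, topk=(1, 2))
```

With this toy data, two of the four samples are correct at k=1, and the fourth sample becomes correct once the runner-up class is also accepted at k=2.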

mmselfsup.models.utils.build_2d_sincos_position_embedding(patches_resolution, embed_dims, temperature=10000.0, cls_token=False)[source]

Build a 2D sine-cosine position embedding that gives the model the position information of the image patches.
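A common way to construct such an embedding is sketched below in plain Python. The function name sincos_2d, the row-major patch order, and the sin/cos concatenation order are assumptions of this sketch, and the cls_token option is not handled. The library's output layout may differ.

```python
import math
from typing import List, Tuple


def sincos_2d(resolution: Tuple[int, int], embed_dims: int,
              temperature: float = 10000.0) -> List[List[float]]:
    """Return one embed_dims-long vector per patch, in row-major order."""
    assert embed_dims % 4 == 0, "embed_dims must be divisible by 4"
    h, w = resolution
    pos_dim = embed_dims // 4
    # Frequencies form a geometric progression from 1 down towards 1/temperature.
    omega = [1.0 / temperature ** (i / pos_dim) for i in range(pos_dim)]
    table = []
    for y in range(h):
        for x in range(w):
            # A quarter of the channels each for sin(x), cos(x), sin(y), cos(y).
            table.append(
                [math.sin(x * f) for f in omega]
                + [math.cos(x * f) for f in omega]
                + [math.sin(y * f) for f in omega]
                + [math.cos(y * f) for f in omega]
            )
    return table


# A 2x3 patch grid with 8-dimensional embeddings gives 6 vectors of length 8.
pos = sincos_2d((2, 3), embed_dims=8)
```

Because the embedding is a fixed function of the patch coordinates, it needs no learnable parameters.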

mmselfsup.models.utils.knn_classifier(train_features, train_labels, test_features, test_labels, k, T, num_classes=1000)[source]

Compute accuracy of knn classifier predictions.

Parameters
  • train_features (Tensor) – Extracted features in the training set.

  • train_labels (Tensor) – Labels in the training set.

  • test_features (Tensor) – Extracted features in the testing set.

  • test_labels (Tensor) – Labels in the testing set.

  • k (int) – Number of nearest neighbors to use.

  • T (float) – Temperature used in the voting coefficient.

  • num_classes (int) – Number of classes. Defaults to 1000.
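To make the roles of k and T concrete, here is a minimal sketch of temperature-weighted k-NN voting for a single query, in plain Python. The function name knn_predict is hypothetical. Using the dot product of L2-normalized features as the similarity and exp(similarity / T) as the vote weight are assumptions of this sketch, chosen because they are a common formulation. The page above does not spell out the exact weighting.

```python
import math
from typing import List, Sequence


def knn_predict(train_features: Sequence[Sequence[float]],
                train_labels: Sequence[int],
                query: Sequence[float],
                k: int, T: float, num_classes: int) -> int:
    """Predict the class of one query by weighted voting among k neighbors."""
    # Similarity of the query to every training feature (dot product).
    sims = [sum(a * b for a, b in zip(query, feat)) for feat in train_features]
    # Indices of the k most similar training samples.
    nearest = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
    votes: List[float] = [0.0] * num_classes
    for i in nearest:
        # Closer neighbors get exponentially larger votes; T controls how sharply.
        votes[train_labels[i]] += math.exp(sims[i] / T)
    return max(range(num_classes), key=lambda c: votes[c])


train_features = [[1.0, 0.0], [0.8, 0.6], [0.6, 0.8], [0.0, 1.0]]
train_labels = [0, 1, 1, 1]
query = [1.0, 0.0]

# Low temperature: the single closest neighbor dominates the vote.
sharp = knn_predict(train_features, train_labels, query, k=3, T=0.07, num_classes=2)
# High temperature: votes are nearly uniform, so the majority label wins.
smooth = knn_predict(train_features, train_labels, query, k=3, T=1.0, num_classes=2)
```

A classifier accuracy then follows by comparing such predictions with test_labels over the whole test set.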

mmselfsup.utils

class mmselfsup.utils.AliasMethod(probs)[source]

The alias method for sampling.

From: https://hips.seas.harvard.edu/blog/2013/03/03/the-alias-method-efficient-sampling-with-many-discrete-outcomes/

Parameters

probs (Tensor) – Sampling probabilities.

draw(N)[source]

Draw N samples from multinomial.

Parameters

N (int) – Number of samples.

Returns

Samples.

Return type

Tensor
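The alias method trades a one-time table construction for constant-time draws. A minimal plain-Python sketch is shown below. It follows the widely used Vose variant, which is an assumption here, and the names build_alias_table and draw_samples are hypothetical stand-ins rather than the methods of the class above.

```python
import random
from typing import List, Sequence, Tuple


def build_alias_table(probs: Sequence[float]) -> Tuple[List[float], List[int]]:
    """Build the acceptance probabilities and alias indices for probs."""
    n = len(probs)
    scaled = [p * n for p in probs]
    prob = [1.0] * n
    alias = list(range(n))
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s, g = small.pop(), large.pop()
        # Slot s keeps its own outcome with probability scaled[s];
        # the remainder of the slot is donated by the larger outcome g.
        prob[s] = scaled[s]
        alias[s] = g
        scaled[g] = scaled[g] + scaled[s] - 1.0
        (small if scaled[g] < 1.0 else large).append(g)
    # Anything left over fills its slot completely (prob stays 1.0).
    return prob, alias


def draw_samples(prob: Sequence[float], alias: Sequence[int],
                 n_samples: int, rng: random.Random) -> List[int]:
    """Draw n_samples outcomes, each in O(1) time."""
    out = []
    for _ in range(n_samples):
        slot = rng.randrange(len(prob))  # pick a slot uniformly
        out.append(slot if rng.random() < prob[slot] else alias[slot])
    return out


probs = [0.5, 0.3, 0.2]
prob, alias = build_alias_table(probs)
samples = draw_samples(prob, alias, 10000, random.Random(0))
```

Each draw needs only one uniform slot choice and one biased coin flip, regardless of the number of outcomes.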

class mmselfsup.utils.Extractor(dataset, samples_per_gpu, workers_per_gpu, dist_mode=False, persistent_workers=True, **kwargs)[source]

Feature extractor.

Parameters
  • dataset (Dataset | dict) – A PyTorch dataset or dict that indicates the dataset.

  • samples_per_gpu (int) – Number of images on each GPU, i.e., batch size of each GPU.

  • workers_per_gpu (int) – How many subprocesses to use for data loading for each GPU.

  • dist_mode (bool) – Use distributed extraction or not. Defaults to False.

  • persistent_workers (bool) – If True, the data loader will not shut down the worker processes after a dataset has been consumed once. This keeps the workers' Dataset instances alive. The argument only takes effect in PyTorch>=1.7.0. Defaults to True.

mmselfsup.utils.batch_shuffle_ddp(x)[source]

Batch shuffle, for making use of BatchNorm.

Only supports DistributedDataParallel (DDP) models.

mmselfsup.utils.batch_unshuffle_ddp(x, idx_unshuffle)[source]

Undo batch shuffle.

Only supports DistributedDataParallel (DDP) models.
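The relationship between the shuffle and its idx_unshuffle argument can be illustrated on a single process. In the sketch below, a plain list stands in for the batch gathered across GPUs, and the function names are hypothetical. The actual functions work on tensors in a DDP setting and are not reproduced here.

```python
import random
from typing import List, Sequence, Tuple


def shuffle_with_inverse(batch: Sequence, rng: random.Random) -> Tuple[List, List[int]]:
    """Shuffle batch and return the indices needed to undo the shuffle."""
    idx_shuffle = list(range(len(batch)))
    rng.shuffle(idx_shuffle)
    # The argsort of a permutation is its inverse permutation.
    idx_unshuffle = sorted(range(len(batch)), key=lambda i: idx_shuffle[i])
    shuffled = [batch[i] for i in idx_shuffle]
    return shuffled, idx_unshuffle


def unshuffle(shuffled: Sequence, idx_unshuffle: Sequence[int]) -> List:
    """Restore the original order using the inverse permutation."""
    return [shuffled[i] for i in idx_unshuffle]


batch = ["a", "b", "c", "d", "e", "f"]
shuffled, idx_unshuffle = shuffle_with_inverse(batch, random.Random(0))
restored = unshuffle(shuffled, idx_unshuffle)
```

Indexing the shuffled batch with idx_unshuffle returns every sample to its original position.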

mmselfsup.utils.collect_env()[source]

Collect the information of the running environments.

mmselfsup.utils.concat_all_gather(tensor)[source]

Performs all_gather operation on the provided tensors.

Warning: torch.distributed.all_gather has no gradient.

mmselfsup.utils.dist_forward_collect(func, data_loader, rank, length, ret_rank=- 1)[source]

Forward and collect network outputs in a distributed manner.

This function performs forward propagation and collects outputs. It can be used to collect results, features, losses, etc.

Parameters
  • func (function) – The function to process data. The output must be a dictionary of CPU tensors.

  • data_loader (Dataloader) – The torch Dataloader used to yield data.

  • rank (int) – The rank (ID) of this process.

  • length (int) – Expected length of output arrays.

  • ret_rank (int) – The process that returns. Other processes will return None.

Returns

The concatenated outputs.

Return type

results_all (dict(np.ndarray))

mmselfsup.utils.distributed_sinkhorn(out, sinkhorn_iterations, world_size, epsilon)[source]

Apply the distributed Sinkhorn optimization on the scores matrix to find the assignments.
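A single-process sketch of the Sinkhorn-Knopp iteration is given below in plain Python. It assumes the scores matrix has one row per sample and one column per prototype, and it leaves out the cross-process reductions implied by world_size. The function name sinkhorn and the max-subtraction for numerical stability are choices of this sketch, not details taken from the library.

```python
import math
from typing import List, Sequence


def sinkhorn(scores: Sequence[Sequence[float]], iterations: int = 3,
             epsilon: float = 0.05) -> List[List[float]]:
    """Turn a samples-by-prototypes score matrix into soft assignments."""
    n, k = len(scores), len(scores[0])
    # Subtracting the global max only rescales the matrix, which the
    # normalizations below cancel, and it keeps exp() from overflowing.
    top = max(max(row) for row in scores)
    q = [[math.exp((s - top) / epsilon) for s in row] for row in scores]
    for _ in range(iterations):
        # Give every prototype (column) the same total mass 1/k.
        col_sums = [sum(q[i][j] for i in range(n)) for j in range(k)]
        q = [[q[i][j] / (col_sums[j] * k) for j in range(k)] for i in range(n)]
        # Give every sample (row) the same total mass 1/n.
        row_sums = [sum(row) for row in q]
        q = [[v / (row_sums[i] * n) for v in row] for i, row in enumerate(q)]
    # Rescale so that each row is a probability distribution over prototypes.
    return [[v * n for v in row] for row in q]


scores = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.2, 0.8]]
assignments = sinkhorn(scores)
```

The alternating normalizations push the assignments towards equal use of all prototypes, while a smaller epsilon keeps them closer to hard assignments.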

mmselfsup.utils.find_latest_checkpoint(path, suffix='pth')[source]

Find the latest checkpoint from the working directory.

Parameters
  • path (str) – The path to find checkpoints.

  • suffix (str) – File extension. Defaults to pth.

Returns

File path of the latest checkpoint.

Return type

latest_path (str | None)

References

1

https://github.com/microsoft/SoftTeacher/blob/main/ssod/utils/patch.py

2

https://github.com/open-mmlab/mmdetection/blob/master/mmdet/utils/misc.py#L7
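One plausible way to pick the latest checkpoint is sketched below with the standard library only. The function name latest_checkpoint is hypothetical, and the sketch assumes that checkpoints are named like epoch_12.pth and that "latest" means the highest trailing number. How the library itself ranks files is not stated on this page.

```python
import glob
import os
import tempfile
from typing import Optional


def latest_checkpoint(path: str, suffix: str = "pth") -> Optional[str]:
    """Return the checkpoint in path with the highest trailing number, if any."""
    if not os.path.isdir(path):
        return None
    best_path, best_num = None, -1
    for ckpt in glob.glob(os.path.join(path, f"*.{suffix}")):
        stem = os.path.splitext(os.path.basename(ckpt))[0]
        tail = stem.split("_")[-1]  # "epoch_12" -> "12"
        # Compare numerically so that epoch_10 ranks above epoch_2.
        if tail.isdigit() and int(tail) > best_num:
            best_path, best_num = ckpt, int(tail)
    return best_path


with tempfile.TemporaryDirectory() as work_dir:
    for name in ("epoch_2.pth", "epoch_10.pth", "notes.txt"):
        open(os.path.join(work_dir, name), "w").close()
    found_name = os.path.basename(latest_checkpoint(work_dir))
    missing = latest_checkpoint(os.path.join(work_dir, "missing"))
```

Returning None for a missing directory or an empty match mirrors the documented return type of str | None.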

mmselfsup.utils.gather_tensors(input_array)[source]

Gather tensor from all GPUs.

mmselfsup.utils.gather_tensors_batch(input_array, part_size=100, ret_rank=- 1)[source]

Batch-wise gathering to avoid CUDA out of memory.

mmselfsup.utils.get_root_logger(log_file=None, log_level=20)[source]

Get root logger.

Parameters
  • log_file (str, optional) – File path of log. Defaults to None.

  • log_level (int, optional) – The level of logger. Defaults to logging.INFO.

Returns

The obtained logger.

Return type

logging.Logger
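The signature shows log_level=20 while the parameter description says logging.INFO; the two are the same value. The short sketch below checks this with the standard logging module. The logger name "sketch" is hypothetical and unrelated to the logger that get_root_logger returns.

```python
import logging

# logging.INFO is the integer 20, which is why the signature shows 20.
level_name = logging.getLevelName(20)

# A stand-in logger configured with the standard library only.
logger = logging.getLogger("sketch")
logger.setLevel(logging.INFO)

# At INFO level, DEBUG records are filtered out and INFO records pass.
debug_enabled = logger.isEnabledFor(logging.DEBUG)
info_enabled = logger.isEnabledFor(logging.INFO)
```

Passing logging.DEBUG (10) or logging.WARNING (30) as log_level would lower or raise the threshold in the same way.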

mmselfsup.utils.nondist_forward_collect(func, data_loader, length)[source]

Forward and collect network outputs.

This function performs forward propagation and collects outputs. It can be used to collect results, features, losses, etc.

Parameters
  • func (function) – The function to process data. The output must be a dictionary of CPU tensors.

  • data_loader (Dataloader) – The torch Dataloader used to yield data.

  • length (int) – Expected length of output arrays.

Returns

The concatenated outputs.

Return type

results_all (dict(np.ndarray))
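The collect pattern described above can be sketched without any framework. In this illustration, plain lists stand in for CPU tensors and NumPy arrays, a list of batches stands in for the DataLoader, and the function name forward_collect is hypothetical.

```python
from typing import Callable, Dict, Iterable, List


def forward_collect(func: Callable[[list], Dict[str, list]],
                    data_loader: Iterable[list],
                    length: int) -> Dict[str, List]:
    """Run func on every batch and concatenate the outputs per key."""
    collected: Dict[str, List] = {}
    for batch in data_loader:
        for key, values in func(batch).items():
            collected.setdefault(key, []).extend(values)
    # Every output must cover the whole dataset exactly once.
    for key, values in collected.items():
        assert len(values) == length, f"{key}: expected {length}, got {len(values)}"
    return collected


def square(batch: list) -> Dict[str, list]:
    """Toy processing function returning a dictionary of outputs."""
    return {"feature": [x * x for x in batch], "index": list(batch)}


loader = [[1, 2, 3], [4, 5, 6], [7]]  # the last batch is smaller
results = forward_collect(square, loader, length=7)
```

The length argument acts as a sanity check that no sample was dropped or duplicated during collection.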

mmselfsup.utils.setup_multi_processes(cfg)[source]

Setup multi-processing environment variables.

mmselfsup.utils.sync_random_seed(seed=None, device='cuda')[source]

Make sure different ranks share the same seed. All workers must call this function, otherwise it will deadlock. This method is generally used in DistributedSampler, because the seed should be identical across all processes in the distributed group.

In distributed sampling, different ranks should sample non-overlapped data in the dataset. Therefore, this function is used to make sure that each rank shuffles the data indices in the same order based on the same seed. Then different ranks could use different indices to select non-overlapped data from the same data list.

Parameters
  • seed (int, Optional) – The seed. Defaults to None.

  • device (str) – The device where the seed will be put on. Defaults to 'cuda'.

Returns

Seed to be used.

Return type

int

References

1

https://github.com/open-mmlab/mmdetection/blob/master/mmdet/core/utils/dist_utils.py
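Why a shared seed yields non-overlapping shards can be shown on a single process. In the sketch below, the seed is passed explicitly instead of being synchronized across ranks, and the function name rank_indices and the strided split are assumptions of this illustration.

```python
import random
from typing import List


def rank_indices(dataset_len: int, seed: int, rank: int,
                 world_size: int) -> List[int]:
    """Return the dataset indices that one rank would sample."""
    indices = list(range(dataset_len))
    # Every rank shuffles with the same seed, so all ranks see the same order.
    random.Random(seed).shuffle(indices)
    # Each rank then takes a different strided slice of that shared order.
    return indices[rank::world_size]


# Simulate 4 ranks sharing seed 42 over a dataset of 10 samples.
shards = [rank_indices(10, seed=42, rank=r, world_size=4) for r in range(4)]
```

If the ranks used different seeds, their shuffled orders would differ and the slices could overlap or miss samples, which is the situation the synchronized seed prevents.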
