Note

You are reading the documentation of MMSelfSup 0.x, which will gradually be deprecated starting at the end of 2022. We recommend upgrading to MMSelfSup 1.0.0rc to enjoy the new features and better performance brought by OpenMMLab 2.0. See the release notes and documentation of MMSelfSup 1.0.0rc for more information.

mmselfsup.apis

mmselfsup.apis.inference_model(model: torch.nn.modules.module.Module, data: PIL.Image.Image) → Tuple[torch.Tensor, Union[torch.Tensor, dict]][source]

Run inference on an image with the model.

Parameters
  • model (nn.Module) – The loaded model.

  • data (PIL.Image) – The loaded image.

Returns

Output of model inference.
  • data (torch.Tensor): The loaded image fed into the model.

  • output (torch.Tensor or dict[str, torch.Tensor]): The output of the test model.

Return type

Tuple[torch.Tensor, Union[torch.Tensor, dict]]

mmselfsup.apis.init_model(config: Union[str, mmcv.utils.config.Config], checkpoint: Optional[str] = None, device: str = 'cuda:0', options: Optional[dict] = None) → torch.nn.modules.module.Module[source]

Initialize a model from a config file.

Parameters
  • config (str or mmcv.Config) – Config file path or the config object.

  • checkpoint (str, optional) – Checkpoint path. If left as None, the model will not load any weights. Defaults to None.

  • device (str) – The device where the model will be put on. Defaults to 'cuda:0'.

  • options (dict, optional) – Options to override some settings in the used config. Defaults to None.

Returns

The initialized model.

Return type

nn.Module
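
A minimal usage sketch combining init_model and inference_model; the config path, checkpoint file and image name below are placeholders, not files shipped with MMSelfSup:

>>> from PIL import Image
>>> from mmselfsup.apis import inference_model, init_model
>>> model = init_model('configs/selfsup/example_config.py',
>>>                    checkpoint='work_dirs/example/latest.pth',
>>>                    device='cuda:0')
>>> img = Image.open('demo.jpg')
>>> data, output = inference_model(model, img)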

mmselfsup.apis.init_random_seed(seed=None, device='cuda')[source]

Initialize random seed.

If the seed is not set, the seed will be automatically randomized and then broadcast to all processes to prevent some potential bugs.

Parameters
  • seed (int, optional) – The seed. Defaults to None.

  • device (str) – The device where the seed will be put on. Defaults to 'cuda'.

Returns

Seed to be used.

Return type

int

mmselfsup.apis.set_random_seed(seed, deterministic=False)[source]

Set random seed.

Parameters
  • seed (int) – Seed to be used.

  • deterministic (bool) – Whether to set the deterministic option for CUDNN backend, i.e., set torch.backends.cudnn.deterministic to True and torch.backends.cudnn.benchmark to False. Defaults to False.
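
A short sketch of how the two seeding helpers are typically combined at the start of training; the deterministic flag is illustrative:

>>> from mmselfsup.apis import init_random_seed, set_random_seed
>>> seed = init_random_seed(None, device='cuda')
>>> set_random_seed(seed, deterministic=True)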

mmselfsup.core

hooks

class mmselfsup.core.hooks.DeepClusterHook(extractor, clustering, unif_sampling, reweight, reweight_pow, init_memory=False, initial=True, interval=1, dist_mode=True, data_loaders=None)[源代码]

Hook for DeepCluster.

This hook includes the global clustering process in DC.

参数
  • extractor (dict) – Config dict for feature extraction.

  • clustering (dict) – Config dict that specifies the clustering algorithm.

  • unif_sampling (bool) – Whether to apply uniform sampling.

  • reweight (bool) – Whether to apply loss re-weighting.

  • reweight_pow (float) – The power of re-weighting.

  • init_memory (bool) – Whether to initialize memory banks used in ODC. Defaults to False.

  • initial (bool) – Whether to call the hook initially. Defaults to True.

  • interval (int) – Frequency of epochs to call the hook. Defaults to 1.

  • dist_mode (bool) – Use distributed training or not. Defaults to True.

  • data_loaders (DataLoader) – A PyTorch dataloader. Defaults to None.

class mmselfsup.core.hooks.DenseCLHook(start_iters=1000, **kwargs)[源代码]

Hook for DenseCL.

This hook includes loss_lambda warmup in DenseCL. Borrowed from the authors’ code: https://github.com/WXinlong/DenseCL.

参数

start_iters (int, optional) – The number of warmup iterations to set loss_lambda=0. Defaults to 1000.

class mmselfsup.core.hooks.DistOptimizerHook(update_interval=1, grad_clip=None, coalesce=True, bucket_size_mb=- 1, frozen_layers_cfg={})[源代码]

Optimizer hook for distributed training.

This hook can accumulate gradients every n intervals and freeze some layers for some iters at the beginning.

参数
  • update_interval (int, optional) – The update interval of the weights, set > 1 to accumulate the grad. Defaults to 1.

  • grad_clip (dict, optional) – Dict to config the value of grad clip. E.g., grad_clip = dict(max_norm=10). Defaults to None.

  • coalesce (bool, optional) – Whether allreduce parameters as a whole. Defaults to True.

  • bucket_size_mb (int, optional) – Size of bucket, the unit is MB. Defaults to -1.

  • frozen_layers_cfg (dict, optional) – Dict to config frozen layers. The key-value pair is layer name and its frozen iters. If frozen, the layer gradient would be set to None. Defaults to dict().
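
A hedged sketch of constructing the hook with gradient accumulation, gradient clipping and a temporarily frozen layer; the layer name 'backbone' and all numbers are illustrative:

>>> from mmselfsup.core.hooks import DistOptimizerHook
>>> optimizer_hook = DistOptimizerHook(
>>>     update_interval=4,
>>>     grad_clip=dict(max_norm=10),
>>>     frozen_layers_cfg=dict(backbone=1000))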

class mmselfsup.core.hooks.GradAccumFp16OptimizerHook(update_interval=1, frozen_layers_cfg={}, **kwargs)[源代码]

Fp16 optimizer hook (using PyTorch’s implementation).

This hook can accumulate gradients every n intervals and freeze some layers for some iters at the beginning. If you are using PyTorch >= 1.6, torch.cuda.amp is used as the backend, to take care of the optimization procedure.

参数
  • update_interval (int, optional) – The update interval of the weights, set > 1 to accumulate the grad. Defaults to 1.

  • frozen_layers_cfg (dict, optional) – Dict to config frozen layers. The key-value pair is layer name and its frozen iters. If frozen, the layer gradient would be set to None. Defaults to dict().

after_train_iter(runner)[源代码]

Backward optimization steps for Mixed Precision Training. For dynamic loss scaling, please refer to https://pytorch.org/docs/stable/amp.html#torch.cuda.amp.GradScaler.

  1. Scale the loss by a scale factor.

  2. Backward the loss to obtain the gradients.

  3. Unscale the optimizer’s gradient tensors.

  4. Call optimizer.step() and update scale factor.

  5. Save loss_scaler state_dict for resume purpose.

class mmselfsup.core.hooks.InterCLRHook(extractor, clustering, centroids_update_interval, deal_with_small_clusters_interval, evaluate_interval, warmup_epochs=0, init_memory=True, initial=True, online_labels=True, interval=1, dist_mode=True, data_loaders=None)[源代码]

Hook for InterCLR.

This hook includes the clustering process in InterCLR.

参数
  • extractor (dict) – Config dict for feature extraction.

  • clustering (dict) – Config dict that specifies the clustering algorithm.

  • centroids_update_interval (int) – Frequency of iterations to update centroids.

  • deal_with_small_clusters_interval (int) – Frequency of iterations to deal with small clusters.

  • evaluate_interval (int) – Frequency of iterations to evaluate clusters.

  • warmup_epochs (int, optional) – The number of warmup epochs to set intra_loss_weight=1 and inter_loss_weight=0. Defaults to 0.

  • init_memory (bool) – Whether to initialize memory banks used in online labels. Defaults to True.

  • initial (bool) – Whether to call the hook initially. Defaults to True.

  • online_labels (bool) – Whether to use online labels. Defaults to True.

  • interval (int) – Frequency of epochs to call the hook. Defaults to 1.

  • dist_mode (bool) – Use distributed training or not. Defaults to True.

  • data_loaders (DataLoader) – A PyTorch dataloader. Defaults to None.

class mmselfsup.core.hooks.MomentumUpdateHook(end_momentum=1.0, update_interval=1, **kwargs)[源代码]

Hook for updating momentum parameter, used by BYOL, MoCoV3, etc.

This hook includes momentum adjustment following:

\[m = 1 - (1 - m_0) \cdot \frac{\cos(\pi k / K) + 1}{2}\]

where \(k\) is the current step, \(K\) is the total steps.

参数
  • end_momentum (float) – The final momentum coefficient for the target network. Defaults to 1.

  • update_interval (int, optional) – The momentum update interval of the weights. Defaults to 1.
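
A small sketch of the schedule implied by the formula above, with base_momentum playing the role of \(m_0\) and end_momentum the final value; the function name is ours, not part of the API:

>>> import math
>>> def momentum_at_step(k, K, base_momentum=0.996, end_momentum=1.0):
>>>     return end_momentum - (end_momentum - base_momentum) * (
>>>         math.cos(math.pi * k / K) + 1) / 2
>>> momentum_at_step(0, 100)    # == base_momentum
>>> momentum_at_step(100, 100)  # == end_momentum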

class mmselfsup.core.hooks.ODCHook(centroids_update_interval, deal_with_small_clusters_interval, evaluate_interval, reweight, reweight_pow, dist_mode=True)[源代码]

Hook for ODC.

This hook includes the online clustering process in ODC.

参数
  • centroids_update_interval (int) – Frequency of iterations to update centroids.

  • deal_with_small_clusters_interval (int) – Frequency of iterations to deal with small clusters.

  • evaluate_interval (int) – Frequency of iterations to evaluate clusters.

  • reweight (bool) – Whether to perform loss re-weighting.

  • reweight_pow (float) – The power of re-weighting.

  • dist_mode (bool) – Use distributed training or not. Defaults to True.

class mmselfsup.core.hooks.SimSiamHook(fix_pred_lr, lr, adjust_by_epoch=True, **kwargs)[源代码]

Hook for SimSiam.

This hook is for SimSiam to fix learning rate of predictor.

参数
  • fix_pred_lr (bool) – whether to fix the lr of predictor or not.

  • lr (float) – the value of fixed lr.

  • adjust_by_epoch (bool, optional) – whether to set lr by epoch or iter. Defaults to True.

before_train_epoch(runner)[源代码]

Fix the learning rate of the predictor.

class mmselfsup.core.hooks.StepFixCosineAnnealingLrUpdaterHook(min_lr: Optional[float] = None, min_lr_ratio: Optional[float] = None, **kwargs)[源代码]
class mmselfsup.core.hooks.SwAVHook(batch_size, epoch_queue_starts=15, crops_for_assign=[0, 1], feat_dim=128, queue_length=0, interval=1, **kwargs)[源代码]

Hook for SwAV.

This hook builds the queue in SwAV according to epoch_queue_starts. The queue will be saved in runner.work_dir, or loaded at the start epoch if that folder already contains previously saved queues.

参数
  • batch_size (int) – the batch size per GPU for computing.

  • epoch_queue_starts (int, optional) – from this epoch, starts to use the queue. Defaults to 15.

  • crops_for_assign (list[int], optional) – list of crops id used for computing assignments. Defaults to [0, 1].

  • feat_dim (int, optional) – feature dimension of output vector. Defaults to 128.

  • queue_length (int, optional) – length of the queue (0 for no queue). Defaults to 0.

  • interval (int, optional) – the interval to save the queue. Defaults to 1.
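
A hedged config-style sketch of registering this hook through custom_hooks; the batch size and queue length are illustrative and must be consistent with the actual dataloader:

>>> custom_hooks = [
>>>     dict(type='SwAVHook',
>>>          batch_size=32,
>>>          epoch_queue_starts=15,
>>>          crops_for_assign=[0, 1],
>>>          feat_dim=128,
>>>          queue_length=3840)]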

optimizer

class mmselfsup.core.optimizer.DefaultOptimizerConstructor(optimizer_cfg, paramwise_cfg=None)[源代码]

Rewrote default constructor for optimizers. By default each parameter shares the same optimizer settings, and we provide an argument paramwise_cfg to specify parameter-wise settings. It is a dict and may contain the following fields:

Parameters
  • model (nn.Module) – The model with parameters to be optimized.

  • optimizer_cfg (dict) – The config dict of the optimizer.

    Positional fields are
    • type: class name of the optimizer.

    Optional fields are
    • any arguments of the corresponding optimizer type, e.g., lr, weight_decay, momentum, etc.

  • paramwise_cfg (dict, optional) – Parameter-wise options. Defaults to None.

Example 1:
>>> model = torch.nn.modules.Conv1d(1, 1, 1)
>>> optimizer_cfg = dict(type='SGD', lr=0.01, momentum=0.9,
>>>                      weight_decay=0.0001)
>>> paramwise_cfg = dict(bias=dict(weight_decay=0., lars_exclude=True))
>>> optim_builder = DefaultOptimizerConstructor(
>>>     optimizer_cfg, paramwise_cfg)
>>> optimizer = optim_builder(model)
class mmselfsup.core.optimizer.LARS(params, lr=<required parameter>, momentum=0, weight_decay=0, dampening=0, eta=0.001, nesterov=False, eps=1e-08)[源代码]

Implements layer-wise adaptive rate scaling for SGD.

参数
  • params (iterable) – Iterable of parameters to optimize or dicts defining parameter groups.

  • lr (float) – Base learning rate.

  • momentum (float, optional) – Momentum factor. Defaults to 0 (‘m’)

  • weight_decay (float, optional) – Weight decay (L2 penalty). Defaults to 0. (‘beta’)

  • dampening (float, optional) – Dampening for momentum. Defaults to 0.

  • eta (float, optional) – LARS coefficient. Defaults to 0.001.

  • nesterov (bool, optional) – Enables Nesterov momentum. Defaults to False.

  • eps (float, optional) – A small number to avoid dividing by zero. Defaults to 1e-8.

Based on Algorithm 1 of the paper Large Batch Training of Convolutional Networks by You, Gitman, and Ginsburg.

示例

>>> optimizer = LARS(model.parameters(), lr=0.1, momentum=0.9,
>>>                  weight_decay=1e-4, eta=1e-3)
>>> optimizer.zero_grad()
>>> loss_fn(model(input), target).backward()
>>> optimizer.step()
step(closure=None)[源代码]

Performs a single optimization step.

参数

closure (callable, optional) – A closure that reevaluates the model and returns the loss.

class mmselfsup.core.optimizer.TransformerFinetuneConstructor(optimizer_cfg, paramwise_cfg=None)[源代码]

Rewrote default constructor for optimizers.

By default each parameter share the same optimizer settings, and we provide an argument paramwise_cfg to specify parameter-wise settings. In addition, we provide two optional parameters, model_type and layer_decay to set the commonly used layer-wise learning rate decay schedule. Currently, we only support layer-wise learning rate schedule for swin and vit.

参数
  • optimizer_cfg (dict) –

    The config dict of the optimizer. Positional fields are

    • type: class name of the optimizer.

    Optional fields are
    • any arguments of the corresponding optimizer type, e.g., lr, weight_decay, momentum, model_type, layer_decay, etc.

  • paramwise_cfg (dict, optional) – Parameter-wise options. Defaults to None.

Example 1:
>>> model = torch.nn.modules.Conv1d(1, 1, 1)
>>> optimizer_cfg = dict(type='SGD', lr=0.01, momentum=0.9,
>>>                      weight_decay=0.0001, model_type='vit')
>>> paramwise_cfg = dict(bias=dict(weight_decay=0., lars_exclude=True))
>>> optim_builder = TransformerFinetuneConstructor(
>>>     optimizer_cfg, paramwise_cfg)
>>> optimizer = optim_builder(model)
mmselfsup.core.optimizer.build_optimizer(model, optimizer_cfg)[源代码]

Build optimizer from configs.

参数
  • model (nn.Module) – The model with parameters to be optimized.

  • optimizer_cfg (dict) –

    The config dict of the optimizer. Positional fields are:

    • type: class name of the optimizer.

    • lr: base learning rate.

    Optional fields are:
    • any arguments of the corresponding optimizer type, e.g., weight_decay, momentum, etc.

    • paramwise_options: a dict with regular expression as keys to match parameter names and a dict containing options as values. Options include 6 fields: lr, lr_mult, momentum, momentum_mult, weight_decay, weight_decay_mult.

返回

The initialized optimizer.

返回类型

torch.optim.Optimizer

示例

>>> model = torch.nn.modules.Conv1d(1, 1, 1)
>>> paramwise_options = {
>>>     r'(bn|gn)(\d+)?.(weight|bias)': dict(weight_decay_mult=0.1),
>>>     r'\Ahead.': dict(lr_mult=10, momentum=0)}
>>> optimizer_cfg = dict(type='SGD', lr=0.01, momentum=0.9,
>>>                      weight_decay=0.0001,
>>>                      paramwise_options=paramwise_options)
>>> optimizer = build_optimizer(model, optimizer_cfg)

mmselfsup.datasets

data_sources

class mmselfsup.datasets.data_sources.BaseDataSource(data_prefix, classes=None, ann_file=None, test_mode=False, color_type='color', channel_order='rgb', file_client_args={'backend': 'disk'})[源代码]

Datasource base class to load dataset information.

参数
  • data_prefix (str) – the prefix of data path.

  • classes (str | Sequence[str], optional) – Specify classes to load.

  • ann_file (str | None) – the annotation file. When ann_file is str, the subclass is expected to read from the ann_file. When ann_file is None, the subclass is expected to read according to data_prefix.

  • test_mode (bool) – in train mode or test mode. Defaults to False.

  • color_type (str) – The flag argument for mmcv.imfrombytes(). Defaults to color.

  • channel_order (str) – The channel order of images when loaded. Defaults to rgb.

  • file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend=’disk’).
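
A hedged sketch of a data source config in the format documented above; the ImageNet paths are placeholders for a local dataset layout:

>>> data_source = dict(
>>>     type='ImageNet',
>>>     data_prefix='data/imagenet/train',
>>>     ann_file='data/imagenet/meta/train.txt')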

get_cat_ids(idx)[源代码]

Get category id by index.

参数

idx (int) – Index of data.

返回

Image category of specified index.

返回类型

int

classmethod get_classes(classes=None)[源代码]

Get class names of current dataset.

参数

classes (Sequence[str] | str | None) – If classes is None, use default CLASSES defined by builtin dataset. If classes is a string, take it as a file name. The file contains the name of classes where each line contains one class name. If classes is a tuple or list, override the CLASSES defined by the dataset.

返回

Names of categories of the dataset.

返回类型

tuple[str] or list[str]

get_gt_labels()[源代码]

Get all ground-truth labels (categories).

返回

categories for all images.

返回类型

list[int]

get_img(idx)[源代码]

Get image by index.

参数

idx (int) – Index of data.

返回

PIL Image format.

返回类型

Image

class mmselfsup.datasets.data_sources.CIFAR10(data_prefix, classes=None, ann_file=None, test_mode=False, color_type='color', channel_order='rgb', file_client_args={'backend': 'disk'})[源代码]

CIFAR10 Dataset.

This implementation is modified from https://github.com/pytorch/vision/blob/master/torchvision/datasets/cifar.py

class mmselfsup.datasets.data_sources.CIFAR100(data_prefix, classes=None, ann_file=None, test_mode=False, color_type='color', channel_order='rgb', file_client_args={'backend': 'disk'})[源代码]

CIFAR100 Dataset.

class mmselfsup.datasets.data_sources.ImageList(data_prefix, classes=None, ann_file=None, test_mode=False, color_type='color', channel_order='rgb', file_client_args={'backend': 'disk'})[源代码]

The implementation for loading any image list file.

The ImageList can load an annotation file or a list of files and merge all data records to one list. If data is unlabeled, the gt_label will be set to -1.

class mmselfsup.datasets.data_sources.ImageNet(data_prefix, classes=None, ann_file=None, test_mode=False, color_type='color', channel_order='rgb', file_client_args={'backend': 'disk'})[源代码]

ImageNet Dataset.

This implementation is modified from https://github.com/pytorch/vision/blob/master/torchvision/datasets/imagenet.py

class mmselfsup.datasets.data_sources.ImageNet21k(data_prefix, classes=None, ann_file=None, multi_label=False, recursion_subdir=False, test_mode=False)[源代码]

ImageNet21k Dataset. Since the ImageNet21k dataset is extremely big (it contains 21k+ classes and 1.4B files), this class improves on the ImageNet class in the following ways, in order to save memory usage and loading time:

  • Delete the samples attribute.

  • Use ‘slots’ to create a Data_item to replace dict.

  • Move setting the info dict from the function load_annotations to the function prepare_data.

  • Use int instead of np.array(…, np.int64).

参数
  • data_prefix (str) – the prefix of data path

  • ann_file (str | None) – the annotation file. When ann_file is str, the subclass is expected to read from the ann_file. When ann_file is None, the subclass is expected to read according to data_prefix

  • test_mode (bool) – in train mode or test mode

  • multi_label (bool) – use multi label or not.

  • recursion_subdir (bool) – whether to use pictures in sub-directories (those that meet the loading conditions) under each category directory.

load_annotations()[源代码]

load dataset annotations.

pipelines

class mmselfsup.datasets.pipelines.BEiTMaskGenerator(input_size: int, num_masking_patches: int, min_num_patches: int = 4, max_num_patches: Optional[int] = None, min_aspect: float = 0.3, max_aspect: Optional[float] = None)[源代码]

Generate mask for image.

This module is borrowed from https://github.com/microsoft/unilm/tree/master/beit

参数
  • input_size (int) – The size of input image.

  • num_masking_patches (int) – The number of patches to be masked.

  • min_num_patches (int) – The minimum number of patches to be masked in the process of generating mask. Defaults to 4.

  • max_num_patches (int, optional) – The maximum number of patches to be masked in the process of generating mask. Defaults to None.

  • min_aspect (float, optional) – The minimum aspect ratio of mask blocks. Defaults to 0.3.

  • max_aspect (float, optional) – The maximum aspect ratio of mask blocks. Defaults to None.

class mmselfsup.datasets.pipelines.GaussianBlur(sigma_min, sigma_max, p=0.5)[源代码]

GaussianBlur augmentation refers to SimCLR (https://arxiv.org/abs/2002.05709).

参数
  • sigma_min (float) – The minimum parameter of Gaussian kernel std.

  • sigma_max (float) – The maximum parameter of Gaussian kernel std.

  • p (float, optional) – Probability. Defaults to 0.5.

class mmselfsup.datasets.pipelines.Lighting(alphastd=0.1)[源代码]

Lighting noise (AlexNet-style, PCA-based noise).

参数

alphastd (float, optional) – The parameter for Lighting. Defaults to 0.1.

class mmselfsup.datasets.pipelines.MaskFeatMaskGenerator(mask_window_size: int = 14, mask_ratio: float = 0.4, min_num_patches: int = 15, max_num_patches: Optional[int] = None, min_aspect: float = 0.3, max_aspect: Optional[float] = None)[源代码]

Generate random block mask for each image.

This module is borrowed from https://github.com/facebookresearch/SlowFast/blob/main/slowfast/datasets/transform.py

Parameters
  • mask_window_size (int) – Size of input image. Defaults to 14.

  • mask_ratio (float) – The mask ratio of image. Defaults to 0.4.

  • min_num_patches (int) – Minimum number of patches that require masking. Defaults to 15.

  • max_num_patches (int, optional) – Maximum number of patches that require masking. Defaults to None.

  • min_aspect (int) – Minimum aspect of patches. Defaults to 0.3.

  • max_aspect (float, optional) – Maximum aspect of patches. Defaults to None.

class mmselfsup.datasets.pipelines.RandomAppliedTrans(transforms, p=0.5)[源代码]

Randomly applied transformations.

参数
  • transforms (list[dict]) – List of transformations in dictionaries.

  • p (float, optional) – Probability. Defaults to 0.5.

class mmselfsup.datasets.pipelines.RandomAug(input_size=None, color_jitter=None, auto_augment=None, interpolation=None, re_prob=None, re_mode=None, re_count=None, mean=None, std=None)[源代码]

RandAugment data augmentation method based on “RandAugment: Practical automated data augmentation with a reduced search space”.

This code is borrowed from <https://github.com/pengzhiliang/MAE-pytorch>

class mmselfsup.datasets.pipelines.SimMIMMaskGenerator(input_size: int = 192, mask_patch_size: int = 32, model_patch_size: int = 4, mask_ratio: float = 0.6)[源代码]

Generate random block mask for each Image.

This module is used in SimMIM to generate masks.

参数
  • input_size (int) – Size of input image. Defaults to 192.

  • mask_patch_size (int) – Size of each block mask. Defaults to 32.

  • model_patch_size (int) – Patch size of each token. Defaults to 4.

  • mask_ratio (float) – The mask ratio of image. Defaults to 0.6.
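
A sketch of appending the mask generator to a SimMIM-style pipeline; the values simply mirror the documented defaults and the preceding transforms are elided:

>>> train_pipeline = [
>>>     # ... geometric and color transforms ...
>>>     dict(type='SimMIMMaskGenerator',
>>>          input_size=192, mask_patch_size=32,
>>>          model_patch_size=4, mask_ratio=0.6)]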

class mmselfsup.datasets.pipelines.Solarization(threshold=128, p=0.5)[源代码]

Solarization augmentation refers to BYOL (https://arxiv.org/abs/2006.07733).

参数
  • threshold (float, optional) – The solarization threshold. Defaults to 128.

  • p (float, optional) – Probability. Defaults to 0.5.
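
A hedged fragment of a BYOL-style augmentation pipeline built from the transforms above; the probabilities and sigma range are illustrative, not an official recipe:

>>> train_pipeline = [
>>>     dict(type='RandomAppliedTrans',
>>>          transforms=[dict(type='GaussianBlur', sigma_min=0.1, sigma_max=2.0)],
>>>          p=0.5),
>>>     dict(type='Solarization', threshold=128, p=0.2)]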

class mmselfsup.datasets.pipelines.ToTensor[源代码]

Convert image or a sequence of images to tensor.

This module can not only convert a single image to tensor, but also a sequence of images.

samplers

class mmselfsup.datasets.samplers.DistributedGivenIterationSampler(dataset, total_iter, batch_size, num_replicas=None, rank=None, last_iter=- 1)[源代码]
gen_new_list()[源代码]

Each process shuffle all list with same seed, and pick one piece according to rank.

class mmselfsup.datasets.samplers.DistributedGroupSampler(dataset, samples_per_gpu=1, num_replicas=None, rank=None)[源代码]

Sampler that restricts data loading to a subset of the dataset.

It is especially useful in conjunction with torch.nn.parallel.DistributedDataParallel. In such case, each process can pass a DistributedSampler instance as a DataLoader sampler, and load a subset of the original dataset that is exclusive to it.

注解

Dataset is assumed to be of constant size.

参数
  • dataset – Dataset used for sampling.

  • num_replicas (optional) – Number of processes participating in distributed training.

  • rank (optional) – Rank of the current process within num_replicas.

class mmselfsup.datasets.samplers.DistributedSampler(dataset, num_replicas=None, rank=None, shuffle=True, replace=False, seed=0)[源代码]
class mmselfsup.datasets.samplers.GroupSampler(dataset, samples_per_gpu=1)[源代码]

datasets

class mmselfsup.datasets.BaseDataset(data_source, pipeline, prefetch=False)[源代码]

Base dataset class.

The base dataset can be inherited by different algorithm’s datasets. After __init__, the data source and pipeline will be built. Besides, the algorithm specific dataset implements different operations after obtaining images from data sources.

参数
  • data_source (dict) – Data source defined in mmselfsup.datasets.data_sources.

  • pipeline (list[dict]) – A list of dict, where each element represents an operation defined in mmselfsup.datasets.pipelines.

  • prefetch (bool, optional) – Whether to prefetch data. Defaults to False.

class mmselfsup.datasets.ConcatDataset(datasets)[源代码]

A wrapper of concatenated dataset.

Same as torch.utils.data.dataset.ConcatDataset, but concat the group flag for image aspect ratio.

参数

datasets (list[Dataset]) – A list of datasets.

class mmselfsup.datasets.DeepClusterDataset(data_source, pipeline, prefetch=False)[源代码]

Dataset for DC and ODC.

The dataset initializes clustering labels and assigns it during training.

参数
  • data_source (dict) – Data source defined in mmselfsup.datasets.data_sources.

  • pipeline (list[dict]) – A list of dict, where each element represents an operation defined in mmselfsup.datasets.pipelines.

  • prefetch (bool, optional) – Whether to prefetch data. Defaults to False.

class mmselfsup.datasets.MultiViewDataset(data_source, num_views, pipelines, prefetch=False)[源代码]

The dataset outputs multiple views of an image.

The number of views in the output dict depends on num_views. The image can be processed by one pipeline or multiple pipelines.

参数
  • data_source (dict) – Data source defined in mmselfsup.datasets.data_sources.

  • num_views (list) – The number of different views.

  • pipelines (list[list[dict]]) – A list of pipelines, where each pipeline contains elements that represents an operation defined in mmselfsup.datasets.pipelines.

  • prefetch (bool, optional) – Whether to prefetch data. Defaults to False.

实际案例

>>> dataset = MultiViewDataset(data_source, [2], [pipeline])
>>> output = dataset[idx]
The output got 2 views processed by one pipeline.
>>> dataset = MultiViewDataset(
>>>     data_source, [2, 6], [pipeline1, pipeline2])
>>> output = dataset[idx]
The output got 8 views processed by two pipelines, the first two views
were processed by pipeline1 and the remaining views by pipeline2.
class mmselfsup.datasets.RelativeLocDataset(data_source, pipeline, format_pipeline, prefetch=False)[源代码]

Dataset for relative patch location.

The dataset crops image into several patches and concatenates every surrounding patch with center one. Finally it also outputs corresponding labels 0, 1, 2, 3, 4, 5, 6, 7.

参数
  • data_source (dict) – Data source defined in mmselfsup.datasets.data_sources.

  • pipeline (list[dict]) – A list of dict, where each element represents an operation defined in mmselfsup.datasets.pipelines.

  • format_pipeline (list[dict]) – A list of dict, it converts input format from PIL.Image to Tensor. The operation is defined in mmselfsup.datasets.pipelines.

  • prefetch (bool, optional) – Whether to prefetch data. Defaults to False.

class mmselfsup.datasets.RepeatDataset(dataset, times)[源代码]

A wrapper of repeated dataset.

The length of repeated dataset will be times larger than the original dataset. This is useful when the data loading time is long but the dataset is small. Using RepeatDataset can reduce the data loading time between epochs.

参数
  • dataset (Dataset) – The dataset to be repeated.

  • times (int) – Repeat times.

class mmselfsup.datasets.RotationPredDataset(data_source, pipeline, prefetch=False)[源代码]

Dataset for rotation prediction.

The dataset rotates the image with 0, 90, 180, and 270 degrees and outputs labels 0, 1, 2, 3 correspondingly.

参数
  • data_source (dict) – Data source defined in mmselfsup.datasets.data_sources.

  • pipeline (list[dict]) – A list of dict, where each element represents an operation defined in mmselfsup.datasets.pipelines.

  • prefetch (bool, optional) – Whether to prefetch data. Defaults to False.

class mmselfsup.datasets.SingleViewDataset(data_source, pipeline, prefetch=False)[源代码]

The dataset outputs one view of an image, containing some other information such as label, idx, etc.

参数
  • data_source (dict) – Data source defined in mmselfsup.datasets.data_sources.

  • pipeline (list[dict]) – A list of dict, where each element represents an operation defined in mmselfsup.datasets.pipelines.

  • prefetch (bool, optional) – Whether to prefetch data. Defaults to False.

evaluate(results, logger=None, topk=(1, 5))[源代码]

The evaluation function to output accuracy.

参数
  • results (dict) – The key-value pair is the output head name and corresponding prediction values.

  • logger (logging.Logger | str | None, optional) – The defined logger to be used. Defaults to None.

  • topk (tuple(int)) – The output includes topk accuracy.
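
A hedged sketch of calling evaluate on a built SingleViewDataset instance; results is assumed to map each head name (here a hypothetical 'head0') to its prediction tensor, as produced by the test loop:

>>> eval_res = dataset.evaluate({'head0': predictions}, topk=(1, 5))
>>> # eval_res is a dict of top-k accuracies, one entry per head and k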

mmselfsup.datasets.build_dataloader(dataset, imgs_per_gpu=None, samples_per_gpu=None, workers_per_gpu=1, num_gpus=1, dist=True, shuffle=True, replace=False, seed=None, pin_memory=True, persistent_workers=True, **kwargs)[源代码]

Build PyTorch DataLoader.

In distributed training, each GPU/process has a dataloader. In non-distributed training, there is only one dataloader for all GPUs.

参数
  • dataset (Dataset) – A PyTorch dataset.

  • imgs_per_gpu (int) – (Deprecated, please use samples_per_gpu) Number of images on each GPU, i.e., batch size of each GPU. Defaults to None.

  • samples_per_gpu (int) – Number of images on each GPU, i.e., batch size of each GPU. Defaults to None.

  • workers_per_gpu (int) – How many subprocesses to use for data loading for each GPU. persistent_workers option needs num_workers > 0. Defaults to 1.

  • num_gpus (int) – Number of GPUs. Only used in non-distributed training.

  • dist (bool) – Distributed training/test or not. Defaults to True.

  • shuffle (bool) – Whether to shuffle the data at every epoch. Defaults to True.

  • replace (bool) – Replace or not in random shuffle. It works on when shuffle is True. Defaults to False.

  • seed (int) – set seed for dataloader.

  • pin_memory (bool, optional) – If True, the data loader will copy Tensors into CUDA pinned memory before returning them. Defaults to True.

  • persistent_workers (bool) – If True, the data loader will not shut down the worker processes after a dataset has been consumed once. This keeps the workers' Dataset instances alive. The argument only takes effect when PyTorch >= 1.7.0. Defaults to True.

  • kwargs – any keyword argument to be used to initialize DataLoader

返回

A PyTorch dataloader.

返回类型

DataLoader
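
A minimal sketch of wrapping an already-built dataset (any of the dataset classes above) into a distributed dataloader; the batch size and worker count are illustrative:

>>> from mmselfsup.datasets import build_dataloader
>>> data_loader = build_dataloader(
>>>     dataset,
>>>     samples_per_gpu=32,
>>>     workers_per_gpu=4,
>>>     dist=True,
>>>     shuffle=True,
>>>     seed=0)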

mmselfsup.models

algorithms

class mmselfsup.models.algorithms.BYOL(backbone, neck=None, head=None, base_momentum=0.996, init_cfg=None, **kwargs)[源代码]

BYOL.

Implementation of Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning. The momentum adjustment is in core/hooks/byol_hook.py.

参数
  • backbone (dict) – Config dict for module of backbone.

  • neck (dict) – Config dict for module of deep features to compact feature vectors. Defaults to None.

  • head (dict) – Config dict for module of loss functions. Defaults to None.

  • base_momentum (float) – The base momentum coefficient for the target network. Defaults to 0.996.
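
A hedged config sketch of a BYOL model entry; the neck and head types and their dimensions are placeholders meant only to show where each config dict plugs in, not the official recipe:

>>> model = dict(
>>>     type='BYOL',
>>>     base_momentum=0.996,
>>>     backbone=dict(type='ResNet', depth=50),
>>>     neck=dict(type='NonLinearNeck',
>>>               in_channels=2048, hid_channels=4096, out_channels=256),
>>>     head=dict(type='LatentPredictHead',
>>>               predictor=dict(type='NonLinearNeck', in_channels=256,
>>>                              hid_channels=4096, out_channels=256)))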

extract_feat(img)[源代码]

Function to extract features from backbone.

参数

img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

返回

backbone outputs.

返回类型

tuple[Tensor]

forward_train(img, **kwargs)[源代码]

Forward computation during training.

参数

img (list[Tensor]) – A list of input images with shape (N, C, H, W). Typically these should be mean centered and std scaled.

返回

A dictionary of loss components.

返回类型

dict[str, Tensor]

momentum_update()[源代码]

Momentum update of the target network.

class mmselfsup.models.algorithms.BarlowTwins(backbone: Optional[dict] = None, neck: Optional[dict] = None, head: Optional[dict] = None, init_cfg: Optional[dict] = None, **kwargs)[源代码]

BarlowTwins.

Implementation of Barlow Twins: Self-Supervised Learning via Redundancy Reduction. Part of the code is borrowed from: https://github.com/facebookresearch/barlowtwins/blob/main/main.py.

参数
  • backbone (dict) – Config dict for module of backbone. Defaults to None.

  • neck (dict) – Config dict for module of deep features to compact feature vectors. Defaults to None.

  • head (dict) – Config dict for module of loss functions. Defaults to None.

  • init_cfg (dict) – Config dict for weight initialization. Defaults to None.

extract_feat(img: torch.Tensor)torch.Tensor[源代码]

Function to extract features from backbone.

参数

img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

返回

backbone outputs.

返回类型

tuple[Tensor]

forward_train(img: List[torch.Tensor])dict[源代码]

Forward computation during training.

参数

img (List[Tensor]) – A list of input images with shape (N, C, H, W). Typically these should be mean centered and std scaled.

返回

A dictionary of loss components

返回类型

dict[str, Tensor]

class mmselfsup.models.algorithms.BaseModel(init_cfg=None)[源代码]

Base model class for self-supervised learning.

abstract extract_feat(imgs)[源代码]

Function to extract features from backbone.

Parameters
  • img (Tensor) – Input images. Typically these should be mean centered and std scaled.

forward(img, mode='train', **kwargs)[源代码]

Forward function of model.

Calls either forward_train, forward_test or extract_feat function according to the mode.

forward_test(imgs, **kwargs)[源代码]
Parameters
  • imgs (List[Tensor]) – List of tensors. Typically these should be mean centered and std scaled.

  • kwargs (keyword arguments) – Specific to concrete implementation.

abstract forward_train(imgs, **kwargs)[源代码]
Parameters
  • imgs (List[Tensor]) – List of tensors. Typically these should be mean centered and std scaled.

  • kwargs (keyword arguments) – Specific to concrete implementation.

train_step(data, optimizer)[源代码]

The iteration step during training.

This method defines an iteration step during training, except for the back propagation and optimizer updating, which are done in an optimizer hook. Note that in some complicated cases or models, the whole process including back propagation and optimizer updating are also defined in this method, such as GAN.

参数
  • data (dict) – The output of dataloader.

  • optimizer (torch.optim.Optimizer | dict) – The optimizer of runner is passed to train_step(). This argument is unused and reserved.

返回

Dict of outputs. The following fields are contained.
  • loss (torch.Tensor): A tensor for back propagation, which can be a weighted sum of multiple losses.

  • log_vars (dict): Dict contains all the variables to be sent to the logger.

  • num_samples (int): Indicates the batch size (when the model is DDP, it means the batch size on each GPU), which is used for averaging the logs.

返回类型

dict

val_step(data, optimizer)[源代码]

The iteration step during validation.

This method shares the same signature as train_step(), but used during val epochs. Note that the evaluation after training epochs is not implemented with this method, but an evaluation hook.

class mmselfsup.models.algorithms.CAE(backbone: Optional[dict] = None, neck: Optional[dict] = None, head: Optional[dict] = None, base_momentum: float = 0.0, init_cfg: Optional[dict] = None, **kwargs)[源代码]

CAE.

Implementation of Context Autoencoder for Self-Supervised Representation Learning.

参数
  • backbone (dict, optional) – Config dict for module of backbone.

  • neck (dict, optional) – Config dict for module of deep features to compact feature vectors. Defaults to None.

  • head (dict, optional) – Config dict for module of loss functions. Defaults to None.

  • base_momentum (float) – The base momentum coefficient for the target network. Defaults to 0.0.

  • init_cfg (dict, optional) – the config to control the initialization.

extract_feat(img: torch.Tensor, mask: torch.Tensor)torch.Tensor[源代码]

Function to extract features from backbone.

Parameters
  • img (Tensor) – Input images. Typically these should be mean centered and std scaled.

forward_train(samples: Sequence, **kwargs)dict[源代码]

Parameters
  • img (List[Tensor]) – List of tensors. Typically these should be mean centered and std scaled.

  • kwargs (keyword arguments) – Specific to concrete implementation.

init_weights()None[源代码]

Initialize the weights.

momentum_update()None[源代码]

Momentum update of the teacher network.

class mmselfsup.models.algorithms.Classification(backbone, with_sobel=False, head=None, train_cfg=None, init_cfg=None)[源代码]

Simple image classification.

参数
  • backbone (dict) – Config dict for module of backbone.

  • with_sobel (bool) – Whether to apply a Sobel filter. Defaults to False.

  • head (dict) – Config dict for module of loss functions. Defaults to None.

extract_feat(img)[源代码]

Function to extract features from backbone.

参数

img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

返回

backbone outputs.

返回类型

tuple[Tensor]

forward_test(img, **kwargs)[源代码]

Forward computation during test.

参数

img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

返回

A dictionary of output features.

返回类型

dict[str, Tensor]

forward_train(img, label, **kwargs)[源代码]

Forward computation during training.

参数
  • img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

  • label (Tensor) – Ground-truth labels.

  • kwargs – Any keyword arguments to be used to forward.

返回

A dictionary of loss components.

返回类型

dict[str, Tensor]

class mmselfsup.models.algorithms.DeepCluster(backbone, with_sobel=True, neck=None, head=None, init_cfg=None)[源代码]

DeepCluster.

Implementation of Deep Clustering for Unsupervised Learning of Visual Features. The clustering operation is in core/hooks/deepcluster_hook.py.

参数
  • backbone (dict) – Config dict for module of backbone.

  • with_sobel (bool) – Whether to apply a Sobel filter on images. Defaults to True.

  • neck (dict) – Config dict for module of deep features to compact feature vectors. Defaults to None.

  • head (dict) – Config dict for module of loss functions. Defaults to None.

extract_feat(img)[源代码]

Function to extract features from backbone.

参数

img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

返回

backbone outputs.

返回类型

tuple[Tensor]

forward_test(img, **kwargs)[源代码]

Forward computation during test.

参数

img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

返回

A dictionary of output features.

返回类型

dict[str, Tensor]

forward_train(img, pseudo_label, **kwargs)[源代码]

Forward computation during training.

参数
  • img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

  • pseudo_label (Tensor) – Label assignments.

  • kwargs – Any keyword arguments to be used to forward.

返回

A dictionary of loss components.

返回类型

dict[str, Tensor]

set_reweight(labels, reweight_pow=0.5)[源代码]

Loss re-weighting.

Re-weighting the loss according to the number of samples in each class.

参数
  • labels (numpy.ndarray) – Label assignments.

  • reweight_pow (float) – The power of re-weighting. Defaults to 0.5.

class mmselfsup.models.algorithms.DenseCL(backbone, neck=None, head=None, queue_len=65536, feat_dim=128, momentum=0.999, loss_lambda=0.5, init_cfg=None, **kwargs)[源代码]

DenseCL.

Implementation of Dense Contrastive Learning for Self-Supervised Visual Pre-Training. Borrowed from the authors’ code: https://github.com/WXinlong/DenseCL. The loss_lambda warmup is in core/hooks/densecl_hook.py.

参数
  • backbone (dict) – Config dict for module of backbone.

  • neck (dict) – Config dict for module of deep features to compact feature vectors. Defaults to None.

  • head (dict) – Config dict for module of loss functions. Defaults to None.

  • queue_len (int) – Number of negative keys maintained in the queue. Defaults to 65536.

  • feat_dim (int) – Dimension of compact feature vectors. Defaults to 128.

  • momentum (float) – Momentum coefficient for the momentum-updated encoder. Defaults to 0.999.

  • loss_lambda (float) – Loss weight for the single and dense contrastive loss. Defaults to 0.5.

extract_feat(img)[源代码]

Function to extract features from backbone.

参数

img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

返回

backbone outputs.

返回类型

tuple[Tensor]

forward_test(img, **kwargs)[源代码]

Forward computation during test.

参数

img (Tensor) – Input of two concatenated images of shape (N, 2, C, H, W). Typically these should be mean centered and std scaled.

返回

A dictionary of normalized output features.

返回类型

dict(Tensor)

forward_train(img, **kwargs)[源代码]

Forward computation during training.

参数

img (list[Tensor]) – A list of input images with shape (N, C, H, W). Typically these should be mean centered and std scaled.

返回

A dictionary of loss components.

返回类型

dict[str, Tensor]

init_weights()[源代码]

Init weights and copy query encoder init weights to key encoder.

class mmselfsup.models.algorithms.InterCLRMoCo(backbone, neck=None, head=None, queue_len=65536, feat_dim=128, momentum=0.999, memory_bank=None, online_labels=True, neg_num=16384, neg_sampling='semihard', semihard_neg_pool_num=128000, semieasy_neg_pool_num=128000, intra_cos_marign_loss=False, intra_cos_margin=0, intra_arc_marign_loss=False, intra_arc_margin=0, inter_cos_marign_loss=True, inter_cos_margin=- 0.5, inter_arc_marign_loss=False, inter_arc_margin=0, intra_loss_weight=0.75, inter_loss_weight=0.25, share_neck=True, num_classes=10000, init_cfg=None, **kwargs)[源代码]

MoCo-InterCLR.

Official implementation of Delving into Inter-Image Invariance for Unsupervised Visual Representations. The clustering operation is in core/hooks/interclr_hook.py.

参数
  • backbone (dict) – Config dict for module of backbone.

  • neck (dict) – Config dict for module of deep features to compact feature vectors. Defaults to None.

  • head (dict) – Config dict for module of loss functions. Defaults to None.

  • queue_len (int) – Number of negative keys maintained in the queue. Defaults to 65536.

  • feat_dim (int) – Dimension of compact feature vectors. Defaults to 128.

  • momentum (float) – Momentum coefficient for the momentum-updated encoder. Defaults to 0.999.

  • memory_bank (dict) – Config dict for module of memory banks. Defaults to None.

  • online_labels (bool) – Whether to use online labels. Defaults to True.

  • neg_num (int) – Number of negative samples for inter-image branch. Defaults to 16384.

  • neg_sampling (str) – Negative sampling strategy. Support ‘hard’, ‘semihard’, ‘random’, ‘semieasy’. Defaults to ‘semihard’.

  • semihard_neg_pool_num (int) – Number of negative samples for semi-hard nearest neighbor pool. Defaults to 128000.

  • semieasy_neg_pool_num (int) – Number of negative samples for semi-easy nearest neighbor pool. Defaults to 128000.

  • intra_cos_marign_loss (bool) – Whether to use a cosine margin for intra-image branch. Defaults to False.

  • intra_cos_margin (float) – Intra-image cosine margin. Defaults to 0.

  • intra_arc_marign_loss (bool) – Whether to use an arc margin for intra-image branch. Defaults to False.

  • intra_arc_margin (float) – Intra-image arc margin. Defaults to 0.

  • inter_cos_marign_loss (bool) – Whether to use a cosine margin for inter-image branch. Defaults to True.

  • inter_cos_margin (float) – Inter-image cosine margin. Defaults to -0.5.

  • inter_arc_marign_loss (bool) – Whether to use an arc margin for inter-image branch. Defaults to False.

  • inter_arc_margin (float) – Inter-image arc margin. Defaults to 0.

  • intra_loss_weight (float) – Loss weight for intra-image branch. Defaults to 0.75.

  • inter_loss_weight (float) – Loss weight for inter-image branch. Defaults to 0.25.

  • share_neck (bool) – Whether to share the neck for intra- and inter-image branches. Defaults to True.

  • num_classes (int) – Number of clusters. Defaults to 10000.

contrast_inter(q, idx)[源代码]

Inter-image invariance learning.

参数
  • q (Tensor) – Query features with shape (N, C).

  • idx (Tensor) – Index corresponding to each query.

返回

A dictionary of loss components.

返回类型

dict[str, Tensor]

contrast_intra(q, k)[源代码]

Intra-image invariance learning.

参数
  • q (Tensor) – Query features with shape (N, C).

  • k (Tensor) – Key features with shape (N, C).

返回

A dictionary of loss components.

返回类型

dict[str, Tensor]

extract_feat(img)[源代码]

Function to extract features from backbone.

参数

img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

返回

backbone outputs.

返回类型

tuple[Tensor]

forward_train(img, idx, **kwargs)[源代码]

Forward computation during training.

参数
  • img (list[Tensor]) – A list of input images with shape (N, C, H, W). Typically these should be mean centered and std scaled.

  • idx (Tensor) – Index corresponding to each image.

  • kwargs – Any keyword arguments to be used to forward.

返回

A dictionary of loss components.

返回类型

dict[str, Tensor]

init_weights()[源代码]

Initialize base_encoder with init_cfg defined in backbone.

class mmselfsup.models.algorithms.MAE(backbone: dict, neck: dict, head: dict, init_cfg: Optional[dict] = None)[源代码]

MAE.

Implementation of Masked Autoencoders Are Scalable Vision Learners.

参数
  • backbone (dict) – Config dict for encoder. Defaults to None.

  • neck (dict) – Config dict for encoder. Defaults to None.

  • head (dict) – Config dict for loss functions. Defaults to None.

  • init_cfg (dict, optional) – Config dict for weight initialization. Defaults to None.

extract_feat(img: torch.Tensor)Tuple[torch.Tensor][源代码]

Function to extract features from backbone.

参数

img (torch.Tensor) – Input images of shape (N, C, H, W).

返回

backbone outputs.

返回类型

Tuple[torch.Tensor]

forward_test(img: torch.Tensor, **kwargs)Tuple[torch.Tensor, torch.Tensor][源代码]

Forward computation during testing.

参数
  • img (torch.Tensor) – Input images of shape (N, C, H, W).

  • kwargs – Any keyword arguments to be used to forward.

返回

Output of model test.
  • mask: Mask used to mask image.

  • pred: The output of neck.

返回类型

Tuple[torch.Tensor, torch.Tensor]

forward_train(img: torch.Tensor, **kwargs)Dict[str, torch.Tensor][源代码]

Forward computation during training.

参数
  • img (torch.Tensor) – Input images of shape (N, C, H, W).

  • kwargs – Any keyword arguments to be used to forward.

返回

A dictionary of loss components.

返回类型

Dict[str, torch.Tensor]

init_weights()[源代码]

Initialize the weights.

class mmselfsup.models.algorithms.MMClsImageClassifierWrapper(backbone: dict, neck: Optional[dict] = None, head: Optional[dict] = None, pretrained: Optional[str] = None, train_cfg: Optional[dict] = None, init_cfg: Optional[dict] = None)[源代码]

Workaround to use models from mmclassification.

Since the output of the classifier from mmclassification is not compatible with mmselfsup's evaluation function, we rewrite some key components from mmclassification.

参数
  • backbone (dict) – Config dict for module of backbone.

  • neck (dict, optional) – Config dict for module of neck. Defaults to None.

  • head (dict, optional) – Config dict for module of loss functions. Defaults to None.

  • pretrained (str, optional) – The path of pre-trained checkpoint. Defaults to None.

  • train_cfg (dict, optional) – Config dict for pre-processing utils, e.g. mixup. Defaults to None.

  • init_cfg (dict, optional) – Config dict for initialization. Defaults to None.

forward(img, mode='train', **kwargs)[源代码]

Forward function of model.

Calls either forward_train, forward_test or extract_feat function according to the mode.

forward_test(imgs, **kwargs)[源代码]
参数

imgs (List[Tensor]) – the outer list indicates test-time augmentations and inner Tensor should have a shape NxCxHxW, which contains all images in the batch.

forward_train(img, label, **kwargs)[源代码]

Forward computation during training.

参数
  • img (Tensor) – of shape (N, C, H, W) encoding input images. Typically these should be mean centered and std scaled.

  • label (Tensor) – It should be of shape (N, 1) encoding the ground-truth label of input images for a single-label task. It should be of shape (N, C) encoding the ground-truth label of input images for a multi-label task.

返回

a dictionary of loss components

返回类型

dict[str, Tensor]

class mmselfsup.models.algorithms.MaskFeat(backbone: dict, head: dict, hog_para: dict, init_cfg: Optional[dict] = None)[源代码]

MaskFeat.

Implementation of Masked Feature Prediction for Self-Supervised Visual Pre-Training.

Parameters
  • backbone (dict) – Config dict for encoder.

  • head (dict) – Config dict for loss functions.

  • hog_para (dict) – Config dict for the HOG layer.
    • dict[‘nbins’, int]: Number of bins. Defaults to 9.
    • dict[‘pool’, float]: Number of cells. Defaults to 8.
    • dict[‘gaussian_window’, int]: Size of the Gaussian kernel. Defaults to 16.

  • init_cfg (dict) – Config dict for weight initialization. Defaults to None.

extract_feat(input: List[torch.Tensor])torch.Tensor[源代码]

Function to extract features from backbone.

参数

input (List[torch.Tensor, torch.Tensor]) – Input images and masks.

返回

backbone outputs.

返回类型

tuple[Tensor]

forward_train(input: List[torch.Tensor], **kwargs)dict[源代码]

Forward computation during training.

参数
  • input (List[torch.Tensor, torch.Tensor]) – Input images and masks.

  • kwargs – Any keyword arguments to be used to forward.

返回

A dictionary of loss components.

返回类型

dict[str, Tensor]

class mmselfsup.models.algorithms.MoCo(backbone, neck=None, head=None, queue_len=65536, feat_dim=128, momentum=0.999, init_cfg=None, **kwargs)[源代码]

MoCo.

Implementation of Momentum Contrast for Unsupervised Visual Representation Learning. Part of the code is borrowed from: https://github.com/facebookresearch/moco/blob/master/moco/builder.py.

参数
  • backbone (dict) – Config dict for module of backbone.

  • neck (dict) – Config dict for module of deep features to compact feature vectors. Defaults to None.

  • head (dict) – Config dict for module of loss functions. Defaults to None.

  • queue_len (int) – Number of negative keys maintained in the queue. Defaults to 65536.

  • feat_dim (int) – Dimension of compact feature vectors. Defaults to 128.

  • momentum (float) – Momentum coefficient for the momentum-updated encoder. Defaults to 0.999.

extract_feat(img)[源代码]

Function to extract features from backbone.

参数

img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

返回

backbone outputs.

返回类型

tuple[Tensor]

forward_train(img, **kwargs)[源代码]

Forward computation during training.

参数

img (list[Tensor]) – A list of input images with shape (N, C, H, W). Typically these should be mean centered and std scaled.

返回

A dictionary of loss components.

返回类型

dict[str, Tensor]

class mmselfsup.models.algorithms.MoCoV3(backbone, neck, head, base_momentum=0.99, init_cfg=None, **kwargs)[源代码]

MoCo v3.

Implementation of An Empirical Study of Training Self-Supervised Vision Transformers.

参数
  • backbone (dict) – Config dict for module of backbone.

  • neck (dict) – Config dict for module of deep features to compact feature vectors. Defaults to None.

  • head (dict) – Config dict for module of loss functions. Defaults to None.

  • base_momentum (float) – Momentum coefficient for the momentum-updated encoder. Defaults to 0.99.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None

extract_feat(img)[源代码]

Function to extract features from backbone.

参数

img (Tensor) – Input images. Typically these should be mean centered and std scaled.

返回

backbone outputs.

返回类型

tuple[Tensor]

forward_train(img, **kwargs)[源代码]

Forward computation during training.

参数

img (list[Tensor]) – A list of input images. Typically these should be mean centered and std scaled.

返回

A dictionary of loss components.

返回类型

dict[str, Tensor]

init_weights()[源代码]

Initialize base_encoder with init_cfg defined in backbone.

momentum_update()[源代码]

Momentum update of the momentum encoder.

class mmselfsup.models.algorithms.NPID(backbone, neck=None, head=None, memory_bank=None, neg_num=65536, ensure_neg=False, init_cfg=None)[源代码]

NPID.

Implementation of Unsupervised Feature Learning via Non-parametric Instance Discrimination.

参数
  • backbone (dict) – Config dict for module of backbone.

  • neck (dict) – Config dict for module of deep features to compact feature vectors. Defaults to None.

  • head (dict) – Config dict for module of loss functions. Defaults to None.

  • memory_bank (dict) – Config dict for module of memory banks. Defaults to None.

  • neg_num (int) – Number of negative samples for each image. Defaults to 65536.

  • ensure_neg (bool) – If False, there is a small probability that negative samples contain positive ones. Defaults to False.

extract_feat(img)[源代码]

Function to extract features from backbone.

参数

img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

返回

backbone outputs.

返回类型

tuple[Tensor]

forward_train(img, idx, **kwargs)[源代码]

Forward computation during training.

参数
  • img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

  • idx (Tensor) – Index corresponding to each image.

  • kwargs – Any keyword arguments to be used to forward.

返回

A dictionary of loss components.

返回类型

dict[str, Tensor]

class mmselfsup.models.algorithms.ODC(backbone, with_sobel=False, neck=None, head=None, memory_bank=None, init_cfg=None)[源代码]

ODC.

Official implementation of Online Deep Clustering for Unsupervised Representation Learning. The operation w.r.t. the memory bank and loss re-weighting is in core/hooks/odc_hook.py.

参数
  • backbone (dict) – Config dict for module of backbone.

  • with_sobel (bool) – Whether to apply a Sobel filter on images. Defaults to False.

  • neck (dict) – Config dict for module of deep features to compact feature vectors. Defaults to None.

  • head (dict) – Config dict for module of loss functions. Defaults to None.

  • memory_bank (dict) – Module of memory banks. Defaults to None.

extract_feat(img)[源代码]

Function to extract features from backbone.

参数

img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

返回

backbone outputs.

返回类型

tuple[Tensor]

forward_test(img, **kwargs)[源代码]

Forward computation during test.

参数

img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

返回

A dictionary of output features.

返回类型

dict[str, Tensor]

forward_train(img, idx, **kwargs)[源代码]

Forward computation during training.

参数
  • img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

  • idx (Tensor) – Index corresponding to each image.

  • kwargs – Any keyword arguments to be used to forward.

返回

A dictionary of loss components.

返回类型

dict[str, Tensor]

set_reweight(labels=None, reweight_pow=0.5)[源代码]

Loss re-weighting.

Re-weighting the loss according to the number of samples in each class.

参数
  • labels (numpy.ndarray) – Label assignments. Defaults to None.

  • reweight_pow (float) – The power of re-weighting. Defaults to 0.5.

class mmselfsup.models.algorithms.RelativeLoc(backbone, neck=None, head=None, init_cfg=None)[源代码]

Relative patch location.

Implementation of Unsupervised Visual Representation Learning by Context Prediction.

参数
  • backbone (dict) – Config dict for module of backbone.

  • neck (dict) – Config dict for module of deep features to compact feature vectors. Defaults to None.

  • head (dict) – Config dict for module of loss functions. Defaults to None.

extract_feat(img)[源代码]

Function to extract features from backbone.

参数

img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

返回

backbone outputs.

返回类型

tuple[Tensor]

forward(img, patch_label=None, mode='train', **kwargs)[源代码]

Forward function to select mode and modify the input image shape.

参数

img (Tensor) – Input images, the shape depends on mode. Typically these should be mean centered and std scaled.

forward_test(img, **kwargs)[源代码]

Forward computation during test.

参数

img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

返回

A dictionary of output features.

返回类型

dict[str, Tensor]

forward_train(img, patch_label, **kwargs)[源代码]

Forward computation during training.

参数
  • img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

  • patch_label (Tensor) – Labels for the relative patch locations.

  • kwargs – Any keyword arguments to be used to forward.

返回

A dictionary of loss components.

返回类型

dict[str, Tensor]

class mmselfsup.models.algorithms.RotationPred(backbone, head=None, init_cfg=None)[源代码]

Rotation prediction.

Implementation of Unsupervised Representation Learning by Predicting Image Rotations.

参数
  • backbone (dict) – Config dict for module of backbone.

  • head (dict) – Config dict for module of loss functions. Defaults to None.

extract_feat(img)[源代码]

Function to extract features from backbone.

参数

img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

返回

backbone outputs.

返回类型

tuple[Tensor]

forward(img, rot_label=None, mode='train', **kwargs)[源代码]

Forward function to select mode and modify the input image shape.

参数

img (Tensor) – Input images, the shape depends on mode. Typically these should be mean centered and std scaled.

forward_test(img, **kwargs)[源代码]

Forward computation during test.

参数

img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

返回

A dictionary of output features.

返回类型

dict[str, Tensor]

forward_train(img, rot_label, **kwargs)[源代码]

Forward computation during training.

参数
  • img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

  • rot_label (Tensor) – Labels for the rotations.

  • kwargs – Any keyword arguments to be used to forward.

返回

A dictionary of loss components.

返回类型

dict[str, Tensor]

class mmselfsup.models.algorithms.SimCLR(backbone, neck=None, head=None, init_cfg=None)[源代码]

SimCLR.

Implementation of A Simple Framework for Contrastive Learning of Visual Representations.

参数
  • backbone (dict) – Config dict for module of backbone.

  • neck (dict) – Config dict for module of deep features to compact feature vectors. Defaults to None.

  • head (dict) – Config dict for module of loss functions. Defaults to None.

extract_feat(img)[源代码]

Function to extract features from backbone.

参数

img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

返回

backbone outputs.

返回类型

tuple[Tensor]

forward_train(img, **kwargs)[源代码]

Forward computation during training.

参数

img (list[Tensor]) – A list of input images with shape (N, C, H, W). Typically these should be mean centered and std scaled.

返回

A dictionary of loss components.

返回类型

dict[str, Tensor]

class mmselfsup.models.algorithms.SimMIM(backbone: dict, neck: dict, head: dict, init_cfg: Optional[dict] = None)[源代码]

SimMIM.

Implementation of SimMIM: A Simple Framework for Masked Image Modeling.

参数
  • backbone (dict) – Config dict for encoder. Defaults to None.

  • neck (dict) – Config dict for the reconstruction neck. Defaults to None.

  • head (dict) – Config dict for loss functions. Defaults to None.

  • init_cfg (dict, optional) – Config dict for weight initialization. Defaults to None.

extract_feat(img: torch.Tensor)tuple[源代码]

Function to extract features from backbone.

参数

img (torch.Tensor) – Input images of shape (N, C, H, W).

返回

Latent representations of images.

返回类型

tuple[Tensor]

forward_train(x: List[torch.Tensor], **kwargs)dict[源代码]

Forward the masked image and get the reconstruction loss.

参数

x (List[torch.Tensor]) – A two-element list containing the input images and the corresponding masks.

返回

The reconstruction loss.

返回类型

dict

class mmselfsup.models.algorithms.SimSiam(backbone, neck=None, head=None, init_cfg=None, **kwargs)[源代码]

SimSiam.

Implementation of Exploring Simple Siamese Representation Learning. The operation of fixing learning rate of predictor is in core/hooks/simsiam_hook.py.

参数
  • backbone (dict) – Config dict for module of backbone.

  • neck (dict) – Config dict for module of deep features to compact feature vectors. Defaults to None.

  • head (dict) – Config dict for module of loss functions. Defaults to None.

extract_feat(img)[源代码]

Function to extract features from backbone.

参数

img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

返回

backbone outputs.

返回类型

tuple[Tensor]

forward_train(img)[源代码]

Forward computation during training.

参数

img (list[Tensor]) – A list of input images with shape (N, C, H, W). Typically these should be mean centered and std scaled.

返回

A dictionary of loss components.

返回类型

dict[str, Tensor]

class mmselfsup.models.algorithms.SwAV(backbone, neck=None, head=None, init_cfg=None, **kwargs)[源代码]

SwAV.

Implementation of Unsupervised Learning of Visual Features by Contrasting Cluster Assignments. The queue is built in core/hooks/swav_hook.py.

参数
  • backbone (dict) – Config dict for module of backbone.

  • neck (dict) – Config dict for module of deep features to compact feature vectors. Defaults to None.

  • head (dict) – Config dict for module of loss functions. Defaults to None.

extract_feat(img)[源代码]

Function to extract features from backbone.

参数

img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

返回

Backbone outputs.

返回类型

tuple[Tensor]

forward_train(img, **kwargs)[源代码]

Forward computation during training.

参数

img (list[Tensor]) – A list of input images with shape (N, C, H, W). Typically these should be mean centered and std scaled.

返回

A dictionary of loss components.

返回类型

dict[str, Tensor]

backbones

class mmselfsup.models.backbones.CAEViT(arch: str = 'b', img_size: int = 224, patch_size: int = 16, out_indices: int = - 1, drop_rate: float = 0, drop_path_rate: float = 0, qkv_bias: bool = True, norm_cfg: dict = {'eps': 1e-06, 'type': 'LN'}, final_norm: bool = True, output_cls_token: bool = True, interpolate_mode: str = 'bicubic', init_values: Optional[float] = None, patch_cfg: dict = {}, layer_cfgs: dict = {}, init_cfg: Optional[dict] = None)[源代码]

Vision Transformer for CAE pre-training.

Rewritten version of: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

参数
  • arch (str | dict) – Vision Transformer architecture. Default: ‘b’

  • img_size (int | tuple) – Input image size

  • patch_size (int | tuple) – The patch size

  • out_indices (Sequence | int) – Output from which stages. Defaults to -1, means the last stage.

  • drop_rate (float) – Probability of an element to be zeroed. Defaults to 0.

  • drop_path_rate (float) – stochastic depth rate. Defaults to 0.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='LN').

  • final_norm (bool) – Whether to add an additional layer to normalize the final feature map. Defaults to True.

  • output_cls_token (bool) – Whether to output the cls_token. If set to True, with_cls_token must be True. Defaults to True.

  • interpolate_mode (str) – Select the interpolation mode for resizing the position embedding vector. Defaults to “bicubic”.

  • init_values (float, optional) – The init value of gamma in TransformerEncoderLayer.

  • patch_cfg (dict) – Configs of patch embedding. Defaults to an empty dict.

  • layer_cfgs (Sequence | dict) – Configs of each transformer layer in encoder. Defaults to an empty dict.

  • init_cfg (dict, optional) – Initialization config dict. Defaults to None.

forward(img: torch.Tensor, mask: torch.Tensor)torch.Tensor[源代码]

Forward computation.

参数

x (tensor | tuple[tensor]) – x could be a Torch.tensor or a tuple of Torch.tensor, containing input data for forward computation.

init_weights()None[源代码]

Initialize the weights.

class mmselfsup.models.backbones.MAEViT(arch='b', img_size=224, patch_size=16, out_indices=- 1, drop_rate=0, drop_path_rate=0, norm_cfg={'eps': 1e-06, 'type': 'LN'}, final_norm=True, output_cls_token=True, interpolate_mode='bicubic', patch_cfg={}, layer_cfgs={}, mask_ratio=0.75, init_cfg=None)[源代码]

Vision Transformer for MAE pre-training.

A PyTorch implementation of: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

参数
  • arch (str | dict) – Vision Transformer architecture. Defaults to ‘b’.

  • img_size (int | tuple) – Input image size

  • patch_size (int | tuple) – The patch size

  • out_indices (Sequence | int) – Output from which stages. Defaults to -1, means the last stage.

  • drop_rate (float) – Probability of an element to be zeroed. Defaults to 0.

  • drop_path_rate (float) – stochastic depth rate. Defaults to 0.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='LN').

  • final_norm (bool) – Whether to add an additional layer to normalize the final feature map. Defaults to True.

  • output_cls_token (bool) – Whether to output the cls_token. If set to True, with_cls_token must be True. Defaults to True.

  • interpolate_mode (str) – Select the interpolation mode for resizing the position embedding vector. Defaults to “bicubic”.

  • patch_cfg (dict) – Configs of patch embedding. Defaults to an empty dict.

  • layer_cfgs (Sequence | dict) – Configs of each transformer layer in encoder. Defaults to an empty dict.

  • mask_ratio (float) – The ratio of total number of patches to be masked. Defaults to 0.75.

  • init_cfg (dict, optional) – Initialization config dict. Defaults to None.

forward(x)[源代码]

Forward computation.

参数

x (tensor | tuple[tensor]) – x could be a Torch.tensor or a tuple of Torch.tensor, containing input data for forward computation.

init_weights()[源代码]

Initialize the weights.

random_masking(x, mask_ratio=0.75)[源代码]

Generate the mask for MAE Pre-training.

参数
  • x (torch.tensor) – Image with data augmentation applied.

  • mask_ratio (float) – The mask ratio of total patches. Defaults to 0.75.

返回

The masked image, the mask and the ids to restore the original image.

  • x_masked (Tensor): masked image.

  • mask (Tensor): mask used to mask image.

  • ids_restore (Tensor): ids to restore original image.

返回类型

tuple[Tensor, Tensor, Tensor]
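
A rough usage sketch of random_masking (the exact return shapes below are assumptions based on the reference MAE implementation; the input is assumed to be the patch-embedded tokens without the cls_token):

>>> from mmselfsup.models import MAEViT
>>> import torch
>>> backbone = MAEViT(arch='b', mask_ratio=0.75)
>>> tokens = torch.rand(2, 196, 768)  # assumed: patch tokens of a 224x224 image, no cls_token
>>> x_masked, mask, ids_restore = backbone.random_masking(tokens, mask_ratio=0.75)
>>> # with mask_ratio=0.75 only 25% of the tokens are expected to be kept,
>>> # i.e. x_masked is roughly (2, 49, 768) while mask and ids_restore are (2, 196)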

class mmselfsup.models.backbones.MIMVisionTransformer(arch='b', img_size=224, patch_size=16, out_indices=- 1, use_window=False, drop_rate=0, drop_path_rate=0, qkv_bias=True, norm_cfg={'eps': 1e-06, 'type': 'LN'}, final_norm=True, output_cls_token=True, interpolate_mode='bicubic', init_values=0.0, patch_cfg={}, layer_cfgs={}, finetune=True, init_cfg=None)[源代码]

Vision Transformer for MIM-style model (Mask Image Modeling) classification (fine-tuning or linear probe).

A PyTorch implementation of: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

参数
  • arch (str | dict) – Vision Transformer architecture. Defaults to ‘b’.

  • img_size (int | tuple) – Input image size

  • patch_size (int | tuple) – The patch size

  • out_indices (Sequence | int) – Output from which stages. Defaults to -1, means the last stage.

  • drop_rate (float) – Probability of an element to be zeroed. Defaults to 0.

  • drop_path_rate (float) – stochastic depth rate. Defaults to 0.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='LN').

  • final_norm (bool) – Whether to add an additional layer to normalize the final feature map. Defaults to True.

  • output_cls_token (bool) – Whether to output the cls_token. If set to True, with_cls_token must be True. Defaults to True.

  • interpolate_mode (str) – Select the interpolation mode for resizing the position embedding vector. Defaults to “bicubic”.

  • patch_cfg (dict) – Configs of patch embedding. Defaults to an empty dict.

  • layer_cfgs (Sequence | dict) – Configs of each transformer layer in encoder. Defaults to an empty dict.

  • finetune (bool) – Whether or not do fine-tuning. Defaults to True.

  • init_cfg (dict, optional) – Initialization config dict. Defaults to None.

forward(x)[源代码]

Forward computation.

参数

x (tensor | tuple[tensor]) – x could be a Torch.tensor or a tuple of Torch.tensor, containing input data for forward computation.

train(mode=True)[源代码]

Set module status before forward computation.

参数

mode (bool) – Whether it is train_mode or test_mode

class mmselfsup.models.backbones.MaskFeatViT(arch: Union[str, dict] = 'b', img_size: Union[Tuple[int, int], int] = 224, patch_size: int = 16, out_indices: int = - 1, drop_rate: float = 0.0, drop_path_rate: float = 0.0, norm_cfg: dict = {'eps': 1e-06, 'type': 'LN'}, final_norm: bool = True, output_cls_token: bool = True, interpolate_mode: str = 'bicubic', patch_cfg: dict = {}, layer_cfgs: dict = {}, init_cfg: Optional[dict] = None)[源代码]

Vision Transformer for MaskFeat pre-training.

A PyTorch implementation of: Masked Feature Prediction for Self-Supervised Visual Pre-Training.

参数
  • arch (str | dict) – Vision Transformer architecture. Defaults to ‘b’.

  • img_size (int | tuple) – Input image size

  • patch_size (int | tuple) – The patch size

  • out_indices (Sequence | int) – Output from which stages. Defaults to -1, means the last stage.

  • drop_rate (float) – Probability of an element to be zeroed. Defaults to 0.

  • drop_path_rate (float) – stochastic depth rate. Defaults to 0.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='LN').

  • final_norm (bool) – Whether to add an additional layer to normalize the final feature map. Defaults to True.

  • output_cls_token (bool) – Whether to output the cls_token. If set to True, with_cls_token must be True. Defaults to True.

  • interpolate_mode (str) – Select the interpolation mode for resizing the position embedding vector. Defaults to “bicubic”.

  • patch_cfg (dict) – Configs of patch embedding. Defaults to an empty dict.

  • layer_cfgs (Sequence | dict) – Configs of each transformer layer in encoder. Defaults to an empty dict.

  • init_cfg (dict, optional) – Initialization config dict. Defaults to None.

forward(x: torch.Tensor, mask: torch.Tensor)torch.Tensor[源代码]

Generate features for masked images.

参数
  • x (torch.Tensor) – Input images.

  • mask (torch.Tensor) – Input masks.

返回

Features with cls_tokens.

返回类型

torch.Tensor

init_weights()None[源代码]

Initialize the weights.

class mmselfsup.models.backbones.ResNeXt(depth, groups=32, width_per_group=4, **kwargs)[源代码]

ResNeXt backbone.

Please refer to the paper for details.

As the behavior of the forward function in MMSelfSup is different from MMCls, we register our own ResNeXt, inheriting from mmselfsup.models.backbones.ResNet.

参数
  • depth (int) – Network depth, from {50, 101, 152}.

  • groups (int) – Groups of conv2 in Bottleneck. Defaults to 32.

  • width_per_group (int) – Width per group of conv2 in Bottleneck. Defaults to 4.

  • in_channels (int) – Number of input image channels. Defaults to 3.

  • stem_channels (int) – Output channels of the stem layer. Defaults to 64.

  • num_stages (int) – Stages of the network. Defaults to 4.

  • strides (Sequence[int]) – Strides of the first block of each stage. Defaults to (1, 2, 2, 2).

  • dilations (Sequence[int]) – Dilation of each stage. Defaults to (1, 1, 1, 1).

  • out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned, otherwise multiple stages are specified, a tuple of tensors will be returned. Defaults to (3, ).

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Defaults to False.

  • avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Defaults to False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.

  • conv_cfg (dict | None) – The config dict for conv layers. Defaults to None.

  • norm_cfg (dict) – The config dict for norm layers.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Defaults to False.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Defaults to False.

示例

>>> from mmselfsup.models import ResNeXt
>>> import torch
>>> self = ResNeXt(depth=50)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 256, 8, 8)
(1, 512, 4, 4)
(1, 1024, 2, 2)
(1, 2048, 1, 1)
make_res_layer(**kwargs)[源代码]

Redefine the function for ResNeXt related args.

class mmselfsup.models.backbones.ResNet(depth, in_channels=3, stem_channels=64, base_channels=64, expansion=None, num_stages=4, strides=(1, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(4), style='pytorch', deep_stem=False, avg_down=False, frozen_stages=- 1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=False, with_cp=False, zero_init_residual=False, init_cfg=[{'type': 'Kaiming', 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}], drop_path_rate=0.0, **kwargs)[源代码]

ResNet backbone.

Please refer to the paper for details.

参数
  • depth (int) – Network depth, from {18, 34, 50, 101, 152}.

  • in_channels (int) – Number of input image channels. Defaults to 3.

  • stem_channels (int) – Output channels of the stem layer. Defaults to 64.

  • base_channels (int) – Middle channels of the first stage. Defaults to 64.

  • num_stages (int) – Stages of the network. Defaults to 4.

  • strides (Sequence[int]) – Strides of the first block of each stage. Defaults to (1, 2, 2, 2).

  • dilations (Sequence[int]) – Dilation of each stage. Defaults to (1, 1, 1, 1).

  • out_indices (Sequence[int]) – Output from which stages. Defaults to (4, ).

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Defaults to False.

  • avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Defaults to False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.

  • conv_cfg (dict | None) – The config dict for conv layers. Defaults to None.

  • norm_cfg (dict) – The config dict for norm layers.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Defaults to False.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Defaults to False.

  • drop_path_rate (float) – Probability of the path to be zeroed. Defaults to 0.0.

示例

>>> from mmselfsup.models import ResNet
>>> import torch
>>> self = ResNet(depth=18)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 64, 8, 8)
(1, 128, 4, 4)
(1, 256, 2, 2)
(1, 512, 1, 1)
forward(x)[源代码]

Forward function.

As the behavior of forward function in MMSelfSup is different from MMCls, we rewrite the forward function. MMCls does not output the feature map from the ‘stem’ layer, which we will use for downstream evaluation.

class mmselfsup.models.backbones.ResNetV1d(**kwargs)[源代码]

ResNetV1d variant described in Bag of Tricks.

Compared with default ResNet(ResNetV1b), ResNetV1d replaces the 7x7 conv in the input stem with three 3x3 convs. And in the downsampling block, a 2x2 avg_pool with stride 2 is added before conv, whose stride is changed to 1.

class mmselfsup.models.backbones.SimMIMSwinTransformer(arch: Union[str, dict] = 'T', img_size: Union[Tuple[int, int], int] = 224, in_channels: int = 3, drop_rate: float = 0.0, drop_path_rate: float = 0.1, out_indices: tuple = (3), use_abs_pos_embed: bool = False, with_cp: bool = False, frozen_stages: bool = - 1, norm_eval: bool = False, norm_cfg: dict = {'type': 'LN'}, stage_cfgs: Union[Sequence, dict] = {}, patch_cfg: dict = {}, init_cfg: Optional[dict] = None)[源代码]

Swin Transformer for SimMIM.

参数
  • arch (str | dict) – Swin Transformer architecture. Defaults to ‘T’.

  • img_size (int | tuple) – The size of input image. Defaults to 224.

  • in_channels (int) – The num of input channels. Defaults to 3.

  • drop_rate (float) – Dropout rate after embedding. Defaults to 0.

  • drop_path_rate (float) – Stochastic depth rate. Defaults to 0.1.

  • out_indices (tuple) – Layers to be outputted. Defaults to (3, ).

  • use_abs_pos_embed (bool) – If True, add absolute position embedding to the patch embedding. Defaults to False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Defaults to False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.

  • norm_cfg (dict) – Config dict for the normalization layer at the end of the backbone. Defaults to dict(type=’LN’).

  • stage_cfgs (Sequence | dict) – Extra config dict for each stage. Defaults to empty dict.

  • patch_cfg (dict) – Extra config dict for patch embedding. Defaults to empty dict.

  • init_cfg (dict, optional) – The Config for initialization. Defaults to None.

forward(x: torch.Tensor, mask: torch.Tensor)Sequence[torch.Tensor][源代码]

Generate features for masked images.

This function generates mask images and get the hidden features for them.

参数
  • x (torch.Tensor) – Input images.

  • mask (torch.Tensor) – Masks used to construct masked images.

返回

A tuple containing features from multi-stages.

返回类型

tuple

init_weights()None[源代码]

Initialize weights.

class mmselfsup.models.backbones.VisionTransformer(stop_grad_conv1=False, frozen_stages=- 1, norm_eval=False, init_cfg=None, **kwargs)[源代码]

Vision Transformer.

A PyTorch implementation of: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.

Part of the code is modified from: https://github.com/facebookresearch/moco-v3/blob/main/vits.py.

参数
  • stop_grad_conv1 (bool, optional) – whether to stop the gradient of convolution layer in PatchEmbed. Defaults to False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

init_weights()[源代码]

Initialize the weights.

train(mode=True)[源代码]

Set module status before forward computation.

参数

mode (bool) – Whether it is train_mode or test_mode

heads

class mmselfsup.models.heads.CAEHead(tokenizer_path: str, lambd: float, init_cfg: Optional[dict] = None)[源代码]

Pretrain Head for CAE.

Compute the align loss and the main loss. In addition, this head also generates the prediction targets produced by the DALL-E tokenizer.

参数
  • tokenizer_path (str) – The path of the tokenizer.

  • lambd (float) – The weight for the align loss.

  • init_cfg (dict, optional) – Initialization config dict. Defaults to None.

forward(img_target: torch.Tensor, outputs: torch.Tensor, latent_pred: torch.Tensor, latent_target: torch.Tensor, mask: torch.Tensor)dict[源代码]

Defines the computation performed at every call.

Should be overridden by all subclasses.

注解

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmselfsup.models.heads.ClsHead(with_avg_pool=False, in_channels=2048, num_classes=1000, vit_backbone=False, init_cfg=[{'type': 'Normal', 'std': 0.01, 'layer': 'Linear'}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[源代码]

Simplest classifier head, with only one fc layer.

参数
  • with_avg_pool (bool) – Whether to apply the average pooling after neck. Defaults to False.

  • in_channels (int) – Number of input channels. Defaults to 2048.

  • num_classes (int) – Number of classes. Defaults to 1000.

  • init_cfg (dict or list[dict], optional) – Initialization config dict.

forward(x)[源代码]

Forward head.

参数

x (list[Tensor] | tuple[Tensor]) – Feature maps of backbone, each tensor has shape (N, C, H, W).

返回

A list of class scores.

返回类型

list[Tensor]

loss(cls_score, labels)[源代码]

Compute the loss.
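
A minimal usage sketch (the list-in/list-out convention of forward() and the input format of loss() are assumptions):

>>> from mmselfsup.models import ClsHead
>>> import torch
>>> head = ClsHead(with_avg_pool=True, in_channels=2048, num_classes=1000)
>>> feats = [torch.rand(2, 2048, 7, 7)]      # one feature map from the backbone
>>> cls_score = head(feats)                  # assumed: a list with one (2, 1000) score tensor
>>> labels = torch.randint(0, 1000, (2,))
>>> losses = head.loss(cls_score, labels)    # assumed to return a dict of loss values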

class mmselfsup.models.heads.ContrastiveHead(temperature=0.1)[源代码]

Head for contrastive learning.

The contrastive loss is implemented in this head and is used in SimCLR, MoCo, DenseCL, etc.

参数

temperature (float) – The temperature hyper-parameter that controls the concentration level of the distribution. Defaults to 0.1.

forward(pos, neg)[源代码]

Forward function to compute contrastive loss.

参数
  • pos (Tensor) – Nx1 positive similarity.

  • neg (Tensor) – Nxk negative similarity.

返回

A dictionary of loss components.

返回类型

dict[str, Tensor]
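
A minimal sketch of how this head is typically called (the key name of the returned loss is an assumption):

>>> from mmselfsup.models import ContrastiveHead
>>> import torch
>>> head = ContrastiveHead(temperature=0.1)
>>> pos = torch.rand(32, 1)        # Nx1 positive similarities
>>> neg = torch.rand(32, 65536)    # Nxk negative similarities
>>> losses = head(pos, neg)        # a dict holding the InfoNCE-style loss (key assumed to be 'loss')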

class mmselfsup.models.heads.LatentClsHead(in_channels: int, num_classes: int, init_cfg: dict = {'layer': 'Linear', 'std': 0.01, 'type': 'Normal'})[源代码]

Head for latent feature classification.

参数
  • in_channels (int) – Number of input channels.

  • num_classes (int) – Number of classes.

  • init_cfg (dict or list[dict], optional) – Initialization config dict.

forward(input: torch.Tensor, target: torch.Tensor)dict[源代码]

Forward head.

参数
  • input (Tensor) – NxC input features.

  • target (Tensor) – NxC target features.

返回

A dictionary of loss components.

返回类型

dict[str, Tensor]

class mmselfsup.models.heads.LatentCrossCorrelationHead(in_channels: int, lambd: float = 0.0051)[源代码]

Head for latent feature cross correlation. Part of the code is borrowed from: https://github.com/facebookresearch/barlowtwins/blob/main/main.py.

参数
  • in_channels (int) – Number of input channels.

  • lambd (float) – Weight on off-diagonal terms. Defaults to 0.0051.

forward(input: torch.Tensor, target: torch.Tensor)dict[源代码]

Forward head.

参数
  • input (Tensor) – NxC input features.

  • target (Tensor) – NxC target features.

返回

A dictionary of loss components.

返回类型

dict[str, Tensor]

off_diagonal(x: torch.Tensor)torch.Tensor[源代码]

Return a flattened view of the off-diagonal elements of a square matrix.
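
For reference, the same off-diagonal trick can be reproduced with plain PyTorch (a standalone sketch, not the head's method itself):

>>> import torch
>>> x = torch.arange(9.).reshape(3, 3)
>>> n = x.shape[0]
>>> # drop the last element, fold into (n-1) rows of length (n+1), then strip the first column
>>> x.flatten()[:-1].view(n - 1, n + 1)[:, 1:].flatten()
tensor([1., 2., 3., 5., 6., 7.])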

class mmselfsup.models.heads.LatentPredictHead(predictor: dict)[源代码]

Head for latent feature prediction.

This head builds a predictor, which can be any registered neck component. For example, BYOL and SimSiam call this head and build NonLinearNeck. It also implements similarity loss between two forward features.

参数

predictor (dict) – Config dict for the predictor.

forward(input: torch.Tensor, target: torch.Tensor)dict[源代码]

Forward head.

参数
  • input (Tensor) – NxC input features.

  • target (Tensor) – NxC target features.

返回

A dictionary of loss components.

返回类型

dict[str, Tensor]

class mmselfsup.models.heads.MAEFinetuneHead(embed_dim, num_classes=1000, label_smooth_val=0.1)[源代码]

Fine-tuning head for MAE.

参数
  • embed_dim (int) – The dim of the feature before the classifier head.

  • num_classes (int) – The total classes. Defaults to 1000.

  • label_smooth_val (float) – The degree of label smoothing. Defaults to 0.1.

forward(x)[源代码]

Get the logits.

init_weights()[源代码]

Initialize the weights.

loss(outputs, labels)[源代码]

Compute the loss.

class mmselfsup.models.heads.MAELinprobeHead(embed_dim, num_classes=1000)[源代码]

Linear probing head for MAE.

参数
  • embed_dim (int) – The dim of the feature before the classifier head.

  • num_classes (int) – The total classes. Defaults to 1000.

forward(x)[源代码]

Get the logits.

init_weights()[源代码]

Initialize the weights.

loss(outputs, labels)[源代码]

Compute the loss.

class mmselfsup.models.heads.MAEPretrainHead(norm_pix: bool = False, patch_size: int = 16)[源代码]

Pre-training head for MAE.

参数
  • norm_pix (bool) – Whether or not to normalize the target. Defaults to False.

  • patch_size (int) – Patch size. Defaults to 16.

forward(x: torch.Tensor, pred: torch.Tensor, mask: torch.Tensor)dict[源代码]

Defines the computation performed at every call.

Should be overridden by all subclasses.

注解

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

patchify(imgs: torch.Tensor)torch.Tensor[源代码]
参数

imgs (torch.Tensor) – The shape is (N, 3, H, W)

返回

The shape is (N, L, patch_size**2 *3)

返回类型

x (torch.Tensor)

unpatchify(x: torch.Tensor)torch.Tensor[源代码]
参数

x (torch.Tensor) – The shape is (N, L, patch_size**2 *3)

返回

The shape is (N, 3, H, W)

返回类型

imgs (torch.Tensor)
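
A minimal round-trip sketch of patchify/unpatchify (shapes follow the formulas above; with patch_size=16 and 224x224 inputs, L = (224/16)**2 = 196):

>>> from mmselfsup.models import MAEPretrainHead
>>> import torch
>>> head = MAEPretrainHead(norm_pix=False, patch_size=16)
>>> imgs = torch.rand(2, 3, 224, 224)
>>> patches = head.patchify(imgs)     # (2, 196, 768), i.e. (N, L, patch_size**2 * 3)
>>> recon = head.unpatchify(patches)  # back to (2, 3, 224, 224)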

class mmselfsup.models.heads.MaskFeatFinetuneHead(embed_dim: int, num_classes: int = 1000, label_smooth_val: float = 0.1)[源代码]

Fine-tuning head for MaskFeat.

参数
  • embed_dim (int) – The dim of the feature before the classifier head.

  • num_classes (int) – The total classes. Defaults to 1000.

  • label_smooth_val (float) – The degree of label smoothing. Defaults to 0.1.

forward(x: torch.Tensor)list[源代码]

Get the logits.

init_weights()None[源代码]

Initialize the weights.

loss(outputs: torch.Tensor, labels: torch.Tensor)dict[源代码]

Compute the loss.

class mmselfsup.models.heads.MaskFeatPretrainHead(embed_dim: int = 768, hog_dim: int = 108)[源代码]

Pre-training head for MaskFeat.

参数
  • embed_dim (int) – The dim of the feature before the classifier head. Defaults to 768.

  • hog_dim (int) – The dim of the hog feature. Defaults to 108.

forward(latent: torch.Tensor, hog: torch.Tensor, mask: torch.Tensor)dict[源代码]

Pre-training head for MaskFeat.

参数
  • latent (torch.Tensor) – Input latent of shape (N, 1+L, C).

  • hog (torch.Tensor) – Input hog feature of shape (N, L, C).

  • mask (torch.Tensor) – Input mask of shape (N, H, W).

返回

A dictionary of loss components.

返回类型

Dict[str, torch.Tensor]

init_weights()None[源代码]

Initialize the weights.

loss(pred: torch.Tensor, target: torch.Tensor, mask: torch.Tensor)dict[源代码]

Compute the loss.

参数
  • pred (torch.Tensor) – Input prediction of shape (N, L, C).

  • target (torch.Tensor) – Input target of shape (N, L, C).

  • mask (torch.Tensor) – Input mask of shape (N, L, 1).

返回

A dictionary of loss components.

返回类型

dict[str, torch.Tensor]

class mmselfsup.models.heads.MoCoV3Head(predictor, temperature=1.0)[源代码]

Head for MoCo v3 algorithms.

This head builds a predictor, which can be any registered neck component. It also implements latent contrastive loss between two forward features. Part of the code is modified from: https://github.com/facebookresearch/moco-v3/blob/main/moco/builder.py.

参数
  • predictor (dict) – Config dict for module of predictor.

  • temperature (float) – The temperature hyper-parameter that controls the concentration level of the distribution. Defaults to 1.0.

forward(base_out, momentum_out)[源代码]

Forward head.

参数
  • base_out (Tensor) – NxC features from base_encoder.

  • momentum_out (Tensor) – NxC features from momentum_encoder.

返回

A dictionary of loss components.

返回类型

dict[str, Tensor]

class mmselfsup.models.heads.MultiClsHead(pool_type='adaptive', in_indices=(0), with_last_layer_unpool=False, backbone='resnet50', norm_cfg={'type': 'BN'}, num_classes=1000, init_cfg=[{'type': 'Normal', 'std': 0.01, 'layer': 'Linear'}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[源代码]

Multiple classifier heads.

This head inputs feature maps from different stages of backbone, average pools each feature map to around 9000 dimensions, and then appends a linear classifier at each stage to predict corresponding class scores.

参数
  • pool_type (str) – ‘adaptive’ or ‘specified’. If set to ‘adaptive’, use adaptive average pooling, otherwise use specified pooling params.

  • in_indices (Sequence[int]) – Input from which stages.

  • with_last_layer_unpool (bool) – Whether to unpool the features from last layer. Defaults to False.

  • backbone (str) – Specify which backbone to use. Defaults to ‘resnet50’.

  • norm_cfg (dict) – dictionary to construct and config norm layer.

  • num_classes (int) – Number of classes. Defaults to 1000.

  • init_cfg (dict or list[dict], optional) – Initialization config dict.

forward(x)[源代码]

Forward head.

参数

x (list[Tensor] | tuple[Tensor]) – Feature maps of backbone, each tensor has shape (N, C, H, W).

返回

A list of class scores.

返回类型

list[Tensor]

loss(cls_score, labels)[源代码]

Compute the loss.

class mmselfsup.models.heads.SimMIMHead(patch_size: int, encoder_in_channels: int)[源代码]

Pretrain Head for SimMIM.

参数
  • patch_size (int) – Patch size of each token.

  • encoder_in_channels (int) – Number of input channels for encoder.

forward(x: torch.Tensor, x_rec: torch.Tensor, mask: torch.Tensor)dict[源代码]

Defines the computation performed at every call.

Should be overridden by all subclasses.

注解

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmselfsup.models.heads.SwAVHead(feat_dim, sinkhorn_iterations=3, epsilon=0.05, temperature=0.1, crops_for_assign=[0, 1], num_crops=[2], num_prototypes=3000, init_cfg=None)[源代码]

The head for SwAV.

This head contains clustering and sinkhorn algorithms to compute Q codes. Part of the code is borrowed from: https://github.com/facebookresearch/swav. The queue is built in core/hooks/swav_hook.py.

参数
  • feat_dim (int) – feature dimension of the prototypes.

  • sinkhorn_iterations (int) – number of iterations in Sinkhorn-Knopp algorithm. Defaults to 3.

  • epsilon (float) – regularization parameter for Sinkhorn-Knopp algorithm. Defaults to 0.05.

  • temperature (float) – temperature parameter in training loss. Defaults to 0.1.

  • crops_for_assign (list[int]) – list of crops id used for computing assignments. Defaults to [0, 1].

  • num_crops (list[int]) – list of number of crops. Defaults to [2].

  • num_prototypes (int) – number of prototypes. Defaults to 3000.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

forward(x)[源代码]

Forward head of swav to compute the loss.

参数

x (Tensor) – NxC input features.

返回

A dictionary of loss components.

返回类型

dict[str, Tensor]

memories

class mmselfsup.models.memories.InterCLRMemory(length, feat_dim, momentum, num_classes, min_cluster, **kwargs)[源代码]

Memory bank for InterCLR.

参数
  • length (int) – Number of features stored in the memory bank.

  • feat_dim (int) – Dimension of stored features.

  • momentum (float) – Momentum coefficient for updating features.

  • num_classes (int) – Number of clusters.

  • min_cluster (int) – Minimal cluster size.

assign_label(label)[源代码]

Assign offline labels for each epoch.

deal_with_small_clusters()[源代码]

Deal with small clusters.

update_centroids_memory(cinds=None)[源代码]

Update centroids in the memory bank.

update_samples_memory(ind, feature)[源代码]

Update features and labels in the memory bank.

update_simple_memory(ind, feature)[源代码]

Update features in the memory bank.

class mmselfsup.models.memories.ODCMemory(length, feat_dim, momentum, num_classes, min_cluster, **kwargs)[源代码]

Memory module for ODC.

This module includes the samples memory and the centroids memory in ODC. The samples memory stores features and pseudo-labels of all samples in the dataset; while the centroids memory stores features of cluster centroids.

参数
  • length (int) – Number of features stored in samples memory.

  • feat_dim (int) – Dimension of stored features.

  • momentum (float) – Momentum coefficient for updating features.

  • num_classes (int) – Number of clusters.

  • min_cluster (int) – Minimal cluster size.

deal_with_small_clusters()[源代码]

Deal with small clusters.

init_memory(feature, label)[源代码]

Initialize memory modules.

update_centroids_memory(cinds=None)[源代码]

Update centroids memory.

update_samples_memory(ind, feature)[源代码]

Update samples memory.

class mmselfsup.models.memories.SimpleMemory(length, feat_dim, momentum, **kwargs)[源代码]

Simple memory bank (e.g., for NPID).

This module includes the memory bank that stores running average features of all samples in the dataset.

参数
  • length (int) – Number of features stored in the memory bank.

  • feat_dim (int) – Dimension of stored features.

  • momentum (float) – Momentum coefficient for updating features.

update(ind, feature)[源代码]

Update features in memory bank.

参数
  • ind (Tensor) – Indices for the batch of features.

  • feature (Tensor) – Batch of features.

necks

class mmselfsup.models.necks.AvgPool2dNeck(output_size=1)[源代码]

The average pooling 2d neck.

forward(x)[源代码]

Forward function.
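
A minimal sketch (the list-in/list-out convention below is an assumption shared with the other necks in this module):

>>> from mmselfsup.models import AvgPool2dNeck
>>> import torch
>>> neck = AvgPool2dNeck(output_size=1)
>>> outs = neck([torch.rand(2, 2048, 7, 7)])   # assumed: a list with one (2, 2048, 1, 1) tensor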

class mmselfsup.models.necks.CAENeck(patch_size: int = 16, num_classes: int = 8192, embed_dims: int = 768, regressor_depth: int = 6, decoder_depth: int = 8, num_heads: int = 12, mlp_ratio: int = 4, qkv_bias: bool = True, qk_scale: Optional[float] = None, drop_rate: float = 0.0, attn_drop_rate: float = 0.0, drop_path_rate: float = 0.0, norm_cfg: dict = {'eps': 1e-06, 'type': 'LN'}, init_values: Optional[float] = None, mask_tokens_num: int = 75, init_cfg: Optional[dict] = None)[源代码]

Neck for CAE Pre-training.

This module constructs the latent prediction regressor and the decoder for the latent prediction and the final prediction.

参数
  • patch_size (int) – The patch size of each token. Defaults to 16.

  • num_classes (int) – The number of classes for final prediction. Defaults to 8192.

  • embed_dims (int) – The embed dims of latent feature in regressor and decoder. Defaults to 768.

  • regressor_depth (int) – The number of regressor blocks. Defaults to 6.

  • decoder_depth (int) – The number of decoder blocks. Defaults to 8.

  • num_heads (int) – The number of head in multi-head attention. Defaults to 12.

  • mlp_ratio (int) – The expand ratio of latent features in MLP. defaults to 4.

  • qkv_bias (bool) – Whether or not to use qkv bias. Defaults to True.

  • qk_scale (float, optional) – The scale applied to the results of qk. Defaults to None.

  • drop_rate (float) – The dropout rate. Defaults to 0.

  • attn_drop_rate (float) – The dropout rate in attention block. Defaults to 0.

  • norm_cfg (dict) – The config of normalization layer. Defaults to dict(type=’LN’, eps=1e-6).

  • init_values (float, optional) – The init value of gamma. Defaults to None.

  • mask_tokens_num (int) – The number of mask tokens. Defaults to 75.

  • init_cfg (dict, optional) – Initialization config dict. Defaults to None.

forward(x_unmasked: torch.Tensor, pos_embed_masked: torch.Tensor, pos_embed_unmasked: torch.Tensor)Tuple[torch.Tensor, torch.Tensor][源代码]

Get the latent prediction and final prediction.

参数
  • x_unmasked (torch.Tensor) – Features of unmasked tokens.

  • pos_embed_masked (torch.Tensor) – Position embedding of masked tokens.

  • pos_embed_unmasked (torch.Tensor) – Position embedding of unmasked tokens.

返回

Final prediction and latent

prediction.

返回类型

Tuple[torch.Tensor, torch.Tensor]

init_weights()None[源代码]

Initialize the weights.

class mmselfsup.models.necks.DenseCLNeck(in_channels, hid_channels, out_channels, num_grid=None, init_cfg=None)[源代码]

The non-linear neck of DenseCL.

Single and dense neck in parallel: fc-relu-fc, conv-relu-conv. Borrowed from the authors’ code: https://github.com/WXinlong/DenseCL.

参数
  • in_channels (int) – Number of input channels.

  • hid_channels (int) – Number of hidden channels.

  • out_channels (int) – Number of output channels.

  • num_grid (int) – The grid size of dense features. Defaults to None.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

forward(x)[源代码]

Forward function of neck.

参数

x (list[tensor]) – feature map of backbone.

class mmselfsup.models.necks.LinearNeck(in_channels, out_channels, with_avg_pool=True, init_cfg=None)[源代码]

The linear neck: fc only.

参数
  • in_channels (int) – Number of input channels.

  • out_channels (int) – Number of output channels.

  • with_avg_pool (bool) – Whether to apply the global average pooling after backbone. Defaults to True.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

forward(x)[源代码]

Defines the computation performed at every call.

Should be overridden by all subclasses.

注解

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmselfsup.models.necks.MAEPretrainDecoder(num_patches=196, patch_size=16, in_chans=3, embed_dim=1024, decoder_embed_dim=512, decoder_depth=8, decoder_num_heads=16, mlp_ratio=4.0, norm_cfg={'eps': 1e-06, 'type': 'LN'})[源代码]

Decoder for MAE Pre-training.

参数
  • num_patches (int) – The number of total patches. Defaults to 196.

  • patch_size (int) – Image patch size. Defaults to 16.

  • in_chans (int) – The channel of input image. Defaults to 3.

  • embed_dim (int) – Encoder’s embedding dimension. Defaults to 1024.

  • decoder_embed_dim (int) – Decoder’s embedding dimension. Defaults to 512.

  • decoder_depth (int) – The depth of decoder. Defaults to 8.

  • decoder_num_heads (int) – Number of attention heads of decoder. Defaults to 16.

  • mlp_ratio (int) – Ratio of mlp hidden dim to decoder’s embedding dim. Defaults to 4.

  • norm_cfg (dict) – Normalization layer. Defaults to LayerNorm.

Some of the code is borrowed from https://github.com/facebookresearch/mae.

示例

>>> from mmselfsup.models import MAEPretrainDecoder
>>> import torch
>>> self = MAEPretrainDecoder()
>>> self.eval()
>>> inputs = torch.rand(1, 50, 1024)
>>> ids_restore = torch.arange(0, 196).unsqueeze(0)
>>> level_outputs = self.forward(inputs, ids_restore)
>>> print(tuple(level_outputs.shape))
(1, 196, 768)
forward(x, ids_restore)[源代码]

Defines the computation performed at every call.

Should be overridden by all subclasses.

注解

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

init_weights()[源代码]

Initialize the weights.

class mmselfsup.models.necks.MoCoV2Neck(in_channels, hid_channels, out_channels, with_avg_pool=True, init_cfg=None)[源代码]

The non-linear neck of MoCo v2: fc-relu-fc.

参数
  • in_channels (int) – Number of input channels.

  • hid_channels (int) – Number of hidden channels.

  • out_channels (int) – Number of output channels.

  • with_avg_pool (bool) – Whether to apply the global average pooling after backbone. Defaults to True.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

forward(x)[源代码]

Defines the computation performed at every call.

Should be overridden by all subclasses.

注解

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmselfsup.models.necks.NonLinearNeck(in_channels, hid_channels, out_channels, num_layers=2, with_bias=False, with_last_bn=True, with_last_bn_affine=True, with_last_bias=False, with_avg_pool=True, vit_backbone=False, norm_cfg={'type': 'SyncBN'}, init_cfg=[{'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[源代码]

The non-linear neck.

Structure: fc-bn-[relu-fc-bn] where the substructure in [] can be repeated. For the default setting, the repeated time is 1. The neck can be used in many algorithms, e.g., SimCLR, BYOL, SimSiam.

参数
  • in_channels (int) – Number of input channels.

  • hid_channels (int) – Number of hidden channels.

  • out_channels (int) – Number of output channels.

  • num_layers (int) – Number of fc layers. Defaults to 2.

  • with_bias (bool) – Whether to use bias in fc layers (except for the last). Defaults to False.

  • with_last_bn (bool) – Whether to add the last BN layer. Defaults to True.

  • with_last_bn_affine (bool) – Whether to have learnable affine parameters in the last BN layer (set False for SimSiam). Defaults to True.

  • with_last_bias (bool) – Whether to use bias in the last fc layer. Defaults to False.

  • with_avg_pool (bool) – Whether to apply the global average pooling after backbone. Defaults to True.

  • norm_cfg (dict) – Dictionary to construct and config norm layer. Defaults to dict(type=’SyncBN’).

  • init_cfg (dict or list[dict], optional) – Initialization config dict.

forward(x)[源代码]

Defines the computation performed at every call.

Should be overridden by all subclasses.

注解

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmselfsup.models.necks.ODCNeck(in_channels, hid_channels, out_channels, with_avg_pool=True, norm_cfg={'type': 'SyncBN'}, init_cfg=[{'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[源代码]

The non-linear neck of ODC: fc-bn-relu-dropout-fc-relu.

参数
  • in_channels (int) – Number of input channels.

  • hid_channels (int) – Number of hidden channels.

  • out_channels (int) – Number of output channels.

  • with_avg_pool (bool) – Whether to apply the global average pooling after backbone. Defaults to True.

  • norm_cfg (dict) – Dictionary to construct and config norm layer. Defaults to dict(type=’SyncBN’).

  • init_cfg (dict or list[dict], optional) – Initialization config dict.

forward(x)[源代码]

Defines the computation performed at every call.

Should be overridden by all subclasses.

注解

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmselfsup.models.necks.RelativeLocNeck(in_channels, out_channels, with_avg_pool=True, norm_cfg={'type': 'BN1d'}, init_cfg=[{'type': 'Normal', 'std': 0.01, 'layer': 'Linear'}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[源代码]

The neck of relative patch location: fc-bn-relu-dropout.

参数
  • in_channels (int) – Number of input channels.

  • out_channels (int) – Number of output channels.

  • with_avg_pool (bool) – Whether to apply the global average pooling after backbone. Defaults to True.

  • norm_cfg (dict) – Dictionary to construct and config norm layer. Defaults to dict(type=’BN1d’).

  • init_cfg (dict or list[dict], optional) – Initialization config dict.

forward(x)[源代码]

Defines the computation performed at every call.

Should be overridden by all subclasses.

注解

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmselfsup.models.necks.SimMIMNeck(in_channels: int, encoder_stride: int)[源代码]

Pre-train Neck For SimMIM.

This neck reconstructs the original image from the shrunk feature map.

参数
  • in_channels (int) – Channel dimension of the feature map.

  • encoder_stride (int) – The total stride of the encoder.

forward(x: torch.Tensor)torch.Tensor[源代码]

Defines the computation performed at every call.

Should be overridden by all subclasses.

注解

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
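
A rough sketch of the expected in/out shapes (the upsampling factor is assumed to equal encoder_stride, so the exact output shape is an assumption):

>>> from mmselfsup.models import SimMIMNeck
>>> import torch
>>> neck = SimMIMNeck(in_channels=768, encoder_stride=32)
>>> feat = torch.rand(2, 768, 7, 7)   # shrunk feature map from the encoder
>>> rec = neck(feat)                  # expected: a reconstructed image of shape (2, 3, 224, 224)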

class mmselfsup.models.necks.SwAVNeck(in_channels, hid_channels, out_channels, with_avg_pool=True, with_l2norm=True, norm_cfg={'type': 'SyncBN'}, init_cfg=[{'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[源代码]

The non-linear neck of SwAV: fc-bn-relu-fc-normalization.

参数
  • in_channels (int) – Number of input channels.

  • hid_channels (int) – Number of hidden channels.

  • out_channels (int) – Number of output channels.

  • with_avg_pool (bool) – Whether to apply the global average pooling after backbone. Defaults to True.

  • with_l2norm (bool) – whether to normalize the output after projection. Defaults to True.

  • norm_cfg (dict) – Dictionary to construct and config norm layer. Defaults to dict(type=’SyncBN’).

  • init_cfg (dict or list[dict], optional) – Initialization config dict.

forward(x)[源代码]

Defines the computation performed at every call.

Should be overridden by all subclasses.

注解

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

utils

class mmselfsup.models.utils.Accuracy(topk=(1))[源代码]

Implementation of accuracy computation.

forward(pred, target)[源代码]

Defines the computation performed at every call.

Should be overridden by all subclasses.

注解

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
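
A minimal sketch (the exact return format of forward() is an assumption):

>>> from mmselfsup.models.utils import Accuracy
>>> import torch
>>> acc = Accuracy(topk=(1, 5))
>>> pred = torch.rand(8, 10)               # (N, num_classes) scores
>>> target = torch.randint(0, 10, (8,))    # (N,) ground-truth labels
>>> acc(pred, target)                      # assumed: one accuracy value per k in topk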

class mmselfsup.models.utils.CAETransformerRegressorLayer(embed_dims: int, num_heads: int, feedforward_channels: int, num_fcs: int = 2, qkv_bias: bool = False, qk_scale: Optional[float] = None, drop_rate: float = 0.0, attn_drop_rate: float = 0.0, drop_path_rate: float = 0.0, init_values: float = 0.0, act_cfg: dict = {'type': 'GELU'}, norm_cfg: dict = {'eps': 1e-06, 'type': 'LN'})[源代码]

Transformer layer for the regressor of CAE.

This module differs from the conventional transformer encoder layer in that its queries are the masked tokens, while its keys and values are the concatenation of the masked and unmasked tokens.

参数
  • embed_dims (int) – The feature dimension.

  • num_heads (int) – The number of heads in multi-head attention.

  • feedforward_channels (int) – The hidden dimension of FFNs. Defaults to 1024.

  • num_fcs (int, optional) – The number of fully-connected layers in FFNs. Defaults to 2.

  • qkv_bias (bool) – If True, add a learnable bias to q, k, v. Defaults to False.

  • qk_scale (float, optional) – Override default qk scale of head_dim ** -0.5 if set. Defaults to None.

  • drop_rate (float) – The dropout rate. Defaults to 0.0.

  • attn_drop_rate (float) – The drop out rate for attention output weights. Defaults to 0.

  • drop_path_rate (float) – Stochastic depth rate. Defaults to 0.

  • init_values (float) – The init values of gamma. Defaults to 0.0.

  • act_cfg (dict) – The activation config for FFNs. Defaults to dict(type='GELU').

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='LN').

forward(x_q: torch.Tensor, x_kv: torch.Tensor, pos_q: torch.Tensor, pos_k: torch.Tensor)torch.Tensor[源代码]

Defines the computation performed at every call.

Should be overridden by all subclasses.

注解

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmselfsup.models.utils.Encoder(n_hid: int = 256, n_blk_per_group: int = 2, input_channels: int = 3, vocab_size: int = 8192, device: torch.device = device(type='cpu'), requires_grad: bool = False, use_mixed_precision: bool = True)[源代码]
forward(x: torch.Tensor)torch.Tensor[源代码]

Defines the computation performed at every call.

Should be overridden by all subclasses.

注解

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmselfsup.models.utils.ExtractProcess[源代码]

Global average-pooled feature extraction process.

This process extracts the global average-pooled features from the last layer of resnet backbone.

extract(model, data_loader, distributed=False)[源代码]

The extract function to apply forward function and choose distributed or not.

class mmselfsup.models.utils.GatherLayer(*args, **kwargs)[源代码]

Gather tensors from all process, supporting backward propagation.

static backward(ctx, *grads)[源代码]

Defines a formula for differentiating the operation with backward mode automatic differentiation (alias to the vjp function).

This function is to be overridden by all subclasses.

It must accept a context ctx as the first argument, followed by as many outputs as the forward() returned (None will be passed in for non tensor outputs of the forward function), and it should return as many tensors, as there were inputs to forward(). Each argument is the gradient w.r.t the given output, and each returned value should be the gradient w.r.t. the corresponding input. If an input is not a Tensor or is a Tensor not requiring grads, you can just pass None as a gradient for that input.

The context can be used to retrieve tensors saved during the forward pass. It also has an attribute ctx.needs_input_grad as a tuple of booleans representing whether each input needs gradient. E.g., backward() will have ctx.needs_input_grad[0] = True if the first input to forward() needs gradient computed w.r.t. the output.

static forward(ctx, input)[源代码]

Performs the operation.

This function is to be overridden by all subclasses.

It must accept a context ctx as the first argument, followed by any number of arguments (tensors or other types).

The context can be used to store arbitrary data that can then be retrieved during the backward pass. Tensors should not be stored directly on ctx (though this is not currently enforced for backward compatibility). Instead, tensors should be saved either with ctx.save_for_backward() if they are intended to be used in backward (equivalently, vjp) or ctx.save_for_forward() if they are intended to be used in jvp.

class mmselfsup.models.utils.MultiExtractProcess(pool_type='specified', backbone='resnet50', layer_indices=(0, 1, 2, 3, 4))[源代码]

Multi-stage intermediate feature extraction process for extract.py and tsne_visualization.py in tools.

This process extracts feature maps from different stages of backbone, and average pools each feature map to around 9000 dimensions.

参数
  • pool_type (str) – Pooling type in MultiPooling. Options are “adaptive” and “specified”. Defaults to “specified”.

  • backbone (str) – Backbone type, now only support “resnet50”. Defaults to “resnet50”.

  • layer_indices (Sequence[int]) – Output from which stages. 0 for stem, 1, 2, 3, 4 for res layers. Defaults to (0, 1, 2, 3, 4).

extract(model, data_loader, distributed=False)[源代码]

The extract function to apply forward function and choose distributed or not.

class mmselfsup.models.utils.MultiPooling(pool_type='adaptive', in_indices=(0), backbone='resnet50')[源代码]

Pooling layers for features from multiple depths.

参数
  • pool_type (str) – Pooling type for the feature map. Options are ‘adaptive’ and ‘specified’. Defaults to ‘adaptive’.

  • in_indices (Sequence[int]) – Output from which backbone stages. Defaults to (0, ).

  • backbone (str) – The selected backbone. Defaults to ‘resnet50’.

forward(x)[源代码]

Defines the computation performed at every call.

Should be overridden by all subclasses.

注解

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
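
A rough sketch with ResNet-50 style feature maps (the listed channel/spatial sizes and the list-in/list-out convention are assumptions):

>>> from mmselfsup.models.utils import MultiPooling
>>> import torch
>>> pooling = MultiPooling(pool_type='adaptive', in_indices=(0, 1, 2, 3, 4), backbone='resnet50')
>>> feats = [torch.rand(2, 64, 56, 56), torch.rand(2, 256, 56, 56),
...          torch.rand(2, 512, 28, 28), torch.rand(2, 1024, 14, 14),
...          torch.rand(2, 2048, 7, 7)]
>>> outs = pooling(feats)   # one pooled map per stage; flattening each gives roughly 9000-d vectors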

class mmselfsup.models.utils.MultiPrototypes(output_dim, num_prototypes)[源代码]

Multi-prototypes for SwAV head.

参数
  • output_dim (int) – The output dim from SwAV neck.

  • num_prototypes (list[int]) – The number of prototypes needed.

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
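A hedged sketch, assuming each entry in num_prototypes creates one linear prototype head and forward() returns one score tensor per head:

    import torch
    from mmselfsup.models.utils import MultiPrototypes

    heads = MultiPrototypes(output_dim=128, num_prototypes=[3000, 3000, 3000])
    z = torch.randn(4, 128)           # in practice: L2-normalized embeddings from the SwAV neck
    scores = heads(z)                 # assumption: a list of (4, 3000) score tensors
    print([s.shape for s in scores])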

class mmselfsup.models.utils.MultiheadAttention(embed_dims: int, num_heads: int, input_dims: Optional[int] = None, attn_drop: float = 0.0, proj_drop: float = 0.0, qkv_bias: bool = True, qk_scale: Optional[float] = None, proj_bias: bool = True, init_cfg: Optional[dict] = None)[source]

Multi-head Attention Module.

This module rewrites MultiheadAttention by replacing the qkv bias with a customized qkv bias and by removing the drop path layer.

Parameters
  • embed_dims (int) – The embedding dimension.

  • num_heads (int) – Parallel attention heads.

  • input_dims (int, optional) – The input dimension, and if None, use embed_dims. Defaults to None.

  • attn_drop (float) – Dropout rate of the dropout layer after the attention calculation of query and key. Defaults to 0.

  • proj_drop (float) – Dropout rate of the dropout layer after the output projection. Defaults to 0.

  • dropout_layer (dict) – The dropout config before adding the shortcut. Defaults to dict(type='Dropout', drop_prob=0.).

  • qkv_bias (bool) – If True, add a learnable bias to q, k, v. Defaults to True.

  • qk_scale (float, optional) – Override default qk scale of head_dim ** -0.5 if set. Defaults to None.

  • proj_bias (bool) – If True, add a learnable bias to the output projection. Defaults to True.

  • init_cfg (dict, optional) – The Config for initialization. Defaults to None.

forward(x: torch.Tensor) → torch.Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
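A minimal sketch, assuming the input is a (batch, num_tokens, embed_dims) token tensor and the output keeps the same shape:

    import torch
    from mmselfsup.models.utils import MultiheadAttention

    attn = MultiheadAttention(embed_dims=768, num_heads=12, qkv_bias=True)
    tokens = torch.randn(2, 197, 768)   # e.g. ViT-B tokens: 196 patches + 1 cls token
    out = attn(tokens)
    print(out.shape)                    # expected (2, 197, 768)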

class mmselfsup.models.utils.Sobel[source]

Sobel layer.

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
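A hedged sketch, assuming the layer converts an RGB batch to grayscale and applies fixed horizontal/vertical Sobel kernels, yielding a 2-channel edge map (as used by DeepCluster):

    import torch
    from mmselfsup.models.utils import Sobel

    sobel = Sobel()
    imgs = torch.randn(2, 3, 224, 224)  # RGB image batch
    edges = sobel(imgs)                 # assumption: shape (2, 2, 224, 224)
    print(edges.shape)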

class mmselfsup.models.utils.TransformerEncoderLayer(embed_dims: int, num_heads: int, feedforward_channels: int, window_size: Optional[int] = None, drop_rate: float = 0.0, attn_drop_rate: float = 0.0, drop_path_rate: float = 0.0, num_fcs: int = 2, qkv_bias: bool = True, act_cfg: dict = {'type': 'GELU'}, norm_cfg: dict = {'type': 'LN'}, init_values: float = 0.0, init_cfg: Optional[dict] = None)[source]

Implements one encoder layer in Vision Transformer.

This module is a rewritten version of the TransformerEncoderLayer in MMClassification, adding gamma and relative position bias to the Attention module.

Parameters
  • embed_dims (int) – The feature dimension.

  • num_heads (int) – Parallel attention heads.

  • feedforward_channels (int) – The hidden dimension of the FFNs.

  • drop_rate (float) – Probability of an element to be zeroed after the feed forward layer. Defaults to 0.

  • attn_drop_rate (float) – The drop out rate for attention output weights. Defaults to 0.

  • drop_path_rate (float) – Stochastic depth rate. Defaults to 0.

  • num_fcs (int) – The number of fully-connected layers for FFNs. Defaults to 2.

  • qkv_bias (bool) – Enable bias for qkv if True. Defaults to True.

  • act_cfg (dict) – The activation config for FFNs. Defaults to dict(type='GELU').

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='LN').

  • init_values (float) – The init values of gamma. Defaults to 0.0.

  • init_cfg (dict, optional) – Initialization config dict. Defaults to None.

forward(x: torch.Tensor) → torch.Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
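A minimal sketch mirroring a ViT-B layer, assuming the input is a (batch, num_tokens, embed_dims) tensor:

    import torch
    from mmselfsup.models.utils import TransformerEncoderLayer

    layer = TransformerEncoderLayer(embed_dims=768, num_heads=12,
                                    feedforward_channels=3072, init_values=0.1)
    tokens = torch.randn(2, 197, 768)
    out = layer(tokens)
    print(out.shape)   # expected (2, 197, 768)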

mmselfsup.models.utils.accuracy(pred, target, topk=1)[source]

Compute accuracy of predictions.

Parameters
  • pred (Tensor) – The output of the model.

  • target (Tensor) – The labels of data.

  • topk (int | list[int]) – Top-k metric selection. Defaults to 1.
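A hedged example; the exact return format (a single accuracy vs. one value per k) depends on the implementation, so it is printed rather than asserted:

    import torch
    from mmselfsup.models.utils import accuracy

    pred = torch.randn(8, 10)             # logits for 8 samples, 10 classes
    target = torch.randint(0, 10, (8,))   # ground-truth labels
    print(accuracy(pred, target, topk=1))
    print(accuracy(pred, target, topk=[1, 5]))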

mmselfsup.models.utils.build_2d_sincos_position_embedding(patches_resolution, embed_dims, temperature=10000.0, cls_token=False)[source]

Build a 2D sine-cosine position embedding so that the model can obtain the position information of the image patches.
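A hedged example, assuming a 14x14 patch grid as in ViT-B/16 on 224x224 inputs; whether an int resolution is accepted and the exact return type/shape (e.g. whether a cls-token slot is prepended) should be checked against the implementation:

    from mmselfsup.models.utils import build_2d_sincos_position_embedding

    pos_embed = build_2d_sincos_position_embedding(
        patches_resolution=14, embed_dims=768, temperature=10000., cls_token=True)
    print(pos_embed.shape)   # expected roughly (1, 14 * 14 + 1, 768)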

mmselfsup.models.utils.knn_classifier(train_features, train_labels, test_features, test_labels, k, T, num_classes=1000)[source]

Compute accuracy of knn classifier predictions.

Parameters
  • train_features (Tensor) – Extracted features in the training set.

  • train_labels (Tensor) – Labels in the training set.

  • test_features (Tensor) – Extracted features in the testing set.

  • test_labels (Tensor) – Labels in the testing set.

  • k (int) – Number of nearest neighbours to use.

  • T (float) – Temperature used in the voting coefficient.

  • num_classes (int) – Number of classes. Defaults to 1000.
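A hedged toy example with random features (real usage extracts features with a pre-trained backbone); returning top-1/top-5 accuracy follows the common convention, but treat that as an assumption:

    import torch
    from mmselfsup.models.utils import knn_classifier

    num_classes = 10
    train_features = torch.nn.functional.normalize(torch.randn(1000, 128), dim=1)
    train_labels = torch.randint(0, num_classes, (1000,))
    test_features = torch.nn.functional.normalize(torch.randn(200, 128), dim=1)
    test_labels = torch.randint(0, num_classes, (200,))

    top1, top5 = knn_classifier(train_features, train_labels, test_features,
                                test_labels, k=20, T=0.07, num_classes=num_classes)
    print(top1, top5)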

mmselfsup.utils

class mmselfsup.utils.AliasMethod(probs)[source]

The alias method for sampling.

From: https://hips.seas.harvard.edu/blog/2013/03/03/the-alias-method-efficient-sampling-with-many-discrete-outcomes/

Parameters

probs (Tensor) – Sampling probabilities.

draw(N)[source]

Draw N samples from multinomial.

Parameters

N (int) – Number of samples.

Returns

Samples.

Return type

Tensor
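A hedged sketch of discrete sampling with the alias method; depending on the implementation the internal tables may live on CPU or need to be moved to the GPU:

    import torch
    from mmselfsup.utils import AliasMethod

    probs = torch.tensor([0.1, 0.2, 0.3, 0.4])  # sampling probabilities over 4 outcomes
    sampler = AliasMethod(probs)
    samples = sampler.draw(8)   # indices into probs, drawn with the given probabilities
    print(samples)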

class mmselfsup.utils.Extractor(dataset, samples_per_gpu, workers_per_gpu, dist_mode=False, persistent_workers=True, **kwargs)[source]

Feature extractor.

Parameters
  • dataset (Dataset | dict) – A PyTorch dataset or dict that indicates the dataset.

  • samples_per_gpu (int) – Number of images on each GPU, i.e., batch size of each GPU.

  • workers_per_gpu (int) – How many subprocesses to use for data loading for each GPU.

  • dist_mode (bool) – Use distributed extraction or not. Defaults to False.

  • persistent_workers (bool) – If True, the data loader will not shut down the worker processes after a dataset has been consumed once, which keeps the workers' Dataset instances alive. This argument only takes effect with PyTorch>=1.7.0. Defaults to True.

mmselfsup.utils.batch_shuffle_ddp(x)[source]

Batch shuffle, for making use of BatchNorm.

*Only supports DistributedDataParallel (DDP) models.*

mmselfsup.utils.batch_unshuffle_ddp(x, idx_unshuffle)[source]

Undo batch shuffle.

*Only supports DistributedDataParallel (DDP) models.*

mmselfsup.utils.collect_env()[source]

Collect the information of the running environments.

mmselfsup.utils.concat_all_gather(tensor)[source]

Performs all_gather operation on the provided tensors.

*Warning*: torch.distributed.all_gather has no gradient.
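These helpers only make sense inside an initialized distributed (DDP) job. A hedged sketch of the MoCo-style pattern they support (encoder_k is a placeholder momentum encoder; run under torchrun or torch.distributed.launch):

    import torch
    from mmselfsup.utils import (batch_shuffle_ddp, batch_unshuffle_ddp,
                                 concat_all_gather)

    @torch.no_grad()
    def momentum_key_features(encoder_k, im_k):
        """Compute key features with shuffled BN, MoCo style (sketch)."""
        # Shuffle the batch across GPUs so BN statistics cannot leak information.
        im_k, idx_unshuffle = batch_shuffle_ddp(im_k)
        k = encoder_k(im_k)
        # Restore the original sample order on each GPU.
        return batch_unshuffle_ddp(k, idx_unshuffle)

    @torch.no_grad()
    def gather_keys(k):
        """Collect keys from all GPUs to enlarge the negative pool (no gradient)."""
        return concat_all_gather(k)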

mmselfsup.utils.dist_forward_collect(func, data_loader, rank, length, ret_rank=-1)[source]

Forward and collect network outputs in a distributed manner.

This function performs forward propagation and collects outputs. It can be used to collect results, features, losses, etc.

Parameters
  • func (function) – The function to process data. The output must be a dictionary of CPU tensors.

  • data_loader (DataLoader) – The torch DataLoader that yields the data.

  • rank (int) – The id of this process.

  • length (int) – Expected length of output arrays.

  • ret_rank (int) – The process that returns. Other processes will return None.

Returns

The concatenated outputs.

Return type

results_all (dict(np.ndarray))

mmselfsup.utils.distributed_sinkhorn(out, sinkhorn_iterations, world_size, epsilon)[source]

Apply the distributed Sinkhorn optimization on the scores matrix to find the assignments.

mmselfsup.utils.find_latest_checkpoint(path, suffix='pth')[source]

Find the latest checkpoint from the working directory.

Parameters
  • path (str) – The path to find checkpoints.

  • suffix (str) – File extension. Defaults to pth.

Returns

File path of the latest checkpoint.

Return type

latest_path (str | None)

References

1. https://github.com/microsoft/SoftTeacher/blob/main/ssod/utils/patch.py

2. https://github.com/open-mmlab/mmdetection/blob/master/mmdet/utils/misc.py#L7
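A short hedged example; the work directory below is a placeholder:

    from mmselfsup.utils import find_latest_checkpoint

    latest = find_latest_checkpoint('work_dirs/selfsup/simclr_resnet50')  # hypothetical work dir
    if latest is not None:
        print(f'resume from {latest}')
    else:
        print('no checkpoint found, training from scratch')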

mmselfsup.utils.gather_tensors(input_array)[source]

Gather tensor from all GPUs.

mmselfsup.utils.gather_tensors_batch(input_array, part_size=100, ret_rank=-1)[source]

Batch-wise gathering to avoid CUDA out-of-memory errors.

mmselfsup.utils.get_root_logger(log_file=None, log_level=20)[source]

Get root logger.

Parameters
  • log_file (str, optional) – File path of log. Defaults to None.

  • log_level (int, optional) – The level of logger. Defaults to logging.INFO.

Returns

The obtained logger.

Return type

logging.Logger
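A minimal example; the log file name is a placeholder:

    import logging
    from mmselfsup.utils import get_root_logger

    logger = get_root_logger(log_file='example.log', log_level=logging.INFO)
    logger.info('logger initialized')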

mmselfsup.utils.nondist_forward_collect(func, data_loader, length)[source]

Forward and collect network outputs.

This function performs forward propagation and collects outputs. It can be used to collect results, features, losses, etc.

Parameters
  • func (function) – The function to process data. The output must be a dictionary of CPU tensors.

  • data_loader (DataLoader) – The torch DataLoader that yields the data.

  • length (int) – Expected length of output arrays.

Returns

The concatenated outputs.

Return type

results_all (dict(np.ndarray))
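A hedged toy sketch, assuming the collector calls func(**batch) for every batch yielded by the data loader and concatenates the resulting CPU tensors (this mirrors how the feature extractor uses it; the model, dataset and collate function below are illustrative):

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from mmselfsup.utils import nondist_forward_collect

    model = torch.nn.Linear(16, 4)
    dataset = TensorDataset(torch.randn(32, 16))
    # Yield dict batches so they can be unpacked as keyword arguments.
    loader = DataLoader(
        dataset, batch_size=8,
        collate_fn=lambda batch: dict(img=torch.stack([b[0] for b in batch])))

    def func(img):
        # The output must be a dictionary of CPU tensors.
        with torch.no_grad():
            return dict(feature=model(img).cpu())

    results = nondist_forward_collect(func, loader, length=len(dataset))
    print(results['feature'].shape)   # expected (32, 4)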

mmselfsup.utils.setup_multi_processes(cfg)[source]

Set up multi-processing environment variables.

mmselfsup.utils.sync_random_seed(seed=None, device='cuda')[source]

Make sure different ranks share the same seed. All workers must call this function, otherwise it will deadlock. This method is generally used in DistributedSampler, because the seed should be identical across all processes in the distributed group.

In distributed sampling, different ranks should sample non-overlapped data in the dataset. Therefore, this function is used to make sure that each rank shuffles the data indices in the same order based on the same seed. Then different ranks could use different indices to select non-overlapped data from the same data list.

Parameters
  • seed (int, optional) – The seed. Defaults to None.

  • device (str) – The device where the seed will be put on. Defaults to ‘cuda’.

Returns

Seed to be used.

Return type

int

References

1. https://github.com/open-mmlab/mmdetection/blob/master/mmdet/core/utils/dist_utils.py
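A hedged example; in a single-process run the function simply returns a (possibly randomly generated) seed, while in a distributed run every rank receives the rank-0 value:

    import numpy as np
    from mmselfsup.utils import sync_random_seed

    seed = sync_random_seed()   # or sync_random_seed(seed=42)
    np.random.seed(seed)        # every rank now shuffles indices in the same order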
