Note

You are reading the documentation for MMSelfSup 0.x, which will be deprecated by the end of 2022. We recommend upgrading to the MMSelfSup 1.0.0rc versions to benefit from the new features and better performance brought by OpenMMLab 2.0. See the changelog, code and documentation of MMSelfSup 1.0.0rc for more details.

Tutorial 1: Adding New Dataset

In this tutorial, we introduce the basic steps to create a customized dataset.

If your algorithm does not need a customized dataset, you can use the off-the-shelf datasets under datasets. To use them, however, you have to convert your data to the format that the existing dataset expects.

An example of a customized dataset

Assume the annotation file of your dataset has the following format:

000001.jpg 0
000002.jpg 1

To write a new dataset, you need to implement:

DataSource: inherits from BaseDataSource and is responsible for loading the annotation files and reading images.
Dataset: inherits from BaseDataset and is responsible for applying transformations to the images and packing them.

Creating the `DataSource`

Assuming the name of your DataSource is NewDataSource, create a file named new_data_source.py under mmselfsup/datasets/data_sources and implement NewDataSource in it.

import mmcv
import numpy as np

from ..builder import DATASOURCES
from .base import BaseDataSource


@DATASOURCES.register_module()
class NewDataSource(BaseDataSource):

    def load_annotations(self):

        assert isinstance(self.ann_file, str)
        data_infos = []
        # Write your code here: parse self.ann_file and fill data_infos.
        return data_infos
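
As a rough illustration of what `load_annotations` could do for the annotation format shown earlier, the sketch below parses `filename label` lines into a list of dicts. It is written as a standalone function so that it runs without MMSelfSup. The function name `parse_annotation_lines` and the dict keys (`img_prefix`, `img_info`, `gt_label`, `idx`) are assumptions made for this example, so check them against `BaseDataSource` and the existing data sources before reusing them.

```python
def parse_annotation_lines(lines, data_prefix):
    """Parse 'filename label' lines into a list of per-sample dicts.

    Hypothetical helper: the dict layout below is an assumption.
    """
    data_infos = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        filename, label = line.split()
        data_infos.append({
            'img_prefix': data_prefix,
            'img_info': {'filename': filename},
            'gt_label': int(label),
            'idx': len(data_infos),  # running index of the sample
        })
    return data_infos


data_infos = parse_annotation_lines(
    ['000001.jpg 0', '000002.jpg 1'], 'data/train')
print(data_infos[1])
```

Inside `NewDataSource.load_annotations`, the same loop would run over the lines read from `self.ann_file`.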

Then, add NewDataSource to mmselfsup/datasets/data_sources/__init__.py.

from .base import BaseDataSource
...
from .new_data_source import NewDataSource

__all__ = [
    'BaseDataSource', ..., 'NewDataSource'
]
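
The import in `__init__.py` matters because the `register_module()` decorator only runs when the module that defines the class is imported. The sketch below shows the idea with a tiny stand-in registry. `ToyRegistry` and its `build` method are hypothetical names for this example and are not the MMCV implementation.

```python
class ToyRegistry:
    """Minimal stand-in for a name-to-class registry (not MMCV's Registry)."""

    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self):
        # Return a decorator that records the class under its own name.
        def decorator(cls):
            self._modules[cls.__name__] = cls
            return cls
        return decorator

    def build(self, cfg):
        # Look up the class named by cfg['type']; pass the rest as kwargs.
        cfg = dict(cfg)
        cls = self._modules[cfg.pop('type')]
        return cls(**cfg)


DATASOURCES = ToyRegistry('datasource')


@DATASOURCES.register_module()
class NewDataSource:
    def __init__(self, ann_file=None):
        self.ann_file = ann_file


# A config dict with a 'type' key is enough to construct the object.
source = DATASOURCES.build(dict(type='NewDataSource', ann_file='train.txt'))
print(type(source).__name__, source.ann_file)
```

If the class were never imported, the decorator would never run and the lookup by `type` would fail, which is why the new module has to be listed in `__init__.py`.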

Creating the `Dataset`

Assuming the name of your Dataset is NewDataset, create a file named new_dataset.py under mmselfsup/datasets and implement NewDataset in it.

# Copyright (c) OpenMMLab. All rights reserved.
import torch
from mmcv.utils import build_from_cfg
from torchvision.transforms import Compose

from .base import BaseDataset
from .builder import DATASETS, PIPELINES, build_datasource
from .utils import to_numpy


@DATASETS.register_module()
class NewDataset(BaseDataset):

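    def __init__(self, data_source, num_views, pipelines, prefetch=False):
        # Write your code here: build the data source and the pipelines.
        pass

    def __getitem__(self, idx):
        # Write your code here: load the image for `idx` and transform it.
        img = None  # placeholder for the transformed image(s)
        return dict(img=img)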

    def evaluate(self, results, logger=None):
        return NotImplemented
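
To make the two placeholders more concrete, here is a minimal sketch of one way such a dataset could produce several augmented views per image: each pipeline is repeated as many times as its entry in `num_views`. It is a plain-Python stand-in, so `ToyMultiViewDataset`, the list-based data source, and the lambda pipelines are all assumptions for illustration. A real implementation would build the data source and the pipelines from configs through the registries.

```python
class ToyMultiViewDataset:
    """Stand-in dataset returning several transformed views per sample."""

    def __init__(self, data_source, num_views, pipelines):
        assert len(num_views) == len(pipelines)
        self.data_source = data_source
        # Repeat each pipeline num_views[i] times, one entry per view.
        self.trans = []
        for pipeline, n in zip(pipelines, num_views):
            self.trans.extend([pipeline] * n)

    def __len__(self):
        return len(self.data_source)

    def __getitem__(self, idx):
        img = self.data_source[idx]
        # Apply every view's pipeline to the same source image.
        views = [trans(img) for trans in self.trans]
        return dict(img=views)


# Integers stand in for images; callables stand in for pipelines.
dataset = ToyMultiViewDataset(
    data_source=[10, 20, 30],
    num_views=[2, 1],
    pipelines=[lambda x: x + 1, lambda x: x * 2],
)
sample = dataset[1]
print(sample)
```

With `num_views=[2, 1]`, the first pipeline contributes two views and the second contributes one, so each sample carries three views.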

Then, add NewDataset to mmselfsup/datasets/__init__.py.

from .base import BaseDataset
...
from .new_dataset import NewDataset

__all__ = [
    'BaseDataset', ..., 'NewDataset'
]

Modify config file

To use NewDataset, modify the config as follows:

data = dict(
    train=dict(
        type='NewDataset',
        data_source=dict(
            type='NewDataSource',
        ),
        num_views=[2],
        pipelines=[train_pipeline],
        prefetch=prefetch,
    ))
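
The fragment above refers to `train_pipeline` and `prefetch`, which have to be defined earlier in the same config file. A fuller sketch could look like the following. The file paths, the chosen transforms, and the `data_prefix` and `ann_file` arguments of the data source are assumptions for illustration and should be adapted to your own `NewDataSource`.

```python
# Hypothetical values: adjust paths and transforms to your dataset.
img_norm_cfg = dict(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
train_pipeline = [
    dict(type='RandomResizedCrop', size=224),
    dict(type='RandomHorizontalFlip'),
]

# Without prefetching, tensor conversion and normalization
# are done as the last steps of the pipeline.
prefetch = False
if not prefetch:
    train_pipeline.extend(
        [dict(type='ToTensor'),
         dict(type='Normalize', **img_norm_cfg)])

data = dict(
    train=dict(
        type='NewDataset',
        data_source=dict(
            type='NewDataSource',
            data_prefix='data/new_dataset/train',  # assumed image root
            ann_file='data/new_dataset/meta/train.txt',  # assumed path
        ),
        num_views=[2],
        pipelines=[train_pipeline],
        prefetch=prefetch,
    ))
```

Here `num_views=[2]` paired with a single pipeline means that the same pipeline is applied twice to each image.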
