Datamodules

Datamodules are a PyTorch Lightning tool for wrapping all the data logic: downloading datasets, verifying their integrity, applying transforms, loading the datasets, and building the dataloaders.

Eztorch provides a basic wrapper around the Datamodule logic so that it works with Hydra.

Several Datamodules already exist to handle various datasets.
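To make the workflow described above concrete, here is a dependency-free sketch that mirrors the hook names of PyTorch Lightning's `LightningDataModule` (`prepare_data`, `setup`, `train_dataloader`). It is not Eztorch code: `ToyDataModule`, its CSV "download", and the list-based batching are hypothetical stand-ins for real datasets and `DataLoader` objects.

```python
import csv
import os
import random
import tempfile
from typing import Iterator, List, Optional, Tuple

Sample = Tuple[float, int]


class ToyDataModule:
    """Hypothetical, dependency-free stand-in mirroring LightningDataModule hook names."""

    def __init__(self, datadir: str, num_samples: int = 10, batch_size: int = 4) -> None:
        self.path = os.path.join(datadir, "toy.csv")
        self.num_samples = num_samples
        self.batch_size = batch_size
        self.train_set: Optional[List[Sample]] = None

    def prepare_data(self) -> None:
        # "Download" step: write the raw data to disk once. As in Lightning,
        # this hook only touches the disk and stores no state on the object.
        if not os.path.exists(self.path):
            rng = random.Random(0)
            with open(self.path, "w", newline="") as f:
                writer = csv.writer(f)
                for i in range(self.num_samples):
                    writer.writerow([rng.random(), i % 2])

    def setup(self, stage: Optional[str] = None) -> None:
        # Load the data, verify its integrity, then apply a transform (scale by 10).
        with open(self.path, newline="") as f:
            rows = [(float(x), int(y)) for x, y in csv.reader(f)]
        if len(rows) != self.num_samples:
            raise RuntimeError(f"expected {self.num_samples} rows, found {len(rows)}")
        self.train_set = [(x * 10.0, y) for x, y in rows]

    def train_dataloader(self) -> Iterator[List[Sample]]:
        # Yield consecutive batches, like a DataLoader with shuffle=False.
        if self.train_set is None:
            raise RuntimeError("call setup() first")
        for start in range(0, len(self.train_set), self.batch_size):
            yield self.train_set[start:start + self.batch_size]


# Usage: call the hooks in the order Lightning runs them during fitting.
with tempfile.TemporaryDirectory() as tmp:
    dm = ToyDataModule(tmp)
    dm.prepare_data()
    dm.setup("fit")
    batch_sizes = [len(batch) for batch in dm.train_dataloader()]
print(batch_sizes)  # [4, 4, 2]
```

Keeping the download in `prepare_data` and the loading in `setup` follows Lightning's split between one-time disk work and per-process state.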

Base Datamodules

Base

class eztorch.datamodules.BaseDataModule(datadir, train=None, val=None, test=None)

Abstract class that inherits from LightningDataModule and standardizes preprocessing for all datamodules in Eztorch.

Parameters:
  • datadir (str) – Where to save/load the data.

  • train (Optional[DictConfig], optional) – Configuration for the training data to define the loading of data, the transforms and the dataloader.

    Default: None

  • val (Optional[DictConfig], optional) – Configuration for the validation data to define the loading of data, the transforms and the dataloader.

    Default: None

  • test (Optional[DictConfig], optional) – Configuration for the testing data to define the loading of data, the transforms and the dataloader.

    Default: None

Warning

The loader subconfigurations must not contain ‘batch_size’: it is computed automatically from the ‘global_batch_size’ specified in the configuration.
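The warning implies that each split's loader configuration is completed programmatically. A minimal sketch of that idea, assuming (hypothetically) that the per-process batch size is `global_batch_size` divided evenly across all devices and nodes, and that a split config is a plain dict with `global_batch_size` and `loader` keys. The helper `resolve_loader_config` and these keys are illustrative; the exact formula and structure Eztorch uses may differ.

```python
from typing import Any, Dict


def resolve_loader_config(split_cfg: Dict[str, Any], num_devices: int, num_nodes: int = 1) -> Dict[str, Any]:
    """Hypothetical helper: derive the per-process batch size from global_batch_size."""
    loader = dict(split_cfg.get("loader", {}))  # copy so the input config is untouched
    # Mirror the documented rule: batch_size must not be set by hand.
    if "batch_size" in loader:
        raise ValueError("'batch_size' must not be set in the loader config; use 'global_batch_size'")
    world_size = num_devices * num_nodes
    global_batch_size = split_cfg["global_batch_size"]
    if global_batch_size % world_size != 0:
        raise ValueError(f"global_batch_size={global_batch_size} is not divisible by {world_size} processes")
    loader["batch_size"] = global_batch_size // world_size
    return loader


train_cfg = {"global_batch_size": 256, "loader": {"num_workers": 4, "drop_last": True}}
print(resolve_loader_config(train_cfg, num_devices=4, num_nodes=2))
# {'num_workers': 4, 'drop_last': True, 'batch_size': 32}
```

Deriving the batch size from a single global value keeps the effective batch size constant when the number of devices changes.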

class eztorch.datamodules.FolderDataModule(datadir, train=None, val=None, test=None)

Base datamodule for folder datasets.

Parameters:
  • datadir (str) – Where to save/load the data.

  • train (Optional[DictConfig], optional) – Configuration for the training data to define the loading of data, the transforms and the dataloader.

    Default: None

  • val (Optional[DictConfig], optional) – Configuration for the validation data to define the loading of data, the transforms and the dataloader.

    Default: None

  • test (Optional[DictConfig], optional) – Configuration for the testing data to define the loading of data, the transforms and the dataloader.

    Default: None
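Folder datasets are commonly organized in the torchvision `ImageFolder` layout, `root/<class_name>/<file>`, with class indices assigned in sorted order of the class directories. Assuming (this page does not confirm it) that `FolderDataModule` expects the same layout, the following stdlib-only sketch shows how such a tree maps to `(path, class_index)` samples. `scan_folder` is a hypothetical helper, not part of Eztorch.

```python
import os
import tempfile
from typing import Dict, List, Tuple


def scan_folder(root: str) -> Tuple[List[Tuple[str, int]], Dict[str, int]]:
    """Hypothetical helper: index an ImageFolder-style tree root/<class_name>/<file>."""
    # One sub-directory per class; indices follow the sorted class names.
    classes = sorted(d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d)))
    class_to_idx = {name: idx for idx, name in enumerate(classes)}
    samples: List[Tuple[str, int]] = []
    for name in classes:
        class_dir = os.path.join(root, name)
        for fname in sorted(os.listdir(class_dir)):
            path = os.path.join(class_dir, fname)
            if os.path.isfile(path):
                samples.append((path, class_to_idx[name]))
    return samples, class_to_idx


# Usage: build a tiny two-class tree in a temporary directory and index it.
with tempfile.TemporaryDirectory() as root:
    for cls, files in {"dog": ["b.jpg", "a.jpg"], "cat": ["c.jpg"]}.items():
        os.makedirs(os.path.join(root, cls))
        for fname in files:
            open(os.path.join(root, cls, fname), "w").close()
    samples, class_to_idx = scan_folder(root)
    relative = [(os.path.relpath(path, root), label) for path, label in samples]

print(class_to_idx)  # {'cat': 0, 'dog': 1}
print(relative)
```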

class eztorch.datamodules.DumbDataModule(train=None, val=None, test=None, num_classes=10)

Dummy datamodule for testing models with random data.

Parameters:
  • train (Optional[DictConfig], optional) – Configuration for the training data to define the loading of data and the dataloader.

    Default: None

  • val (Optional[DictConfig], optional) – Configuration for the validation data to define the loading of data and the dataloader.

    Default: None

  • test (Optional[DictConfig], optional) – Configuration for the testing data to define the loading of data and the dataloader.

    Default: None

  • num_classes (int, optional) – Number of classes.

    Default: 10
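For illustration, a random-data dataset of the kind `DumbDataModule` is described as serving could look as follows. This is a stdlib-only sketch with hypothetical names (`RandomDataset`, `feature_dim`); it assumes labels are drawn uniformly from `range(num_classes)` and reuses the default `num_classes=10` from the signature above. A real implementation would return tensors rather than Python lists.

```python
import random
from typing import List, Tuple


class RandomDataset:
    """Hypothetical map-style dataset yielding random features and labels."""

    def __init__(self, length: int, feature_dim: int = 8, num_classes: int = 10, seed: int = 0) -> None:
        self.length = length
        self.feature_dim = feature_dim
        self.num_classes = num_classes
        self.seed = seed

    def __len__(self) -> int:
        return self.length

    def __getitem__(self, index: int) -> Tuple[List[float], int]:
        if not 0 <= index < self.length:
            raise IndexError(index)
        # Seed per index so the same index always returns the same sample.
        rng = random.Random(self.seed * 1_000_003 + index)
        features = [rng.gauss(0.0, 1.0) for _ in range(self.feature_dim)]
        label = rng.randrange(self.num_classes)
        return features, label


dataset = RandomDataset(length=100)
features, label = dataset[3]
print(len(dataset), len(features), 0 <= label < 10)  # 100 8 True
```

Seeding per index makes the random samples reproducible across epochs and worker processes, which helps when such data is used to debug a training loop.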

Video

class eztorch.datamodules.VideoBaseDataModule(datadir, train=None, val=None, test=None, video_path_prefix='', decode_audio=False, decoder='pyav', decoder_args={})

Abstract class that inherits from BaseDataModule and standardizes preprocessing for video datamodules.

Parameters:
  • datadir (str) – Path to the data (e.g., a CSV file, a folder, …).

  • train (Optional[DictConfig], optional) – Configuration for the training data to define the loading of data, the transforms and the dataloader.

    Default: None

  • val (Optional[DictConfig], optional) – Configuration for the validation data to define the loading of data, the transforms and the dataloader.

    Default: None

  • test (Optional[DictConfig], optional) – Configuration for the testing data to define the loading of data, the transforms and the dataloader.

    Default: None

  • video_path_prefix (str, optional) – Path to the root directory where the videos are stored. All video paths are prefixed with this path before loading.

    Default: ''

  • decode_audio (bool, optional) – If True, decode audio.

    Default: False

  • decoder (str, optional) – Defines which backend should be used to decode videos by default.

    Default: 'pyav'

  • decoder_args (DictConfig, optional) – Arguments to configure the default decoder.

    Default: {}

Warning

The loader subconfigurations must not contain ‘batch_size’: it is computed automatically from the ‘global_batch_size’ specified in the configuration.
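The `video_path_prefix` behavior can be sketched as joining the prefix with each relative path read from the annotation source. This assumes `os.path.join` semantics, under which an empty prefix leaves paths untouched; the helper `prefix_video_paths` and the example paths are hypothetical, and the actual datamodule may build paths differently.

```python
import os
from typing import List, Tuple


def prefix_video_paths(entries: List[Tuple[str, int]], video_path_prefix: str = "") -> List[Tuple[str, int]]:
    """Hypothetical helper: prepend the root directory to every (path, label) entry."""
    return [(os.path.join(video_path_prefix, path), label) for path, label in entries]


# Relative paths as they might appear in an annotation file (illustrative values).
entries = [("class_a/video_0001.mp4", 0), ("class_b/video_0002.mp4", 1)]
print(prefix_video_paths(entries, "/data/videos"))
print(prefix_video_paths(entries))  # unchanged: an empty prefix is a no-op
```

Storing relative paths in the annotations and supplying the root at configuration time lets the same annotation files be reused on machines where the videos live in different directories.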

Image Datamodules

CIFAR

class eztorch.datamodules.CIFAR10DataModule(datadir, train=None, val=None, test=None, num_classes_kept=None, split_train_ratio=None, seed_for_split=42)

Datamodule for the CIFAR10 dataset.

Parameters:
  • datadir (str) – Where to save/load the data.

  • train (Optional[DictConfig], optional) – Configuration for the training data to define the loading of data, the transforms and the dataloader.

    Default: None

  • val (Optional[DictConfig], optional) – Configuration for the validation data to define the loading of data, the transforms and the dataloader.

    Default: None

  • test (Optional[DictConfig], optional) – Configuration for the testing data to define the loading of data, the transforms and the dataloader.

    Default: None

  • num_classes_kept (Optional[int], optional) – Number of classes to use.

    Default: None

  • split_train_ratio (Optional[float], optional) – If not None, randomly split the training dataset in two, keeping a split_train_ratio fraction of it for training.

    Default: None

  • seed_for_split (int, optional) – Seed for the split.

    Default: 42
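A minimal sketch of what `split_train_ratio` and `seed_for_split` control, assuming the split is a seeded random permutation of the sample indices in which the first `int(ratio * n)` go to training and the rest are held out. The helper `split_indices` is hypothetical; the datamodule may instead rely on something like `torch.utils.data.random_split`, whose shuffling and rounding can differ.

```python
import random
from typing import List, Tuple


def split_indices(num_samples: int, split_train_ratio: float, seed_for_split: int = 42) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: randomly split range(num_samples) into train / held-out indices."""
    if not 0.0 < split_train_ratio < 1.0:
        raise ValueError("split_train_ratio must be in (0, 1)")
    indices = list(range(num_samples))
    # A dedicated Random instance keeps the split reproducible without touching global state.
    random.Random(seed_for_split).shuffle(indices)
    num_train = int(num_samples * split_train_ratio)
    return indices[:num_train], indices[num_train:]


# CIFAR10 has 50,000 training images; keep 90% for training and hold out 10%.
train_idx, held_out_idx = split_indices(50_000, 0.9)
print(len(train_idx), len(held_out_idx))  # 45000 5000
```

Fixing the seed means the held-out subset is identical across runs, so results measured on it stay comparable.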

Example:

from eztorch.datamodules import CIFAR10DataModule
datamodule = CIFAR10DataModule(datadir)

class eztorch.datamodules.CIFAR100DataModule(datadir, train=None, val=None, test=None, num_classes_kept=None, split_train_ratio=None, seed_for_split=42)

Datamodule for the CIFAR100 dataset.

Parameters:
  • datadir (str) – Where to save/load the data.

  • train (Optional[DictConfig], optional) – Configuration for the training data to define the loading of data, the transforms and the dataloader.

    Default: None

  • val (Optional[DictConfig], optional) – Configuration for the validation data to define the loading of data, the transforms and the dataloader.

    Default: None

  • test (Optional[DictConfig], optional) – Configuration for the testing data to define the loading of data, the transforms and the dataloader.

    Default: None

  • num_classes_kept (Optional[int], optional) – Number of classes to use.

    Default: None

  • split_train_ratio (Optional[float], optional) – If not None, randomly split the training dataset in two, keeping a split_train_ratio fraction of it for training.

    Default: None

  • seed_for_split (int, optional) – Seed for the split.

    Default: 42

Example:

from eztorch.datamodules import CIFAR100DataModule
datamodule = CIFAR100DataModule(datadir)

ImageNet

class eztorch.datamodules.ImagenetDataModule(datadir, train=None, val=None, test=None)

Base datamodule for the Imagenet dataset.

Parameters:
  • datadir (str) – Where to load the data.

  • train (Optional[DictConfig], optional) – Configuration for the training data to define the loading of data, the transforms and the dataloader.

    Default: None

  • val (Optional[DictConfig], optional) – Configuration for the validation data to define the loading of data, the transforms and the dataloader.

    Default: None

  • test (Optional[DictConfig], optional) – Configuration for the testing data to define the loading of data, the transforms and the dataloader.

    Default: None

class eztorch.datamodules.Imagenet100DataModule(datadir, train=None, val=None, test=None)

Base datamodule for the Imagenet100 dataset.

Parameters:
  • datadir (str) – Where to load the data.

  • train (Optional[DictConfig], optional) – Configuration for the training data to define the loading of data, the transforms and the dataloader.

    Default: None

  • val (Optional[DictConfig], optional) – Configuration for the validation data to define the loading of data, the transforms and the dataloader.

    Default: None

  • test (Optional[DictConfig], optional) – Configuration for the testing data to define the loading of data, the transforms and the dataloader.

    Default: None

Example:

from eztorch.datamodules import Imagenet100DataModule
datamodule = Imagenet100DataModule(datadir)

STL10

class eztorch.datamodules.STL10DataModule(datadir, train=None, val=None, test=None, folds=None, training_split='unlabeled')

Datamodule for the STL10 dataset in the self-supervised learning (SSL) setting.

Parameters:
  • datadir (str) – Where to save/load the data.

  • train (Optional[DictConfig], optional) – Configuration for the training data to define the loading of data, the transforms and the dataloader.

    Default: None

  • val (Optional[DictConfig], optional) – Configuration for the validation data to define the loading of data, the transforms and the dataloader.

    Default: None

  • test (Optional[DictConfig], optional) – Configuration for the testing data to define the loading of data, the transforms and the dataloader.

    Default: None

  • folds (Optional[int], optional) – One of \(\{0, 1, \dots, 9\}\) or None. For training, loads one of the 10 pre-defined folds of 1k samples for the standard evaluation procedure. If no value is passed, loads all 5k samples.

    Default: None

  • training_split (str, optional) – Split used for the training dataset.

    Default: 'unlabeled'

Tiny-ImageNet

class eztorch.datamodules.TinyImagenetDataModule(datadir, train=None, val=None, test=None)

Base datamodule for the Tiny Imagenet dataset.

Parameters:
  • datadir (str) – Where to load the data.

  • train (Optional[DictConfig], optional) – Configuration for the training data to define the loading of data, the transforms and the dataloader.

    Default: None

  • val (Optional[DictConfig], optional) – Configuration for the validation data to define the loading of data, the transforms and the dataloader.

    Default: None

  • test (Optional[DictConfig], optional) – Configuration for the testing data to define the loading of data, the transforms and the dataloader.

    Default: None

Video Datamodules

HMDB51

class eztorch.datamodules.Hmdb51DataModule(datadir, train=None, val=None, test=None, split_id=1, video_path_prefix='', decode_audio=False, decoder='pyav', decoder_args={})

Datamodule for the HMDB51 dataset.

Parameters:
  • datadir (str) – Path to the data (e.g., a CSV file, a folder, …).

  • train (Optional[DictConfig], optional) – Configuration for the training data to define the loading of data, the transforms and the dataloader.

    Default: None

  • val (Optional[DictConfig], optional) – Configuration for the validation data to define the loading of data, the transforms and the dataloader.

    Default: None

  • test (Optional[DictConfig], optional) – Configuration for the testing data to define the loading of data, the transforms and the dataloader.

    Default: None

  • video_path_prefix (str, optional) – Path to the root directory where the videos are stored. All video paths are prefixed with this path before loading.

    Default: ''

  • decode_audio (bool, optional) – If True, decode audio.

    Default: False

  • decoder (str, optional) – Defines which backend should be used to decode videos.

    Default: 'pyav'

  • decoder_args (DictConfig, optional) – Arguments to configure the decoder.

    Default: {}

  • split_id (int, optional) – Split used for training and testing.

    Default: 1
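In the official HMDB51 release, the three train/test splits are given as one text file per action and split, `<action>_test_split<k>.txt`, where each line holds a video name followed by `1` (training), `2` (testing), or `0` (unused). The sketch below shows what `split_id` selects under that format; the helpers `split_file_name` and `parse_hmdb51_split` are hypothetical, and Eztorch may read preprocessed annotation files instead of the raw lists.

```python
from typing import Dict, List


def split_file_name(action: str, split_id: int) -> str:
    """File name of one action's split list, following the official HMDB51 naming."""
    if split_id not in (1, 2, 3):
        raise ValueError("HMDB51 defines splits 1, 2 and 3")
    return f"{action}_test_split{split_id}.txt"


def parse_hmdb51_split(lines: List[str]) -> Dict[str, List[str]]:
    """Hypothetical helper: group video names by their split tag (1=train, 2=test, 0=unused)."""
    tags = {"1": "train", "2": "test", "0": "unused"}
    result: Dict[str, List[str]] = {"train": [], "test": [], "unused": []}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:  # skip blank or malformed lines
            video, tag = parts
            result[tags[tag]].append(video)
    return result


lines = ["video_a.avi 1 ", "video_b.avi 2 ", "video_c.avi 0 "]  # illustrative content
print(split_file_name("brush_hair", 1))  # brush_hair_test_split1.txt
print(parse_hmdb51_split(lines))
# {'train': ['video_a.avi'], 'test': ['video_b.avi'], 'unused': ['video_c.avi']}
```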

Kinetics

class eztorch.datamodules.Kinetics200DataModule(datadir, train=None, val=None, test=None, video_path_prefix='', decode_audio=False, decoder='pyav', decoder_args={})

Datamodule for the Mini-Kinetics200 dataset.

Parameters:
  • datadir (str) – Path to the data (e.g., a CSV file, a folder, …).

  • train (Optional[DictConfig], optional) – Configuration for the training data to define the loading of data, the transforms and the dataloader.

    Default: None

  • val (Optional[DictConfig], optional) – Configuration for the validation data to define the loading of data, the transforms and the dataloader.

    Default: None

  • test (Optional[DictConfig], optional) – Configuration for the testing data to define the loading of data, the transforms and the dataloader.

    Default: None

  • video_path_prefix (str, optional) – Path to the root directory containing the videos loaded by LabeledVideoDataset. All video paths are prefixed with this path before loading.

    Default: ''

  • decode_audio (bool, optional) – If True, decode audio.

    Default: False

  • decoder (str, optional) – Defines which backend should be used to decode videos.

    Default: 'pyav'

  • decoder_args (DictConfig, optional) – Arguments to configure the decoder.

    Default: {}

class eztorch.datamodules.Kinetics400DataModule(datadir, train=None, val=None, test=None, video_path_prefix='', decode_audio=False, decoder='pyav', decoder_args={})

Datamodule for the Kinetics400 dataset.

Parameters:
  • datadir (str) – Path to the data (e.g., a CSV file, a folder, …).

  • train (Optional[DictConfig], optional) – Configuration for the training data to define the loading of data, the transforms and the dataloader.

    Default: None

  • val (Optional[DictConfig], optional) – Configuration for the validation data to define the loading of data, the transforms and the dataloader.

    Default: None

  • test (Optional[DictConfig], optional) – Configuration for the testing data to define the loading of data, the transforms and the dataloader.

    Default: None

  • video_path_prefix (str, optional) – Path to the root directory where the videos are stored. All video paths are prefixed with this path before loading.

    Default: ''

  • decode_audio (bool, optional) – If True, decode audio.

    Default: False

  • decoder (str, optional) – Defines which backend should be used to decode videos.

    Default: 'pyav'

  • decoder_args (DictConfig, optional) – Arguments to configure the decoder.

    Default: {}

class eztorch.datamodules.Kinetics600DataModule(datadir, train=None, val=None, test=None, video_path_prefix='', decode_audio=False, decoder='pyav', decoder_args={})

Datamodule for the Kinetics600 dataset.

Parameters:
  • datadir (str) – Path to the data (e.g., a CSV file, a folder, …).

  • train (Optional[DictConfig], optional) – Configuration for the training data to define the loading of data, the transforms and the dataloader.

    Default: None

  • val (Optional[DictConfig], optional) – Configuration for the validation data to define the loading of data, the transforms and the dataloader.

    Default: None

  • test (Optional[DictConfig], optional) – Configuration for the testing data to define the loading of data, the transforms and the dataloader.

    Default: None

  • video_path_prefix (str, optional) – Path to the root directory where the videos are stored. All video paths are prefixed with this path before loading.

    Default: ''

  • decode_audio (bool, optional) – If True, decode audio.

    Default: False

  • decoder (str, optional) – Defines which backend should be used to decode videos.

    Default: 'pyav'

  • decoder_args (DictConfig, optional) – Arguments to configure the decoder.

    Default: {}

class eztorch.datamodules.Kinetics700DataModule(datadir, train=None, val=None, test=None, video_path_prefix='', decode_audio=False, decoder='pyav', decoder_args={})

Datamodule for the Kinetics700 dataset.

Parameters:
  • datadir (str) – Path to the data (e.g., a CSV file, a folder, …).

  • train (Optional[DictConfig], optional) – Configuration for the training data to define the loading of data, the transforms and the dataloader.

    Default: None

  • val (Optional[DictConfig], optional) – Configuration for the validation data to define the loading of data, the transforms and the dataloader.

    Default: None

  • test (Optional[DictConfig], optional) – Configuration for the testing data to define the loading of data, the transforms and the dataloader.

    Default: None

  • video_path_prefix (str, optional) – Path to the root directory where the videos are stored. All video paths are prefixed with this path before loading.

    Default: ''

  • decode_audio (bool, optional) – If True, decode audio.

    Default: False

  • decoder (str, optional) – Defines which backend should be used to decode videos.

    Default: 'pyav'

  • decoder_args (DictConfig, optional) – Arguments to configure the decoder.

    Default: {}

SoccerNet

class eztorch.datamodules.SoccerNetDataModule(datadir, train=None, val=None, test=None, video_path_prefix='', decoder='frame', decoder_args={})

Base datamodule for the SoccerNet datasets.

Parameters:
  • datadir (str) – Path to the data (e.g., a CSV file, a folder, …).

  • train (Optional[DictConfig], optional) – Configuration for the training data to define the loading of data, the transforms and the dataloader.

    Default: None

  • val (Optional[DictConfig], optional) – Configuration for the validation data to define the loading of data, the transforms and the dataloader.

    Default: None

  • test (Optional[DictConfig], optional) – Configuration for the testing data to define the loading of data, the transforms and the dataloader.

    Default: None

  • video_path_prefix (str, optional) – Path to the root directory where the videos are stored. All video paths are prefixed with this path before loading.

    Default: ''

  • decoder (str, optional) – Defines which backend should be used to decode videos.

    Default: 'frame'

  • decoder_args (DictConfig, optional) – Arguments to configure the decoder.

    Default: {}

class eztorch.datamodules.ImageSoccerNetDataModule(datadir, train=None, val=None, test=None, video_path_prefix='', decoder='frame', decoder_args={})

Base datamodule for the SoccerNet image datasets.

Parameters:
  • datadir (str) – Path to the data (e.g., a CSV file, a folder, …).

  • train (Optional[DictConfig], optional) – Configuration for the training data to define the loading of data, the transforms and the dataloader.

    Default: None

  • val (Optional[DictConfig], optional) – Configuration for the validation data to define the loading of data, the transforms and the dataloader.

    Default: None

  • test (Optional[DictConfig], optional) – Configuration for the testing data to define the loading of data, the transforms and the dataloader.

    Default: None

  • video_path_prefix (str, optional) – Path to the root directory where the videos are stored. All video paths are prefixed with this path before loading.

    Default: ''

  • decoder (str, optional) – Defines which backend should be used to decode videos.

    Default: 'frame'

  • decoder_args (DictConfig, optional) – Arguments to configure the decoder.

    Default: {}

Spot

class eztorch.datamodules.SpotDataModule(datadir, train=None, val=None, test=None, video_path_prefix='', decoder='frame', decoder_args={})

Base datamodule for the SoccerNet datasets.

Parameters:
  • datadir (str) – Path to the data (e.g., a CSV file, a folder, …).

  • train (Optional[DictConfig], optional) – Configuration for the training data to define the loading of data, the transforms and the dataloader.

    Default: None

  • val (Optional[DictConfig], optional) – Configuration for the validation data to define the loading of data, the transforms and the dataloader.

    Default: None

  • test (Optional[DictConfig], optional) – Configuration for the testing data to define the loading of data, the transforms and the dataloader.

    Default: None

  • video_path_prefix (str, optional) – Path to the root directory where the videos are stored. All video paths are prefixed with this path before loading.

    Default: ''

  • decoder (str, optional) – Defines which backend should be used to decode videos.

    Default: 'frame'

  • decoder_args (DictConfig, optional) – Arguments to configure the decoder.

    Default: {}

UCF101

class eztorch.datamodules.Ucf101DataModule(datadir, train=None, val=None, test=None, video_path_prefix='', decode_audio=False, decoder='pyav', decoder_args={}, split_id=1)

Datamodule for the UCF101 dataset.

Parameters:
  • datadir (str) – Path to the data (e.g., a CSV file, a folder, …).

  • train (Optional[DictConfig], optional) – Configuration for the training data to define the loading of data, the transforms and the dataloader.

    Default: None

  • val (Optional[DictConfig], optional) – Configuration for the validation data to define the loading of data, the transforms and the dataloader.

    Default: None

  • test (Optional[DictConfig], optional) – Configuration for the testing data to define the loading of data, the transforms and the dataloader.

    Default: None

  • video_path_prefix (str, optional) – Path to the root directory where the videos are stored. All video paths are prefixed with this path before loading.

    Default: ''

  • decode_audio (bool, optional) – If True, decode audio.

    Default: False

  • decoder (str, optional) – Defines which backend should be used to decode videos.

    Default: 'pyav'

  • decoder_args (DictConfig, optional) – Arguments to configure the decoder.

    Default: {}

  • split_id (int, optional) – Split used for training and testing.

    Default: 1
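The official UCF101 release (`ucfTrainTestlist`) provides its three splits as `trainlist0<k>.txt` and `testlist0<k>.txt`, plus `classInd.txt`, which maps 1-based class indices to class names. Train-list lines carry a 1-based class index, while test-list lines do not, so test labels come from the class folder in the path. The sketch below, with hypothetical helper names, shows how `split_id` could select and parse these files; Eztorch's own annotation format is not documented on this page.

```python
from typing import Dict, List, Tuple


def split_files(split_id: int) -> Tuple[str, str]:
    """Names of the official UCF101 train/test list files for a split."""
    if split_id not in (1, 2, 3):
        raise ValueError("UCF101 defines splits 1, 2 and 3")
    return f"trainlist0{split_id}.txt", f"testlist0{split_id}.txt"


def parse_list(lines: List[str], class_ind: Dict[str, int]) -> List[Tuple[str, int]]:
    """Hypothetical helper: return (video_path, 0-based label) pairs from a train or test list."""
    samples = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        path = parts[0]
        # Test lists have no label column: derive the class from the folder name.
        label = int(parts[1]) if len(parts) > 1 else class_ind[path.split("/")[0]]
        samples.append((path, label - 1))
    return samples


class_ind = {"ApplyEyeMakeup": 1, "ApplyLipstick": 2}  # subset of classInd.txt
train_lines = ["ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c01.avi 1"]
test_lines = ["ApplyLipstick/v_ApplyLipstick_g01_c01.avi"]
print(split_files(1))  # ('trainlist01.txt', 'testlist01.txt')
print(parse_list(train_lines + test_lines, class_ind))
# [('ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c01.avi', 0), ('ApplyLipstick/v_ApplyLipstick_g01_c01.avi', 1)]
```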