Datamodules

Datamodules are tools from PyTorch Lightning that wrap all the logic needed to download a dataset, verify its integrity, apply transforms, load the data, and build the dataloaders.

Eztorch provides basic wrappers that adapt the logic of Datamodules to work with Hydra.

Several Datamodules already exist to handle various datasets.
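
As a rough illustration of the responsibilities listed above, the sketch below mirrors the hook names of LightningDataModule (prepare_data, setup, train_dataloader) in plain Python. SketchDataModule, its batch_size argument and the integer "dataset" are hypothetical stand-ins for illustration, not Eztorch or Lightning code.

```python
from typing import List, Optional


class SketchDataModule:
    """Pure-Python stand-in that mirrors the LightningDataModule hook names."""

    def __init__(self, datadir: str, batch_size: int = 4) -> None:
        self.datadir = datadir
        self.batch_size = batch_size
        self.train_samples: Optional[List[int]] = None

    def prepare_data(self) -> None:
        # Download and integrity checks would happen here (nothing to do in this sketch).
        pass

    def setup(self, stage: Optional[str] = None) -> None:
        # Load the dataset and attach transforms; here the "dataset" is a list of integers.
        self.train_samples = list(range(10))

    def train_dataloader(self) -> List[List[int]]:
        # Group the samples into batches, as a DataLoader would.
        if self.train_samples is None:
            raise RuntimeError("Call setup() before requesting a dataloader.")
        step = self.batch_size
        samples = self.train_samples
        return [samples[i:i + step] for i in range(0, len(samples), step)]


datamodule = SketchDataModule(datadir="data/")  # hypothetical path
datamodule.prepare_data()
datamodule.setup("fit")
batches = datamodule.train_dataloader()
```

A real datamodule would return torch DataLoader objects instead of nested lists; the call order (prepare_data, then setup, then the dataloader hooks) is the part this sketch is meant to show.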

Base Datamodules

Base

class eztorch.datamodules.BaseDataModule(datadir, train=None, val=None, test=None)

Abstract class that inherits from LightningDataModule to follow standardized preprocessing for all datamodules in eztorch.

Parameters:

datadir (str) – Where to save/load the data.
train (Optional[DictConfig], optional) – Configuration for the training data to define the loading of data, the transforms and the dataloader.
Default: None
val (Optional[DictConfig], optional) – Configuration for the validation data to define the loading of data, the transforms and the dataloader.
Default: None
test (Optional[DictConfig], optional) – Configuration for the testing data to define the loading of data, the transforms and the dataloader.
Default: None

Warning

The loader subconfigurations must not contain ‘batch_size’: it is computed automatically from the ‘global_batch_size’ specified in the configuration.
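
The warning above can be pictured with a small sketch. It assumes the global batch is divided evenly across all processes; the exact rule Eztorch applies is not stated here. The helper names check_loader_config and per_process_batch_size, the world_size argument, and the layout of train_cfg (including where 'global_batch_size' sits) are hypothetical.

```python
from typing import Any, Dict


def check_loader_config(loader_cfg: Dict[str, Any]) -> None:
    """Reject a loader sub-configuration that sets 'batch_size' directly."""
    if "batch_size" in loader_cfg:
        raise ValueError(
            "Remove 'batch_size' from the loader config; set 'global_batch_size' instead."
        )


def per_process_batch_size(global_batch_size: int, world_size: int) -> int:
    """Assumed rule: split the global batch evenly across all processes."""
    if global_batch_size % world_size != 0:
        raise ValueError("global_batch_size must be divisible by world_size in this sketch.")
    return global_batch_size // world_size


# Hypothetical sub-configuration: only 'batch_size' and 'global_batch_size'
# are names taken from the warning, the other keys are illustrative.
train_cfg = {
    "global_batch_size": 256,
    "loader": {"num_workers": 4, "pin_memory": True},
}

check_loader_config(train_cfg["loader"])
local_batch_size = per_process_batch_size(train_cfg["global_batch_size"], world_size=8)
```

With 8 processes and a global batch of 256, each process would load 32 samples per step under this assumed rule.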

class eztorch.datamodules.FolderDataModule(datadir, train=None, val=None, test=None)

Base datamodule for folder datasets.

Parameters:

datadir (str) – Where to save/load the data.
train (Optional[DictConfig], optional) – Configuration for the training data to define the loading of data, the transforms and the dataloader.
Default: None
val (Optional[DictConfig], optional) – Configuration for the validation data to define the loading of data, the transforms and the dataloader.
Default: None
test (Optional[DictConfig], optional) – Configuration for the testing data to define the loading of data, the transforms and the dataloader.
Default: None

class eztorch.datamodules.DumbDataModule(train=None, val=None, test=None, num_classes=10)

Dumb data module for testing models with random data.

Parameters:

train (Optional[DictConfig], optional) – Configuration for the training data to define the loading of data and the dataloader.
Default: None
val (Optional[DictConfig], optional) – Configuration for the validation data to define the loading of data and the dataloader.
Default: None
test (Optional[DictConfig], optional) – Configuration for the testing data to define the loading of data and the dataloader.
Default: None
num_classes (int, optional) – Number of classes.
Default: 10
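
To give an idea of what "random data" can mean for a smoke test, the following sketch generates feature vectors with labels drawn from range(num_classes). It is only an illustration: make_random_samples, num_samples, num_features and seed are hypothetical names, and the actual DumbDataModule may produce tensors of a different shape.

```python
import random
from typing import List, Tuple


def make_random_samples(
    num_samples: int,
    num_features: int,
    num_classes: int = 10,
    seed: int = 0,
) -> List[Tuple[List[float], int]]:
    """Build (features, label) pairs filled with pseudo-random values."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        features = [rng.random() for _ in range(num_features)]
        label = rng.randrange(num_classes)  # label in [0, num_classes)
        samples.append((features, label))
    return samples


samples = make_random_samples(num_samples=8, num_features=3)
```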

Video

class eztorch.datamodules.VideoBaseDataModule(datadir, train=None, val=None, test=None, video_path_prefix='', decode_audio=False, decoder='pyav', decoder_args={})

Abstract class that inherits from BaseDataModule to follow standardized preprocessing for video datamodules.

Parameters:

datadir (str) – Path to the data (eg: csv, folder, …).
train (Optional[DictConfig], optional) – Configuration for the training data to define the loading of data, the transforms and the dataloader.
Default: None
val (Optional[DictConfig], optional) – Configuration for the validation data to define the loading of data, the transforms and the dataloader.
Default: None
test (Optional[DictConfig], optional) – Configuration for the testing data to define the loading of data, the transforms and the dataloader.
Default: None
video_path_prefix (str, optional) – Path to root directory where the videos are stored. All the video paths before loading are prefixed with this path.
Default: ''
decode_audio (bool, optional) – If True, decode audio.
Default: False
decoder (str, optional) – Defines which backend should be used to decode videos by default.
Default: 'pyav'
decoder_args (DictConfig, optional) – Arguments to configure the default decoder.
Default: {}

Warning

The loader subconfigurations must not contain ‘batch_size’: it is computed automatically from the ‘global_batch_size’ specified in the configuration.
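
The effect of video_path_prefix can be sketched as follows. This assumes the prefix is joined to each path with os.path.join; whether the library joins paths or concatenates strings is not stated here. The helper prefix_video_paths and the example paths are hypothetical.

```python
import os
from typing import List


def prefix_video_paths(video_paths: List[str], video_path_prefix: str = "") -> List[str]:
    """Prepend the root directory to every video path before loading."""
    # With the default empty prefix, os.path.join returns the path unchanged.
    return [os.path.join(video_path_prefix, path) for path in video_paths]


paths = prefix_video_paths(
    ["class_a/clip_001.mp4", "class_b/clip_002.mp4"],  # hypothetical relative paths
    video_path_prefix="/data/videos",                   # hypothetical root directory
)
```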

Image Datamodules

CIFAR

class eztorch.datamodules.CIFAR10DataModule(datadir, train=None, val=None, test=None, num_classes_kept=None, split_train_ratio=None, seed_for_split=42)

Datamodule for the CIFAR10 dataset.

Parameters:

datadir (str) – Where to save/load the data.
train (Optional[DictConfig], optional) – Configuration for the training data to define the loading of data, the transforms and the dataloader.
Default: None
val (Optional[DictConfig], optional) – Configuration for the validation data to define the loading of data, the transforms and the dataloader.
Default: None
test (Optional[DictConfig], optional) – Configuration for the testing data to define the loading of data, the transforms and the dataloader.
Default: None
num_classes_kept (Optional[int], optional) – Number of classes to use.
Default: None
split_train_ratio (Optional[float], optional) – If not None, randomly split the train dataset in two, with split_train_ratio as the fraction kept for train.
Default: None
seed_for_split (int, optional) – Seed for the split.
Default: 42

Example:

datamodule = CIFAR10DataModule(datadir)
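
How split_train_ratio and seed_for_split interact can be illustrated with a seeded shuffle of sample indices. This is a standard-library sketch under the assumption that the split is a plain random partition; the helper split_train_indices and the name held_out_idx are hypothetical, and the library may rely on a different random generator.

```python
import random
from typing import List, Tuple


def split_train_indices(
    num_samples: int,
    split_train_ratio: float,
    seed_for_split: int = 42,
) -> Tuple[List[int], List[int]]:
    """Randomly partition sample indices, keeping split_train_ratio of them for train."""
    if not 0.0 < split_train_ratio < 1.0:
        raise ValueError("split_train_ratio must be strictly between 0 and 1.")
    indices = list(range(num_samples))
    # A dedicated generator keeps the split reproducible for a given seed.
    random.Random(seed_for_split).shuffle(indices)
    num_train = round(num_samples * split_train_ratio)
    return indices[:num_train], indices[num_train:]


# CIFAR10 ships 50,000 training images.
train_idx, held_out_idx = split_train_indices(50_000, split_train_ratio=0.9)
```

Reusing the same seed_for_split reproduces the same partition, which keeps the held-out part fixed across runs.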

class eztorch.datamodules.CIFAR100DataModule(datadir, train=None, val=None, test=None, num_classes_kept=None, split_train_ratio=None, seed_for_split=42)

Datamodule for the CIFAR100 dataset.

Parameters:

datadir (str) – Where to save/load the data.
train (Optional[DictConfig], optional) – Configuration for the training data to define the loading of data, the transforms and the dataloader.
Default: None
val (Optional[DictConfig], optional) – Configuration for the validation data to define the loading of data, the transforms and the dataloader.
Default: None
test (Optional[DictConfig], optional) – Configuration for the testing data to define the loading of data, the transforms and the dataloader.
Default: None
num_classes_kept (Optional[int], optional) – Number of classes to use.
Default: None
split_train_ratio (Optional[float], optional) – If not None, randomly split the train dataset in two, with split_train_ratio as the fraction kept for train.
Default: None
seed_for_split (int, optional) – Seed for the split.
Default: 42

Example:

datamodule = CIFAR100DataModule(datadir)
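
One possible reading of num_classes_kept is sketched below: keep only the samples whose label falls in the first num_classes_kept class indices. Which classes are actually kept is not specified here, so this selection rule is an assumption, and keep_classes and the toy samples are hypothetical.

```python
from typing import List, Optional, Tuple


def keep_classes(
    samples: List[Tuple[str, int]],
    num_classes_kept: Optional[int] = None,
) -> List[Tuple[str, int]]:
    """Keep samples whose label is below num_classes_kept (assumed selection rule)."""
    if num_classes_kept is None:
        # No restriction: every class is used.
        return list(samples)
    return [(item, label) for item, label in samples if label < num_classes_kept]


# Toy (image_name, label) pairs standing in for a dataset.
toy_samples = [("img_0", 0), ("img_1", 5), ("img_2", 99), ("img_3", 12)]
subset = keep_classes(toy_samples, num_classes_kept=10)
```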

ImageNet

class eztorch.datamodules.ImagenetDataModule(datadir, train=None, val=None, test=None)

Base datamodule for the Imagenet dataset.

Parameters:

datadir (str) – Where to load the data.
train (Optional[DictConfig], optional) – Configuration for the training data to define the loading of data, the transforms and the dataloader.
Default: None
val (Optional[DictConfig], optional) – Configuration for the validation data to define the loading of data, the transforms and the dataloader.
Default: None
test (Optional[DictConfig], optional) – Configuration for the testing data to define the loading of data, the transforms and the dataloader.
Default: None

class eztorch.datamodules.Imagenet100DataModule(datadir, train=None, val=None, test=None)

Base datamodule for the Imagenet100 dataset.

Parameters:

datadir (str) – Where to load the data.
train (Optional[DictConfig], optional) – Configuration for the training data to define the loading of data, the transforms and the dataloader.
Default: None
val (Optional[DictConfig], optional) – Configuration for the validation data to define the loading of data, the transforms and the dataloader.
Default: None
test (Optional[DictConfig], optional) – Configuration for the testing data to define the loading of data, the transforms and the dataloader.
Default: None

Example:

datamodule = Imagenet100DataModule(datadir)

STL10

class eztorch.datamodules.STL10DataModule(datadir, train=None, val=None, test=None, folds=None, training_split='unlabeled')

Datamodule for the STL10 dataset in SSL setting.

Parameters:

datadir (str) – Where to save/load the data.
train (Optional[DictConfig], optional) – Configuration for the training data to define the loading of data, the transforms and the dataloader.
Default: None
val (Optional[DictConfig], optional) – Configuration for the validation data to define the loading of data, the transforms and the dataloader.
Default: None
test (Optional[DictConfig], optional) – Configuration for the testing data to define the loading of data, the transforms and the dataloader.
Default: None
folds (Optional[int], optional) – One of {0-9} or None. For training, loads one of the 10 pre-defined folds of 1k samples for the standard evaluation procedure. If no value is passed, loads the 5k samples.
Default: None
training_split (str, optional) – Split used for the training dataset.
Default: 'unlabeled'
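
The folds rule described above can be written as a small validation helper. The sample counts come from the parameter description (1k samples per fold, 5k samples otherwise); the function name stl10_labeled_train_size is hypothetical and the sketch does not load any data.

```python
from typing import Optional


def stl10_labeled_train_size(folds: Optional[int] = None) -> int:
    """Return how many labeled training samples the given folds value selects."""
    if folds is None:
        # No fold requested: the 5k labeled training samples are used.
        return 5000
    # bool is a subclass of int, so it is rejected explicitly.
    if isinstance(folds, bool) or not isinstance(folds, int) or not 0 <= folds <= 9:
        raise ValueError("folds must be an integer in 0..9 or None.")
    # Each of the 10 pre-defined folds holds 1k samples.
    return 1000


full_size = stl10_labeled_train_size()
fold_size = stl10_labeled_train_size(folds=3)
```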

Tiny-ImageNet

class eztorch.datamodules.TinyImagenetDataModule(datadir, train=None, val=None, test=None)

Base datamodule for the Tiny Imagenet dataset.

Parameters:

datadir (str) – Where to load the data.
train (Optional[DictConfig], optional) – Configuration for the training data to define the loading of data, the transforms and the dataloader.
Default: None
val (Optional[DictConfig], optional) – Configuration for the validation data to define the loading of data, the transforms and the dataloader.
Default: None
test (Optional[DictConfig], optional) – Configuration for the testing data to define the loading of data, the transforms and the dataloader.
Default: None

Video Datamodules

HMDB51

class eztorch.datamodules.Hmdb51DataModule(datadir, train=None, val=None, test=None, split_id=1, video_path_prefix='', decode_audio=False, decoder='pyav', decoder_args={})

Datamodule for the HMDB51 dataset.

Parameters:

datadir (str) – Path to the data (eg: csv, folder, …).
train (Optional[DictConfig], optional) – Configuration for the training data to define the loading of data, the transforms and the dataloader.
Default: None
val (Optional[DictConfig], optional) – Configuration for the validation data to define the loading of data, the transforms and the dataloader.
Default: None
test (Optional[DictConfig], optional) – Configuration for the testing data to define the loading of data, the transforms and the dataloader.
Default: None
video_path_prefix (str, optional) – Path to root directory where the videos are stored. All the video paths before loading are prefixed with this path.
Default: ''
decode_audio (bool, optional) – If True, decode audio.
Default: False
decoder (str, optional) – Defines which backend should be used to decode videos.
Default: 'pyav'
decoder_args (DictConfig, optional) – Arguments to configure the decoder.
Default: {}
split_id (int, optional) – Split used for training and testing.
Default: 1

Kinetics

class eztorch.datamodules.Kinetics200DataModule(datadir, train=None, val=None, test=None, video_path_prefix='', decode_audio=False, decoder='pyav', decoder_args={})

Datamodule for the Mini-Kinetics200 dataset.

Parameters:

datadir (str) – Path to the data (eg: csv, folder, …).
train (Optional[DictConfig], optional) – Configuration for the training data to define the loading of data, the transforms and the dataloader.
Default: None
val (Optional[DictConfig], optional) – Configuration for the validation data to define the loading of data, the transforms and the dataloader.
Default: None
test (Optional[DictConfig], optional) – Configuration for the testing data to define the loading of data, the transforms and the dataloader.
Default: None
video_path_prefix (str, optional) – Path to root directory with the videos that are loaded in LabeledVideoDataset. All the video paths before loading are prefixed with this path.
Default: ''
decode_audio (bool, optional) – If True, decode audio.
Default: False
decoder (str, optional) – Defines which backend should be used to decode videos.
Default: 'pyav'
decoder_args (DictConfig, optional) – Arguments to configure the decoder.
Default: {}

class eztorch.datamodules.Kinetics400DataModule(datadir, train=None, val=None, test=None, video_path_prefix='', decode_audio=False, decoder='pyav', decoder_args={})

Datamodule for the Kinetics400 dataset.

Parameters:

datadir (str) – Path to the data (eg: csv, folder, …).
train (Optional[DictConfig], optional) – Configuration for the training data to define the loading of data, the transforms and the dataloader.
Default: None
val (Optional[DictConfig], optional) – Configuration for the validation data to define the loading of data, the transforms and the dataloader.
Default: None
test (Optional[DictConfig], optional) – Configuration for the testing data to define the loading of data, the transforms and the dataloader.
Default: None
video_path_prefix (str, optional) – Path to root directory where the videos are stored. All the video paths before loading are prefixed with this path.
Default: ''
decode_audio (bool, optional) – If True, decode audio.
Default: False
decoder (str, optional) – Defines which backend should be used to decode videos.
Default: 'pyav'
decoder_args (DictConfig, optional) – Arguments to configure the decoder.
Default: {}

class eztorch.datamodules.Kinetics600DataModule(datadir, train=None, val=None, test=None, video_path_prefix='', decode_audio=False, decoder='pyav', decoder_args={})

Datamodule for the Kinetics600 dataset.

Parameters:

datadir (str) – Path to the data (eg: csv, folder, …).
train (Optional[DictConfig], optional) – Configuration for the training data to define the loading of data, the transforms and the dataloader.
Default: None
val (Optional[DictConfig], optional) – Configuration for the validation data to define the loading of data, the transforms and the dataloader.
Default: None
test (Optional[DictConfig], optional) – Configuration for the testing data to define the loading of data, the transforms and the dataloader.
Default: None
video_path_prefix (str, optional) – Path to root directory where the videos are stored. All the video paths before loading are prefixed with this path.
Default: ''
decode_audio (bool, optional) – If True, decode audio.
Default: False
decoder (str, optional) – Defines which backend should be used to decode videos.
Default: 'pyav'
decoder_args (DictConfig, optional) – Arguments to configure the decoder.
Default: {}

class eztorch.datamodules.Kinetics700DataModule(datadir, train=None, val=None, test=None, video_path_prefix='', decode_audio=False, decoder='pyav', decoder_args={})

Datamodule for the Kinetics700 dataset.

Parameters:

datadir (str) – Path to the data (eg: csv, folder, …).
train (Optional[DictConfig], optional) – Configuration for the training data to define the loading of data, the transforms and the dataloader.
Default: None
val (Optional[DictConfig], optional) – Configuration for the validation data to define the loading of data, the transforms and the dataloader.
Default: None
test (Optional[DictConfig], optional) – Configuration for the testing data to define the loading of data, the transforms and the dataloader.
Default: None
video_path_prefix (str, optional) – Path to root directory where the videos are stored. All the video paths before loading are prefixed with this path.
Default: ''
decode_audio (bool, optional) – If True, decode audio.
Default: False
decoder (str, optional) – Defines which backend should be used to decode videos.
Default: 'pyav'
decoder_args (DictConfig, optional) – Arguments to configure the decoder.
Default: {}

SoccerNet

class eztorch.datamodules.SoccerNetDataModule(datadir, train=None, val=None, test=None, video_path_prefix='', decoder='frame', decoder_args={})

Base datamodule for the SoccerNet datasets.

Parameters:

datadir (str) – Path to the data (eg: csv, folder, …).
train (Optional[DictConfig], optional) – Configuration for the training data to define the loading of data, the transforms and the dataloader.
Default: None
val (Optional[DictConfig], optional) – Configuration for the validation data to define the loading of data, the transforms and the dataloader.
Default: None
test (Optional[DictConfig], optional) – Configuration for the testing data to define the loading of data, the transforms and the dataloader.
Default: None
video_path_prefix (str, optional) – Path to root directory where the videos are stored. All the video paths before loading are prefixed with this path.
Default: ''
decoder (str, optional) – Defines which backend should be used to decode videos.
Default: 'frame'
decoder_args (DictConfig, optional) – Arguments to configure the decoder.
Default: {}

class eztorch.datamodules.ImageSoccerNetDataModule(datadir, train=None, val=None, test=None, video_path_prefix='', decoder='frame', decoder_args={})

Base datamodule for the SoccerNet image datasets.

Parameters:

datadir (str) – Path to the data (eg: csv, folder, …).
train (Optional[DictConfig], optional) – Configuration for the training data to define the loading of data, the transforms and the dataloader.
Default: None
val (Optional[DictConfig], optional) – Configuration for the validation data to define the loading of data, the transforms and the dataloader.
Default: None
test (Optional[DictConfig], optional) – Configuration for the testing data to define the loading of data, the transforms and the dataloader.
Default: None
video_path_prefix (str, optional) – Path to root directory where the videos are stored. All the video paths before loading are prefixed with this path.
Default: ''
decoder (str, optional) – Defines which backend should be used to decode videos.
Default: 'frame'
decoder_args (DictConfig, optional) – Arguments to configure the decoder.
Default: {}

Spot

class eztorch.datamodules.SpotDataModule(datadir, train=None, val=None, test=None, video_path_prefix='', decoder='frame', decoder_args={})

Base datamodule for the Spot datasets.

Parameters:

datadir (str) – Path to the data (eg: csv, folder, …).
train (Optional[DictConfig], optional) – Configuration for the training data to define the loading of data, the transforms and the dataloader.
Default: None
val (Optional[DictConfig], optional) – Configuration for the validation data to define the loading of data, the transforms and the dataloader.
Default: None
test (Optional[DictConfig], optional) – Configuration for the testing data to define the loading of data, the transforms and the dataloader.
Default: None
video_path_prefix (str, optional) – Path to root directory where the videos are stored. All the video paths before loading are prefixed with this path.
Default: ''
decoder (str, optional) – Defines which backend should be used to decode videos.
Default: 'frame'
decoder_args (DictConfig, optional) – Arguments to configure the decoder.
Default: {}

UCF101

class eztorch.datamodules.Ucf101DataModule(datadir, train=None, val=None, test=None, video_path_prefix='', decode_audio=False, decoder='pyav', decoder_args={}, split_id=1)

Datamodule for the UCF101 dataset.

Parameters:

datadir (str) – Path to the data (eg: csv, folder, …).
train (Optional[DictConfig], optional) – Configuration for the training data to define the loading of data, the transforms and the dataloader.
Default: None
val (Optional[DictConfig], optional) – Configuration for the validation data to define the loading of data, the transforms and the dataloader.
Default: None
test (Optional[DictConfig], optional) – Configuration for the testing data to define the loading of data, the transforms and the dataloader.
Default: None
video_path_prefix (str, optional) – Path to root directory where the videos are stored. All the video paths before loading are prefixed with this path.
Default: ''
decode_audio (bool, optional) – If True, decode audio.
Default: False
decoder (str, optional) – Defines which backend should be used to decode videos.
Default: 'pyav'
decoder_args (DictConfig, optional) – Arguments to configure the decoder.
Default: {}
split_id (int, optional) – Split used for training and testing.
Default: 1