SCE R3D-18 on Kinetics200¶
In this guide, we provide the steps to pretrain an R3D-18 using SCE on Kinetics200.
The first section focuses on defining the configuration and the second on launching the training.
Define the configuration¶
We first need to define the configuration to train SCE. The configuration is available here, and is detailed below.
Define the Datamodule¶
The datamodule is the Kinetics200DataModule.
datamodule:
  _target_: eztorch.datamodules.Kinetics200DataModule
  _recursive_: false
  datadir: ${..dir.data}
  video_path_prefix: ${.datadir}
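The ${..dir.data} and ${.datadir} values are OmegaConf relative interpolations: the first dot refers to the node containing the key, and each additional dot climbs one level up. A minimal, self-contained illustration of how they resolve (the path is hypothetical):

from omegaconf import OmegaConf

# Hypothetical mini-config mirroring the structure above.
cfg = OmegaConf.create(
    {
        "dir": {"data": "/datasets/kinetics200"},
        "datamodule": {
            "datadir": "${..dir.data}",          # '..' climbs to the root, then resolves dir.data
            "video_path_prefix": "${.datadir}",  # '.' stays inside datamodule
        },
    }
)
assert cfg.datamodule.video_path_prefix == "/datasets/kinetics200"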
For a video datamodule, you need to specify the decoder to use, such as PyAV or frame decoding, along with its parameters. For this example, we use the frame decoder.
datamodule:
  decoder: frame
  decoder_args:
    fps: 30
    frame_filter:
      subsample_type: uniform
      num_samples: 8
    time_difference_prob: 0.2
    num_threads_io: 4
    num_threads_decode: 4
    decode_float: true
For each set used during the fit of the model, usually training and validation, you need to provide the clip sampler, the transform applied to each clip, and the dataloader configuration.
datamodule:
  train:
    dataset:
      datadir: ${...datadir}/train.csv
      video_path_prefix: ${...datadir}/train
    transform:
      _target_: eztorch.transforms.OnlyInputListTransform
      _recursive_: true
      transform:
        _target_: eztorch.transforms.video.RandomResizedCrop
        target_height: 224
        target_width: 224
        scale:
          - 0.2
          - 0.766
        aspect_ratio:
          - 0.75
          - 1.3333
        interpolation: bilinear
    clip_sampler:
      _target_: eztorch.datasets.clip_samplers.RandomMultiClipSampler
      num_clips: 2
      clip_duration: 2.56
      speeds:
        - 1
      jitter_factor: 0
    loader:
      drop_last: true
      num_workers: 5
      pin_memory: true
      global_batch_size: 512
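Putting the decoder and sampler settings together: each of the 2 clips lasts 2.56 seconds, is decoded at 30 fps, and is uniformly subsampled to 8 frames. A quick sanity check of the resulting temporal coverage:

# Arithmetic implied by the decoder and clip sampler settings above.
fps = 30
clip_duration = 2.56          # seconds per clip
num_samples = 8               # frames kept by the uniform frame_filter

frames_per_clip = fps * clip_duration      # 76.8 decoded frames
stride = frames_per_clip / num_samples     # ~9.6 frames, i.e. one kept frame every ~0.32 s
print(frames_per_clip, stride)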
Define the model¶
We will use the SCEModel. SCE, as a siamese self-supervised learning method, defines several networks: it is composed of an online branch updated by backpropagation and a momentum target branch updated as the exponential moving average of the online branch.
The online branch consists of an encoder (or trunk), a projector, and a predictor. The target branch has the same architecture as the online one, without the predictor.
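To make the target update concrete, here is a minimal generic sketch of the exponential moving average step (an illustration, not eztorch's actual code): with momentum m, each target parameter becomes m * theta_target + (1 - m) * theta_online. The initial_momentum and scheduler_momentum keys later in this config control m and its schedule.

import torch

@torch.no_grad()
def ema_update(online: torch.nn.Module, target: torch.nn.Module, momentum: float) -> None:
    # Move every target parameter toward its online counterpart.
    for p_o, p_t in zip(online.parameters(), target.parameters()):
        p_t.mul_(momentum).add_(p_o, alpha=1.0 - momentum)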
We first need to tell Hydra which model to instantiate:
model:
  _target_: eztorch.models.siamese.SCEModel
  _recursive_: false
Each neural network architecture of SCE must also be defined:
A trunk to learn representations, such as ResNet3D18:
model:
  trunk:
    _target_: eztorch.models.trunks.create_video_head_model
    _recursive_: false
    model:
      _target_: eztorch.models.trunks.create_resnet3d_basic
      head: null
      model_depth: 18
    head:
      _target_: eztorch.models.heads.create_video_resnet_head
      activation: null
      dropout_rate: 0.0
      in_features: 512
      num_classes: 0
      output_size: [1, 1, 1]
      output_with_global_average: true
      pool: null
      pool_kernel_size: [8, 7, 7]
A projector, a rather small MLP network, to project the data into a lower-dimensional space that is invariant to data augmentations:
model:
  projector:
    _target_: eztorch.models.heads.MLPHead
    activation_inplace: true
    activation_layer: relu
    affine: true
    bias: false
    dropout: 0.0
    dropout_inplace: true
    hidden_dims:
      - 1024
      - 1024
    input_dim: 512
    norm_layer: bn_1D
    num_layers: 3
    last_bias: false
    last_norm: true
    last_affine: false
    output_dim: 256
A predictor, smaller than the projector, to predict the output projection of the target encoder:
model:
  predictor:
    _target_: eztorch.models.heads.MLPHead
    activation_inplace: true
    activation_layer: relu
    affine: true
    bias: false
    dropout: 0.0
    dropout_inplace: true
    hidden_dims:
      - 1024
    input_dim: 256
    norm_layer: bn_1D
    num_layers: 2
    last_bias: false
    last_norm: false
    last_affine: false
    output_dim: 256
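To make the head configurations concrete, here is a plain PyTorch module that plausibly matches the predictor above (an assumption for illustration, not eztorch's MLPHead): two bias-free linear layers (256 -> 1024 -> 256), with 1D batch normalization and ReLU after the hidden layer, and no normalization on the output since last_norm is false.

import torch.nn as nn

# Hypothetical equivalent of the predictor config above.
predictor = nn.Sequential(
    nn.Linear(256, 1024, bias=False),  # input_dim -> hidden_dims[0], bias: false
    nn.BatchNorm1d(1024),              # norm_layer: bn_1D, affine: true
    nn.ReLU(inplace=True),             # activation_layer: relu, activation_inplace: true
    nn.Linear(1024, 256, bias=False),  # hidden -> output_dim, last_bias: false
)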
Now we can provide the settings specific to the SCE model:
model:
  coeff: 0.5
  final_scheduler_coeff: 0.0
  initial_momentum: 0.99
  mutual_pass: false
  normalize_outputs: true
  num_devices: -1
  num_global_crops: 2
  num_local_crops: 0
  num_splits: 0
  num_splits_per_combination: 2
  queue:
    size: 32768
    feature_dim: 256
  scheduler_coeff: null
  scheduler_momentum: cosine
  simulate_n_devices: 8
  shuffle_bn: false
  start_warmup_coeff: 1.0
  sym: true
  temp: 0.1
  temp_m: 0.05
  use_keys: false
  warmup_epoch_coeff: 0
  warmup_epoch_temp_m: 0
  warmup_scheduler_coeff: linear
  warmup_scheduler_temp_m: cosine
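The coeff, temp, and temp_m keys map to lambda, tau, and tau_m in the SCE objective. Roughly, following the SCE paper (a hedged summary; see the paper for the exact formulation), the target branch builds a soft distribution over the other instances and mixes it with the one-hot positive label:

w_{ij} = \lambda \,\mathbb{1}_{i=j} + (1-\lambda)\,
         \frac{\mathbb{1}_{i \neq j}\,\exp(\mathbf{k}_i \cdot \mathbf{k}_j / \tau_m)}
              {\sum_{l \neq i} \exp(\mathbf{k}_i \cdot \mathbf{k}_l / \tau_m)},
\qquad
\mathcal{L}_i = -\sum_{j} w_{ij} \log
         \frac{\exp(\mathbf{q}_i \cdot \mathbf{k}_j / \tau)}
              {\sum_{l} \exp(\mathbf{q}_i \cdot \mathbf{k}_l / \tau)}

Here q_i is the online prediction, the k_j are target projections (including the queue of 32768 past features), lambda = 0.5 balances the contrastive and relational terms, and the lower target temperature tau_m = 0.05 sharpens the relational distribution relative to tau = 0.1.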
To optimize the parameters, we also provide the configuration for the optimizer and its scheduler:
model:
  optimizer:
    _target_: eztorch.optimizers.optimizer_factory
    _recursive_: false
    exclude_wd_norm: false
    exclude_wd_bias: false
    name: lars
    params:
      momentum: 0.9
      trust_coefficient: 0.001
      weight_decay: 1.0e-06
    batch_size: 512
    initial_lr: 2.4
    layer_decay_lr: null
    scaler: linear
    scheduler:
      _target_: eztorch.schedulers.scheduler_factory
      _recursive_: false
      name: linear_warmup_cosine_annealing_lr
      params:
        max_epochs: 200
        warmup_epochs: 35
        warmup_start_lr: 0.0
        eta_min: 0.0
      interval: step
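With scaler: linear, the factory presumably applies the usual linear scaling rule before handing the learning rate to the warmup-cosine schedule. A sketch of the resulting per-step learning rate, under the assumption that initial_lr is scaled by batch_size / 256:

import math

def lr_at_step(step: int, steps_per_epoch: int) -> float:
    # Assumed linear scaling rule: lr = initial_lr * batch_size / 256.
    base_lr = 2.4 * 512 / 256
    warmup_steps = 35 * steps_per_epoch    # warmup_epochs
    total_steps = 200 * steps_per_epoch    # max_epochs
    if step < warmup_steps:
        # Linear warmup from warmup_start_lr (0.0) to base_lr.
        return base_lr * step / warmup_steps
    # Cosine annealing from base_lr down to eta_min (0.0).
    t = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * (1 + math.cos(math.pi * t)) / 2

Since interval is step, the schedule is advanced after every optimization step rather than once per epoch.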
SCEModel supports GPU transforms to speed up data augmentation for training and/or validation. We specify the configuration of the contrastive transforms, one pipeline per crop:
model:
  train_transform:
    _target_: eztorch.transforms.ApplyTransformsOnList
    _recursive_: true
    transforms:
      - _target_: torchaug.batch_transforms.BatchVideoWrapper
        same_on_frames: true
        video_format: CTHW
        inplace: true
        transforms:
          - _target_: eztorch.transforms.Div255Input
            inplace: true
          - _target_: torchaug.batch_transforms.BatchRandomColorJitter
            brightness: 0.8
            contrast: 0.8
            hue: 0.2
            p: 0.8
            saturation: 0.4
            inplace: true
          - _target_: torchaug.batch_transforms.BatchRandomGrayscale
            p: 0.2
            inplace: true
          - _target_: torchaug.batch_transforms.BatchRandomGaussianBlur
            kernel_size: 23
            sigma:
              - 0.1
              - 2.0
            p: 1.0
            inplace: true
          - _target_: torchaug.batch_transforms.BatchRandomHorizontalFlip
            p: 0.5
            inplace: true
          - _target_: torchaug.transforms.Normalize
            mean:
              - 0.45
              - 0.45
              - 0.45
            std:
              - 0.225
              - 0.225
              - 0.225
            inplace: true
      - _target_: torchaug.batch_transforms.BatchVideoWrapper
        same_on_frames: true
        video_format: CTHW
        inplace: true
        transforms:
          - _target_: eztorch.transforms.Div255Input
            inplace: true
          - _target_: torchaug.batch_transforms.BatchRandomColorJitter
            brightness: 0.8
            contrast: 0.8
            hue: 0.2
            p: 0.8
            saturation: 0.4
            inplace: true
          - _target_: torchaug.batch_transforms.BatchRandomGrayscale
            p: 0.2
            inplace: true
          - _target_: torchaug.batch_transforms.BatchRandomGaussianBlur
            kernel_size: 23
            sigma:
              - 0.1
              - 2.0
            p: 0.1
            inplace: true
          - _target_: torchaug.batch_transforms.BatchRandomSolarize
            p: 0.2
            threshold: 0.5
            inplace: true
          - _target_: torchaug.batch_transforms.BatchRandomHorizontalFlip
            p: 0.5
          - _target_: torchaug.transforms.Normalize
            mean:
              - 0.45
              - 0.45
              - 0.45
            std:
              - 0.225
              - 0.225
              - 0.225
            inplace: true
Configure the trainer¶
To run our experiment, we need to define a trainer from PyTorch Lightning.
It allows us to specify the number and type of devices, configure automatic mixed precision, choose whether to use synchronized batch normalization, and more:
trainer:
  _target_: lightning.pytorch.trainer.Trainer
  accelerator: gpu
  benchmark: true
  devices: -1
  max_epochs: 200
  num_nodes: 1
  precision: 16
  strategy:
    _target_: lightning.pytorch.strategies.DDPStrategy
    find_unused_parameters: false
    static_graph: false
  sync_batchnorm: true
You should also define the callbacks fired by the trainer, such as model checkpointing:
callbacks:
  model_checkpoint:
    _target_: eztorch.callbacks.ModelCheckpoint
    dirpath: pretrain_checkpoints
    filename: '{epoch}'
    save_last: false
    save_top_k: -1
    mode: min
    every_n_epochs: 100
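For reference, here is a rough sketch (hypothetical, not the actual eztorch pretrain script) of how such a configuration is typically turned into a run with Hydra and PyTorch Lightning:

from hydra import compose, initialize
from hydra.utils import instantiate

with initialize(config_path="../doc/examples/configs", version_base=None):
    cfg = compose(config_name="pretrain_r3d18_SCE_kinetics200")

datamodule = instantiate(cfg.datamodule)   # Kinetics200DataModule
model = instantiate(cfg.model)             # SCEModel builds its sub-configs itself (_recursive_: false)
callbacks = [instantiate(cb) for cb in cfg.callbacks.values()]
trainer = instantiate(cfg.trainer, callbacks=callbacks)
trainer.fit(model, datamodule=datamodule)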
Job configuration¶
Hydra allows you to configure its behavior, such as defining a run directory to store your results, which Eztorch also uses to change your pwd. You can also specify Python packages from which to retrieve configurations to inherit from or to include in your current config:
hydra:
  searchpath:
    - pkg://eztorch.configs
  run:
    dir: ${...dir.run}
You can define the various directories for your experiment:
the root of your experiments
the current experiment
the data
dir:
  data: ???
  root: /output/
  exp: pretrain
  run: ${.root}/${.exp}
Finally, PyTorch Lightning provides a convenient utility to seed all relevant packages:
seed:
  _target_: lightning.fabric.utilities.seed.seed_everything
  seed: 42
  workers: true
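This entry is instantiated like any other Hydra node; the equivalent direct call is:

from lightning.fabric.utilities.seed import seed_everything

# Seeds Python, NumPy and PyTorch; workers=True also derives
# distinct seeds for dataloader worker processes.
seed_everything(seed=42, workers=True)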
Launch the pretraining¶
To launch the pretraining of SCE with our current configuration, you have to call the right Python script with the location of the configuration.
Eztorch defines a pretrain script that launches pretraining based on your Hydra configuration.
The script to launch the experiments using SLURM is the following:
output_dir=... # The folder at the root of your experiment
dataset_dir=... # The folder containing the data
cd sce/run
config_path="../doc/examples/configs/"
config_name="pretrain_r3d18_SCE_kinetics200"
seed=42
srun --kill-on-bad-exit=1 python pretrain.py \
    -cp $config_path -cn $config_name \
    dir.data=$dataset_dir dir.root=$output_dir \
    dir.exp='pretrain' seed.seed=$seed \
    datamodule.train.loader.num_workers=3 \
    datamodule.val.loader.num_workers=3 \
    trainer.devices=-1
PyTorch Lightning automatically detects that we are using SLURM and, through the srun command, makes multi-GPU distributed training work.
As you can see, we provided the relative path to the configuration as well as its name to configure Hydra, and overrode values with argparse-like arguments.
Any value in our experiment configuration can be overridden this way, using the same dotted paths used to access it in our Python code.