Siamese

Base classes

class eztorch.models.siamese.SiameseBaseModel(trunk, optimizer, projector=None, predictor=None, train_transform=None, val_transform=None, test_transform=None, normalize_outputs=True, num_global_crops=2, num_local_crops=0, num_splits=0, num_splits_per_combination=2, mutual_pass=False)

Abstract class to represent siamese models.

Subclasses should implement the training_step method.

Parameters:
  • trunk (DictConfig) – Config to build a trunk.

  • optimizer (DictConfig) – Config to build optimizers and schedulers.

  • projector (Optional[DictConfig], optional) – Config to build a projector.

    Default: None

  • predictor (Optional[DictConfig], optional) – Config to build a predictor.

    Default: None

  • train_transform (Optional[DictConfig], optional) – Config to perform transformation on train input.

    Default: None

  • val_transform (Optional[DictConfig], optional) – Config to perform transformation on val input.

    Default: None

  • test_transform (Optional[DictConfig], optional) – Config to perform transformation on test input.

    Default: None

  • normalize_outputs (bool, optional) – If True, normalize outputs.

    Default: True

  • num_global_crops (int, optional) – Number of global crops which are the first elements of each batch.

    Default: 2

  • num_local_crops (int, optional) – Number of local crops which are the last elements of each batch.

    Default: 0

  • num_splits (int, optional) – Number of splits to apply to each crop.

    Default: 0

  • num_splits_per_combination (int, optional) – Number of splits used for combinations of features of each split.

    Default: 2

  • mutual_pass (bool, optional) – If True, perform one pass per crop resolution.

    Default: False
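
The crop parameters above imply an ordering convention: global crops come first in each batch and local crops come last. A minimal sketch of that convention, assuming a batch is an ordered list of views (split_crops and the crop names are hypothetical and not part of eztorch):

```python
from typing import List, Sequence, Tuple


def split_crops(
    crops: Sequence[str], num_global_crops: int = 2, num_local_crops: int = 0
) -> Tuple[List[str], List[str]]:
    """Split an ordered list of crops into global and local crops.

    Assumption: global crops are the first elements and local crops
    are the last elements, as the parameter descriptions state.
    """
    expected = num_global_crops + num_local_crops
    if len(crops) != expected:
        raise ValueError(f"expected {expected} crops, got {len(crops)}")
    global_crops = list(crops[:num_global_crops])
    local_crops = list(crops[num_global_crops:])
    return global_crops, local_crops


# Hypothetical batch: two global views followed by three local views.
batch = ["global_0", "global_1", "local_0", "local_1", "local_2"]
global_crops, local_crops = split_crops(batch, num_global_crops=2, num_local_crops=3)
```

With the default num_local_crops=0, the second list is simply empty.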

class eztorch.models.siamese.MomentumSiameseBaseModel(trunk, optimizer, projector=None, predictor=None, train_transform=None, val_transform=None, test_transform=None, normalize_outputs=True, num_global_crops=2, num_local_crops=0, num_splits=0, num_splits_per_combination=2, mutual_pass=False, initial_momentum=0.996, scheduler_momentum='cosine')

Abstract class to represent siamese models with a momentum branch.

Subclasses should implement the training_step method.

Parameters:
  • trunk (DictConfig) – Config to build a trunk.

  • optimizer (DictConfig) – Config to build optimizers and schedulers.

  • projector (Optional[DictConfig], optional) – Config to build a projector.

    Default: None

  • predictor (Optional[DictConfig], optional) – Config to build a predictor.

    Default: None

  • train_transform (Optional[DictConfig], optional) – Config to perform transformation on train input.

    Default: None

  • val_transform (Optional[DictConfig], optional) – Config to perform transformation on val input.

    Default: None

  • test_transform (Optional[DictConfig], optional) – Config to perform transformation on test input.

    Default: None

  • normalize_outputs (bool, optional) – If True, normalize outputs.

    Default: True

  • num_global_crops (int, optional) – Number of global crops which are the first elements of each batch.

    Default: 2

  • num_local_crops (int, optional) – Number of local crops which are the last elements of each batch.

    Default: 0

  • num_splits (int, optional) – Number of splits to apply to each crop.

    Default: 0

  • num_splits_per_combination (int, optional) – Number of splits used for combinations of features of each split.

    Default: 2

  • mutual_pass (bool, optional) – If True, perform one pass per branch per crop resolution.

    Default: False

  • initial_momentum (float, optional) – Initial value for the momentum update.

    Default: 0.996

  • scheduler_momentum (str, optional) – Rule to update the momentum value.

    Default: 'cosine'
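
To illustrate how initial_momentum and scheduler_momentum could interact, here is a minimal sketch of an exponential moving average update with a cosine schedule. It assumes the 'cosine' rule follows the BYOL-style schedule that raises the momentum from its initial value to 1; the actual eztorch scheduler may differ, and both function names are hypothetical.

```python
import math
from typing import List


def cosine_momentum(initial_momentum: float, step: int, total_steps: int) -> float:
    """BYOL-style cosine schedule (assumed): goes from initial_momentum to 1.0."""
    progress = step / total_steps
    return 1.0 - (1.0 - initial_momentum) * (math.cos(math.pi * progress) + 1.0) / 2.0


def momentum_update(online: List[float], target: List[float], momentum: float) -> List[float]:
    """Exponential moving average: target <- m * target + (1 - m) * online."""
    return [momentum * t + (1.0 - momentum) * o for o, t in zip(online, target)]


# Toy weights standing in for the online and momentum branch parameters.
online_weights = [1.0, 2.0]
target_weights = [0.0, 0.0]
m = cosine_momentum(initial_momentum=0.996, step=0, total_steps=100)
target_weights = momentum_update(online_weights, target_weights, m)
```

A 'constant' rule would skip the schedule and reuse initial_momentum at every step.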

class eztorch.models.siamese.ShuffleMomentumSiameseBaseModel(trunk, optimizer, projector=None, predictor=None, train_transform=None, val_transform=None, test_transform=None, normalize_outputs=True, num_global_crops=2, num_local_crops=0, num_splits=0, num_splits_per_combination=2, mutual_pass=False, initial_momentum=0.996, scheduler_momentum='cosine', shuffle_bn=True, num_devices=1, simulate_n_devices=8)

Abstract class to represent siamese models with a momentum branch and the option to shuffle input elements in the momentum branch, applying the normalization trick from MoCo.

Subclasses should implement the training_step method.

Parameters:
  • trunk (DictConfig) – Config to build a trunk.

  • optimizer (DictConfig) – Config to build optimizers and schedulers.

  • projector (Optional[DictConfig], optional) – Config to build a projector.

    Default: None

  • predictor (Optional[DictConfig], optional) – Config to build a predictor.

    Default: None

  • train_transform (Optional[DictConfig], optional) – Config to perform transformation on train input.

    Default: None

  • val_transform (Optional[DictConfig], optional) – Config to perform transformation on val input.

    Default: None

  • test_transform (Optional[DictConfig], optional) – Config to perform transformation on test input.

    Default: None

  • normalize_outputs (bool, optional) – If True, normalize outputs.

    Default: True

  • num_global_crops (int, optional) – Number of global crops which are the first elements of each batch.

    Default: 2

  • num_local_crops (int, optional) – Number of local crops which are the last elements of each batch.

    Default: 0

  • num_splits (int, optional) – Number of splits to apply to each crop.

    Default: 0

  • num_splits_per_combination (int, optional) – Number of splits used for combinations of features of each split.

    Default: 2

  • mutual_pass (bool, optional) – If True, perform one pass per branch per crop resolution.

    Default: False

  • initial_momentum (float, optional) – Initial value for the momentum update.

    Default: 0.996

  • scheduler_momentum (str, optional) – Rule to update the momentum value.

    Default: 'cosine'

  • shuffle_bn (bool, optional) – If True, apply shuffle normalization trick from MoCo.

    Default: True

  • num_devices (int, optional) – Number of devices used to train the model in each node.

    Default: 1

  • simulate_n_devices (int, optional) – Number of devices to simulate to apply shuffle trick. Requires shuffle_bn to be True and num_devices to be 1.

    Default: 8
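
The shuffle trick from MoCo permutes the batch before the momentum branch so that batch normalization statistics are computed over different groups of samples than in the online branch, then restores the original order afterwards. The sketch below illustrates the idea on plain Python lists, assuming that simulating devices means splitting the shuffled batch into equal chunks; the helper names are hypothetical and this is not the eztorch implementation.

```python
import random
from typing import List, Tuple


def shuffle_batch(batch: List[int], seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle a batch and return it with the permutation used."""
    rng = random.Random(seed)
    permutation = list(range(len(batch)))
    rng.shuffle(permutation)
    return [batch[i] for i in permutation], permutation


def split_for_devices(batch: List[int], n_devices: int) -> List[List[int]]:
    """Split a batch into n_devices equal chunks, one per (simulated) device."""
    if len(batch) % n_devices != 0:
        raise ValueError("batch size must be divisible by n_devices")
    chunk = len(batch) // n_devices
    return [batch[i * chunk:(i + 1) * chunk] for i in range(n_devices)]


def unshuffle_batch(shuffled: List[int], permutation: List[int]) -> List[int]:
    """Restore the original sample order after the momentum forward pass."""
    restored = [0] * len(shuffled)
    for position, original_index in enumerate(permutation):
        restored[original_index] = shuffled[position]
    return restored


samples = list(range(16))
shuffled, permutation = shuffle_batch(samples)
chunks = split_for_devices(shuffled, n_devices=8)  # batch norm would run per chunk
restored = unshuffle_batch(shuffled, permutation)
```

Unshuffling matters because the keys must line up with their queries again before the loss is computed.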

class eztorch.models.siamese.ShuffleMomentumQueueBaseModel(trunk, optimizer, projector=None, predictor=None, train_transform=None, val_transform=None, test_transform=None, normalize_outputs=True, num_global_crops=2, num_local_crops=0, num_splits=0, num_splits_per_combination=2, mutual_pass=False, initial_momentum=0.999, scheduler_momentum='constant', shuffle_bn=False, num_devices=1, simulate_n_devices=8, queue=None, sym=False, use_keys=False)

Base class for siamese models with a momentum branch, an optional shuffle normalization trick, and a queue.

Parameters:
  • trunk (DictConfig) – Config to build a trunk.

  • optimizer (DictConfig) – Config to build optimizers and schedulers.

  • projector (Optional[DictConfig], optional) – Config to build a projector.

    Default: None

  • predictor (Optional[DictConfig], optional) – Config to build a predictor.

    Default: None

  • train_transform (Optional[DictConfig], optional) – Config to perform transformation on train input.

    Default: None

  • val_transform (Optional[DictConfig], optional) – Config to perform transformation on val input.

    Default: None

  • test_transform (Optional[DictConfig], optional) – Config to perform transformation on test input.

    Default: None

  • normalize_outputs (bool, optional) – If True, normalize outputs.

    Default: True

  • num_global_crops (int, optional) – Number of global crops which are the first elements of each batch.

    Default: 2

  • num_local_crops (int, optional) – Number of local crops which are the last elements of each batch.

    Default: 0

  • num_splits (int, optional) – Number of splits to apply to each crop.

    Default: 0

  • num_splits_per_combination (int, optional) – Number of splits used for combinations of features of each split.

    Default: 2

  • mutual_pass (bool, optional) – If True, perform one pass per branch per crop resolution.

    Default: False

  • initial_momentum (float, optional) – Initial value for the momentum update.

    Default: 0.999

  • scheduler_momentum (str, optional) – Rule to update the momentum value.

    Default: 'constant'

  • shuffle_bn (bool, optional) – If True, apply shuffle normalization trick from MoCo.

    Default: False

  • num_devices (int, optional) – Number of devices used to train the model in each node.

    Default: 1

  • simulate_n_devices (int, optional) – Number of devices to simulate to apply shuffle trick. Requires shuffle_bn to be True and num_devices to be 1.

    Default: 8

  • queue (Optional[DictConfig], optional) – Config to build a queue.

    Default: None

  • sym (bool, optional) – If True, symmetrize the loss.

    Default: False

  • use_keys (bool, optional) – If True, add keys to negatives.

    Default: False
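
The queue parameter refers to a store of past key features that serve as negatives. As a rough illustration, assuming a first-in, first-out policy as in MoCo, such a queue can be sketched with the standard library (FeatureQueue is a hypothetical helper, not the eztorch queue):

```python
from collections import deque
from typing import Deque, List


class FeatureQueue:
    """Fixed-size FIFO queue of past key features (hypothetical helper)."""

    def __init__(self, size: int) -> None:
        self.features: Deque[List[float]] = deque(maxlen=size)

    def enqueue(self, keys: List[List[float]]) -> None:
        # Appending beyond maxlen silently drops the oldest entries.
        self.features.extend(keys)

    def negatives(self) -> List[List[float]]:
        """Return the stored features, oldest first."""
        return list(self.features)


queue = FeatureQueue(size=4)
queue.enqueue([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
queue.enqueue([[2.0, 0.0], [0.0, 2.0]])  # the oldest feature is evicted
```

After the second call the queue holds the four most recent features.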

Self-Supervised Models

SimCLR

class eztorch.models.siamese.SimCLRModel(trunk, optimizer, projector=None, train_transform=None, val_transform=None, test_transform=None, normalize_outputs=True, num_global_crops=2, num_local_crops=0, num_splits=0, num_splits_per_combination=2, mutual_pass=False, temp=0.1)

SimCLR model, configurable as version 1 or 2.

Parameters:
  • trunk (DictConfig) – Config to build a trunk.

  • optimizer (DictConfig) – Config to build optimizers and schedulers.

  • projector (Optional[DictConfig], optional) – Config to build a projector.

    Default: None

  • train_transform (Optional[DictConfig], optional) – Config to perform transformation on train input.

    Default: None

  • val_transform (Optional[DictConfig], optional) – Config to perform transformation on val input.

    Default: None

  • test_transform (Optional[DictConfig], optional) – Config to perform transformation on test input.

    Default: None

  • normalize_outputs (bool, optional) – If True, normalize outputs.

    Default: True

  • num_global_crops (int, optional) – Number of global crops which are the first elements of each batch.

    Default: 2

  • num_local_crops (int, optional) – Number of local crops which are the last elements of each batch.

    Default: 0

  • num_splits (int, optional) – Number of splits to apply to each crop.

    Default: 0

  • num_splits_per_combination (int, optional) – Number of splits used for combinations of features of each split.

    Default: 2

  • mutual_pass (bool, optional) – If True, perform one pass per crop resolution.

    Default: False

  • temp (float, optional) – Temperature parameter to scale the online similarities.

    Default: 0.1
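
The role of temp is easiest to see in the NT-Xent loss used by SimCLR, where similarities are divided by the temperature before the softmax. The following is a minimal pure-Python sketch of that loss for two views of a small batch; it is an illustration under stated assumptions (cosine similarity, every other embedding used as a negative), not the eztorch implementation, and all names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def nt_xent(view_1: List[Vector], view_2: List[Vector], temp: float = 0.1) -> float:
    """NT-Xent loss over 2N normalized embeddings.

    For each embedding, the positive is the other view of the same sample;
    the remaining 2N - 2 embeddings act as negatives.
    """
    z = [normalize(v) for v in view_1 + view_2]
    n = len(view_1)
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)
        logits = [dot(z[i], z[j]) / temp for j in range(2 * n) if j != i]
        log_denominator = math.log(sum(math.exp(logit) for logit in logits))
        total += log_denominator - dot(z[i], z[positive]) / temp
    return total / (2 * n)


# Views that agree per sample give a much lower loss than swapped views.
aligned = nt_xent([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.1], [0.1, 1.0]])
misaligned = nt_xent([[1.0, 0.0], [0.0, 1.0]], [[0.1, 1.0], [1.0, 0.1]])
```

A lower temperature sharpens the softmax, so hard negatives weigh more heavily in the loss.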

MoCo

class eztorch.models.siamese.MoCoModel(trunk, optimizer, projector=None, predictor=None, train_transform=None, val_transform=None, test_transform=None, normalize_outputs=True, num_global_crops=2, num_local_crops=0, num_splits=0, num_splits_per_combination=2, mutual_pass=False, initial_momentum=0.999, scheduler_momentum='constant', shuffle_bn=True, num_devices=1, simulate_n_devices=8, queue=None, sym=False, use_keys=False, temp=0.2)

MoCo model, configurable as version 1, 2, 2+, or 3.

Parameters:
  • trunk (DictConfig) – Config to build a trunk.

  • optimizer (DictConfig) – Config to build optimizers and schedulers.

  • projector (Optional[DictConfig], optional) – Config to build a projector.

    Default: None

  • predictor (Optional[DictConfig], optional) – Config to build a predictor.

    Default: None

  • train_transform (Optional[DictConfig], optional) – Config to perform transformation on train input.

    Default: None

  • val_transform (Optional[DictConfig], optional) – Config to perform transformation on val input.

    Default: None

  • test_transform (Optional[DictConfig], optional) – Config to perform transformation on test input.

    Default: None

  • normalize_outputs (bool, optional) – If True, normalize outputs.

    Default: True

  • num_global_crops (int, optional) – Number of global crops which are the first elements of each batch.

    Default: 2

  • num_local_crops (int, optional) – Number of local crops which are the last elements of each batch.

    Default: 0

  • num_splits (int, optional) – Number of splits to apply to each crop.

    Default: 0

  • num_splits_per_combination (int, optional) – Number of splits used for combinations of features of each split.

    Default: 2

  • mutual_pass (bool, optional) – If True, perform one pass per branch per crop resolution.

    Default: False

  • initial_momentum (float, optional) – Initial value for the momentum update.

    Default: 0.999

  • scheduler_momentum (str, optional) – Rule to update the momentum value.

    Default: 'constant'

  • shuffle_bn (bool, optional) – If True, apply shuffle normalization trick from MoCo.

    Default: True

  • num_devices (int, optional) – Number of devices used to train the model in each node.

    Default: 1

  • simulate_n_devices (int, optional) – Number of devices to simulate to apply shuffle trick. Requires shuffle_bn to be True and num_devices to be 1.

    Default: 8

  • queue (Optional[DictConfig], optional) – Config to build a queue.

    Default: None

  • sym (bool, optional) – If True, symmetrize the loss.

    Default: False

  • use_keys (bool, optional) – If True, add keys to negatives.

    Default: False

  • temp (float, optional) – Temperature parameter to scale the online similarities.

    Default: 0.2
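
The queue, sym, and temp parameters can be related through the MoCo contrastive loss: the query is compared with its positive key and with queued negatives, and a symmetrized variant swaps the roles of the two views. The sketch below assumes unit-normalized vectors and assumes that sym averages the two directions; function names are hypothetical and the eztorch loss may be organized differently.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def info_nce(query: Vector, key: Vector, negatives: List[Vector], temp: float = 0.2) -> float:
    """Cross-entropy where the positive key is class 0 and queue entries are negatives."""
    logits = [dot(query, key) / temp] + [dot(query, n) / temp for n in negatives]
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum = max_logit + math.log(sum(math.exp(logit - max_logit) for logit in logits))
    return log_sum - logits[0]


def symmetric_info_nce(
    q1: Vector, k1: Vector, q2: Vector, k2: Vector, negatives: List[Vector], temp: float = 0.2
) -> float:
    """Symmetrized loss (assumed): each view is used once as query and once as key."""
    return 0.5 * (info_nce(q1, k2, negatives, temp) + info_nce(q2, k1, negatives, temp))


# Toy unit vectors; the queue stands in for features of past batches.
queue_negatives = [[0.0, 1.0], [-1.0, 0.0]]
easy = info_nce([1.0, 0.0], [1.0, 0.0], queue_negatives)
hard = info_nce([0.0, 1.0], [0.0, 1.0], queue_negatives)  # a negative equals the key
both = symmetric_info_nce([1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0], queue_negatives)
```

The second case is harder because one queued negative is identical to the positive key, so the loss cannot drop below log 2.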

MoCo v3

class eztorch.models.siamese.MoCov3Model(trunk, optimizer, projector=None, predictor=None, train_transform=None, val_transform=None, test_transform=None, normalize_outputs=True, num_global_crops=2, num_local_crops=0, num_splits=0, num_splits_per_combination=2, mutual_pass=False, initial_momentum=0.99, scheduler_momentum='cosine', temp=1.0)

MoCo v3 model that can be configured as in the paper.

Parameters:
  • trunk (DictConfig) – Config to build a trunk.

  • optimizer (DictConfig) – Config to build optimizers and schedulers.

  • projector (Optional[DictConfig], optional) – Config to build a projector.

    Default: None

  • predictor (Optional[DictConfig], optional) – Config to build a predictor.

    Default: None

  • train_transform (Optional[DictConfig], optional) – Config to perform transformation on train input.

    Default: None

  • val_transform (Optional[DictConfig], optional) – Config to perform transformation on val input.

    Default: None

  • test_transform (Optional[DictConfig], optional) – Config to perform transformation on test input.

    Default: None

  • normalize_outputs (bool, optional) – If True, normalize outputs.

    Default: True

  • num_global_crops (int, optional) – Number of global crops which are the first elements of each batch.

    Default: 2

  • num_local_crops (int, optional) – Number of local crops which are the last elements of each batch.

    Default: 0

  • num_splits (int, optional) – Number of splits to apply to each crop.

    Default: 0

  • num_splits_per_combination (int, optional) – Number of splits used for combinations of features of each split.

    Default: 2

  • mutual_pass (bool, optional) – If True, perform one pass per branch per crop resolution.

    Default: False

  • initial_momentum (float, optional) – Initial value for the momentum update.

    Default: 0.99

  • scheduler_momentum (str, optional) – Rule to update the momentum value.

    Default: 'cosine'

  • temp (float, optional) – Temperature parameter to scale the online similarities.

    Default: 1.0

ReSSL

class eztorch.models.siamese.ReSSLModel(trunk, optimizer, projector=None, predictor=None, train_transform=None, val_transform=None, test_transform=None, normalize_outputs=True, num_global_crops=2, num_local_crops=0, num_splits=0, num_splits_per_combination=2, mutual_pass=False, initial_momentum=0.999, scheduler_momentum='constant', shuffle_bn=True, num_devices=1, simulate_n_devices=8, queue=None, sym=False, use_keys=False, temp=0.1, temp_m=0.04, initial_temp_m=0.04)

ReSSL model.

Parameters:
  • trunk (DictConfig) – Config to build a trunk.

  • optimizer (DictConfig) – Config to build optimizers and schedulers.

  • projector (Optional[DictConfig], optional) – Config to build a projector.

    Default: None

  • predictor (Optional[DictConfig], optional) – Config to build a predictor.

    Default: None

  • train_transform (Optional[DictConfig], optional) – Config to perform transformation on train input.

    Default: None

  • val_transform (Optional[DictConfig], optional) – Config to perform transformation on val input.

    Default: None

  • test_transform (Optional[DictConfig], optional) – Config to perform transformation on test input.

    Default: None

  • normalize_outputs (bool, optional) – If True, normalize outputs.

    Default: True

  • num_global_crops (int, optional) – Number of global crops which are the first elements of each batch.

    Default: 2

  • num_local_crops (int, optional) – Number of local crops which are the last elements of each batch.

    Default: 0

  • num_splits (int, optional) – Number of splits to apply to each crop.

    Default: 0

  • num_splits_per_combination (int, optional) – Number of splits used for combinations of features of each split.

    Default: 2

  • mutual_pass (bool, optional) – If True, perform one pass per branch per crop resolution.

    Default: False

  • initial_momentum (float, optional) – Initial value for the momentum update.

    Default: 0.999

  • scheduler_momentum (str, optional) – Rule to update the momentum value.

    Default: 'constant'

  • shuffle_bn (bool, optional) – If True, apply shuffle normalization trick from MoCo.

    Default: True

  • num_devices (int, optional) – Number of devices used to train the model in each node.

    Default: 1

  • simulate_n_devices (int, optional) – Number of devices to simulate to apply shuffle trick. Requires shuffle_bn to be True and num_devices to be 1.

    Default: 8

  • queue (Optional[DictConfig], optional) – Config to build a queue.

    Default: None

  • sym (bool, optional) – If True, symmetrize the loss.

    Default: False

  • use_keys (bool, optional) – If True, add keys to negatives.

    Default: False

  • temp (float, optional) – Temperature parameter to scale the online similarities.

    Default: 0.1

  • temp_m (float, optional) – Temperature parameter to scale the target similarities. Initial value if warmup applied.

    Default: 0.04
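
The two temperatures play different roles: temp scales the online similarities and temp_m scales the target similarities. A minimal sketch of a relational loss in the spirit of ReSSL, assuming both distributions are softmaxes over similarities to the same queue entries (function names and similarity values are hypothetical):

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    max_logit = max(logits)
    exps = [math.exp(logit - max_logit) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


def relational_loss(
    online_sims: List[float], target_sims: List[float], temp: float = 0.1, temp_m: float = 0.04
) -> float:
    """Cross-entropy between the target and online similarity distributions."""
    target = softmax([s / temp_m for s in target_sims])
    online = softmax([s / temp for s in online_sims])
    return -sum(t * math.log(o) for t, o in zip(target, online))


# Hypothetical cosine similarities between one sample and three queue entries.
target_sims = [0.9, 0.2, -0.3]
online_sims = [0.8, 0.3, -0.2]
loss = relational_loss(online_sims, target_sims)
sharp_target = softmax([s / 0.04 for s in target_sims])
soft_target = softmax([s / 0.1 for s in target_sims])
```

Choosing temp_m below temp makes the target distribution sharper than the online one, which is the setting reflected by the defaults above.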

SCE

class eztorch.models.siamese.SCEModel(trunk, optimizer, projector=None, predictor=None, train_transform=None, val_transform=None, test_transform=None, normalize_outputs=True, num_global_crops=2, num_local_crops=0, num_splits=0, num_splits_per_combination=2, mutual_pass=False, initial_momentum=0.999, scheduler_momentum='constant', shuffle_bn=True, num_devices=1, simulate_n_devices=8, queue=None, sym=False, use_keys=False, temp=0.1, temp_m=0.05, start_warmup_temp_m=0.05, warmup_epoch_temp_m=0, warmup_scheduler_temp_m='cosine', coeff=0.5, warmup_scheduler_coeff='linear', warmup_epoch_coeff=0, start_warmup_coeff=1.0, scheduler_coeff=None, final_scheduler_coeff=0.0)

SCE model.

Parameters:
  • trunk (DictConfig) – Config to build a trunk.

  • optimizer (DictConfig) – Config to build optimizers and schedulers.

  • projector (Optional[DictConfig], optional) – Config to build a projector.

    Default: None

  • predictor (Optional[DictConfig], optional) – Config to build a predictor.

    Default: None

  • train_transform (Optional[DictConfig], optional) – Config to perform transformation on train input.

    Default: None

  • val_transform (Optional[DictConfig], optional) – Config to perform transformation on val input.

    Default: None

  • test_transform (Optional[DictConfig], optional) – Config to perform transformation on test input.

    Default: None

  • normalize_outputs (bool, optional) – If True, normalize outputs.

    Default: True

  • num_global_crops (int, optional) – Number of global crops which are the first elements of each batch.

    Default: 2

  • num_local_crops (int, optional) – Number of local crops which are the last elements of each batch.

    Default: 0

  • num_splits (int, optional) – Number of splits to apply to each crop.

    Default: 0

  • num_splits_per_combination (int, optional) – Number of splits used for combinations of features of each split.

    Default: 2

  • mutual_pass (bool, optional) – If True, perform one pass per branch per crop resolution.

    Default: False

  • initial_momentum (float, optional) – Initial value for the momentum update.

    Default: 0.999

  • scheduler_momentum (str, optional) – Rule to update the momentum value.

    Default: 'constant'

  • shuffle_bn (bool, optional) – If True, apply shuffle normalization trick from MoCo.

    Default: True

  • num_devices (int, optional) – Number of devices used to train the model in each node.

    Default: 1

  • simulate_n_devices (int, optional) – Number of devices to simulate to apply shuffle trick. Requires shuffle_bn to be True and num_devices to be 1.

    Default: 8

  • queue (Optional[DictConfig], optional) – Config to build a queue.

    Default: None

  • sym (bool, optional) – If True, symmetrize the loss.

    Default: False

  • use_keys (bool, optional) – If True, add keys to negatives.

    Default: False

  • temp (float, optional) – Temperature parameter to scale the online similarities.

    Default: 0.1

  • temp_m (float, optional) – Temperature parameter to scale the target similarities. Initial value if warmup applied.

    Default: 0.05

  • start_warmup_temp_m (float, optional) – Initial temperature parameter to scale the target similarities in case of warmup.

    Default: 0.05

  • warmup_epoch_temp_m (int, optional) – Number of warmup epochs for the target temperature.

    Default: 0

  • warmup_scheduler_temp_m (Optional[str], optional) – Type of scheduler for warming up the target temperature. Options are: 'linear', 'cosine'.

    Default: 'cosine'

  • coeff (float, optional) – Coefficient that balances the InfoNCE and relational aspects.

    Default: 0.5

  • warmup_scheduler_coeff (Optional[str], optional) – Type of scheduler for warming up the coefficient. Options are: 'linear', 'cosine'.

    Default: 'linear'

  • warmup_epoch_coeff (int, optional) – Number of warmup epochs for coefficient.

    Default: 0

  • start_warmup_coeff (float, optional) – Starting value of coefficient for warmup.

    Default: 1.0

  • scheduler_coeff (Optional[str], optional) – Type of scheduler for coefficient after warmup. Options are: 'linear', 'cosine'.

    Default: None

  • final_scheduler_coeff (float, optional) – Final value of scheduler coefficient.

    Default: 0.0
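
To make the role of coeff concrete, the sketch below builds an SCE-style target that blends a one-hot positive with a softmax over target similarities to the negatives, and adds a linear warmup of the coefficient. It assumes that the warmup moves from start_warmup_coeff to coeff over warmup_epoch_coeff epochs, which is an interpretation of the parameter names; the function names are hypothetical and this is not the eztorch implementation.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    max_logit = max(logits)
    exps = [math.exp(logit - max_logit) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sce_target(negative_sims: List[float], coeff: float = 0.5, temp_m: float = 0.05) -> List[float]:
    """Blend a one-hot positive (index 0) with the relational distribution.

    The result has one entry for the positive followed by one per negative.
    """
    relational = softmax([s / temp_m for s in negative_sims])
    return [coeff] + [(1.0 - coeff) * r for r in relational]


def linear_warmup(start: float, end: float, epoch: int, warmup_epochs: int) -> float:
    """Linearly move from start to end over warmup_epochs, then stay at end."""
    if warmup_epochs <= 0 or epoch >= warmup_epochs:
        return end
    return start + (end - start) * epoch / warmup_epochs


# Hypothetical target similarities between one sample and three negatives.
target = sce_target([0.6, 0.1, -0.4], coeff=0.5, temp_m=0.05)
coeff_at_epoch_5 = linear_warmup(start=1.0, end=0.5, epoch=5, warmup_epochs=10)
```

With coeff set to 1.0 the target reduces to the one-hot vector of InfoNCE, which is why a warmup that starts at 1.0 begins training as a purely contrastive objective.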

SCE Tokens

class eztorch.models.siamese.SCETokensModel(trunk, optimizer, projector=None, predictor=None, train_transform=None, val_transform=None, test_transform=None, normalize_outputs=True, num_global_crops=2, num_local_crops=0, mutual_pass=False, initial_momentum=0.999, scheduler_momentum='constant', queue={}, sym=False, use_keys=True, use_all_keys=False, num_prefix_tokens=0, num_out_tokens=32, positive_radius=0, keep_aligned_positive=True, temp=0.1, temp_m=0.05, start_warmup_temp_m=0.05, warmup_epoch_temp_m=0, warmup_scheduler_temp_m='cosine', coeff=0.5, normalize_positive_coeff=False, warmup_scheduler_coeff='linear', warmup_epoch_coeff=0, start_warmup_coeff=1, scheduler_coeff=None, final_scheduler_coeff=0)

SCE model for tokens output.

Parameters:
  • trunk (DictConfig) – Config to build a trunk.

  • optimizer (DictConfig) – Config to build optimizers and schedulers.

  • projector (Optional[DictConfig], optional) – Config to build a projector.

    Default: None

  • predictor (Optional[DictConfig], optional) – Config to build a predictor.

    Default: None

  • train_transform (Optional[DictConfig], optional) – Config to perform transformation on train input.

    Default: None

  • val_transform (Optional[DictConfig], optional) – Config to perform transformation on val input.

    Default: None

  • test_transform (Optional[DictConfig], optional) – Config to perform transformation on test input.

    Default: None

  • normalize_outputs (bool, optional) – If True, normalize outputs.

    Default: True

  • num_global_crops (int, optional) – Number of global crops which are the first elements of each batch.

    Default: 2

  • num_local_crops (int, optional) – Number of local crops which are the last elements of each batch.

    Default: 0

  • mutual_pass (bool, optional) – If True, perform one pass per branch per crop resolution.

    Default: False

  • initial_momentum (float, optional) – Initial value for the momentum update.

    Default: 0.999

  • scheduler_momentum (str, optional) – Rule to update the momentum value.

    Default: 'constant'

  • queue (DictConfig, optional) – Config to build a queue.

    Default: {}

  • sym (bool, optional) – If True, symmetrize the loss.

    Default: False

  • use_keys (bool, optional) – If True, add aligned keys to negatives.

    Default: True

  • use_all_keys (bool, optional) – If True, add all keys to negatives.

    Default: False

  • num_out_tokens (int, optional) – Number of expected output tokens.

    Default: 32

  • positive_radius (int, optional) – Number of adjacent tokens to consider as positives.

    Default: 0

  • keep_aligned_positive (bool, optional) – Whether to keep the aligned token as positive.

    Default: True

  • temp (float, optional) – Temperature parameter to scale the online similarities.

    Default: 0.1

  • temp_m (float, optional) – Temperature parameter to scale the target similarities. Initial value if warmup applied.

    Default: 0.05

  • start_warmup_temp_m (float, optional) – Initial temperature parameter to scale the target similarities in case of warmup.

    Default: 0.05

  • warmup_epoch_temp_m (int, optional) – Number of warmup epochs for the target temperature.

    Default: 0

  • warmup_scheduler_temp_m (Optional[str], optional) – Type of scheduler for warming up the target temperature. Options are: 'linear', 'cosine'.

    Default: 'cosine'

  • coeff (float, optional) – Coefficient that balances the InfoNCE and relational aspects.

    Default: 0.5

  • normalize_positive_coeff (bool, optional) – If True, multiply the coeff argument by a mask normalized over the number of positives; if False, use coeff as is.

    Default: False

  • warmup_scheduler_coeff (Optional[str], optional) – Type of scheduler for warming up the coefficient. Options are: 'linear', 'cosine'.

    Default: 'linear'

  • warmup_epoch_coeff (int, optional) – Number of warmup epochs for coefficient.

    Default: 0

  • start_warmup_coeff (float, optional) – Starting value of coefficient for warmup.

    Default: 1

  • scheduler_coeff (Optional[str], optional) – Type of scheduler for coefficient after warmup. Options are: 'linear', 'cosine'.

    Default: None

  • final_scheduler_coeff (float, optional) – Final value of scheduler coefficient.

    Default: 0
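
One way to read positive_radius and keep_aligned_positive is as a mask over token pairs: token j of the key sequence counts as a positive for token i of the query sequence when the two are at most positive_radius positions apart. The sketch below follows that reading, which is an assumption based on the parameter descriptions rather than on the eztorch source; positive_mask is a hypothetical helper.

```python
from typing import List


def positive_mask(
    num_tokens: int, positive_radius: int = 0, keep_aligned_positive: bool = True
) -> List[List[bool]]:
    """mask[i][j] is True when key token j counts as a positive for query token i."""
    mask = []
    for i in range(num_tokens):
        row = []
        for j in range(num_tokens):
            is_aligned = i == j
            in_radius = abs(i - j) <= positive_radius
            # Optionally drop the aligned token while keeping its neighbors.
            row.append(in_radius and (keep_aligned_positive or not is_aligned))
        mask.append(row)
    return mask


mask = positive_mask(num_tokens=4, positive_radius=1)
neighbors_only = positive_mask(num_tokens=4, positive_radius=1, keep_aligned_positive=False)
```

With the defaults (radius 0, aligned token kept), the mask is the identity, so each token has exactly one positive.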