Optimizers

Factories

eztorch.optimizers.optimizer_factory(name, initial_lr, model, batch_size=None, num_steps_per_epoch=None, layer_decay_lr=None, keys_without_decay=[], exclude_wd_norm=False, exclude_wd_bias=False, scaler=None, params={}, divide_wd_by_lr=False, scheduler=None, multiply_lr=1.0, multiply_parameters=[])[source]

Optimizer factory that builds an optimizer and, optionally, an attached scheduler.

Parameters:
  • name (str) – Name of the optimizer, used to retrieve the optimizer constructor from the _OPTIMIZERS dict.

  • initial_lr (float) – Initial learning rate.

  • model (Module) – Model to optimize.

  • batch_size (Optional[int], optional) – Batch size of the model input.

    Default: None

  • num_steps_per_epoch (Optional[int], optional) – Number of steps per epoch. Useful for some schedulers.

    Default: None

  • keys_without_decay (List[str], optional) – Keys identifying parameters to exclude from weight decay.

    Default: []

  • exclude_wd_norm (bool, optional) – If True, exclude normalization layers from weight decay regularization.

    Default: False

  • exclude_wd_bias (bool, optional) – If True, exclude biases from weight decay regularization.

    Default: False

  • scaler (Optional[str], optional) – Scaling rule for the initial learning rate.

    Default: None

  • params (DictConfig, optional) – Parameters for the optimizer constructor.

    Default: {}

  • divide_wd_by_lr (bool, optional) – If True, divide the weight decay by the value of the learning rate.

    Default: False

  • scheduler (Optional[DictConfig], optional) – Scheduler config.

    Default: None

  • multiply_lr (float, optional) – Factor by which to multiply the learning rate. Also applied to the scheduler.

    Default: 1.0

Return type:

Tuple[Optimizer, Optional[_LRScheduler]]

Returns:

The optimizer with its optional scheduler.
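
A minimal usage sketch is shown below. The optimizer name "sgd" and the keys passed through params (momentum, weight_decay) are assumptions about what the registered constructor accepts, not confirmed registry entries:

    import torch.nn as nn
    from omegaconf import DictConfig

    from eztorch.optimizers import optimizer_factory

    model = nn.Linear(128, 10)

    optimizer, scheduler = optimizer_factory(
        name="sgd",                # assumed key of the _OPTIMIZERS dict
        initial_lr=0.1,
        model=model,
        batch_size=256,            # only needed if a scaler rule uses it
        num_steps_per_epoch=500,   # only needed by step-based schedulers
        exclude_wd_norm=True,      # no weight decay on normalization layers
        exclude_wd_bias=True,      # no weight decay on biases
        params=DictConfig({"momentum": 0.9, "weight_decay": 1e-4}),
        scheduler=None,            # pass a DictConfig here to also build a scheduler
    )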

eztorch.optimizers.optimizer_factory_two_groups(name, initial_lr1, initial_lr2, model1, model2, batch_size=None, num_steps_per_epoch=None, exclude_wd_norm=False, exclude_wd_bias=False, scaler=None, params={}, scheduler=None)[source]

Optimizer factory to build an optimizer for two groups of parameters and optionally an attached scheduler.

Parameters:
  • name (str) – Name of the optimizer, used to retrieve the optimizer constructor from the _OPTIMIZERS dict.

  • initial_lr1 (float) – Initial learning rate for model 1.

  • initial_lr2 (float) – Initial learning rate for model 2.

  • model1 (Module) – Model 1 to optimize.

  • model2 (Module) – Model 2 to optimize.

  • batch_size (Optional[int], optional) – Batch size of the model input.

    Default: None

  • num_steps_per_epoch (Optional[int], optional) – Number of steps per epoch. Useful for some schedulers.

    Default: None

  • exclude_wd_norm (bool, optional) – If True, exclude normalization layers from weight decay regularization.

    Default: False

  • exclude_wd_bias (bool, optional) – If True, exclude biases from weight decay regularization.

    Default: False

  • scaler (Optional[str], optional) – Scaling rule for the initial learning rate.

    Default: None

  • params (DictConfig, optional) – Parameters for the optimizer constructor.

    Default: {}

  • scheduler (Optional[DictConfig], optional) – Scheduler config.

    Default: None

Return type:

Tuple[Optimizer, Optional[_LRScheduler]]

Returns:

The optimizer with its optional scheduler.
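
A minimal sketch for the two-group variant, useful when two sub-models should train with different learning rates (for example a backbone and a head). As above, the "sgd" name and the momentum key are assumptions:

    import torch.nn as nn
    from omegaconf import DictConfig

    from eztorch.optimizers import optimizer_factory_two_groups

    encoder = nn.Linear(128, 64)
    head = nn.Linear(64, 10)

    optimizer, scheduler = optimizer_factory_two_groups(
        name="sgd",          # assumed key of the _OPTIMIZERS dict
        initial_lr1=0.05,    # learning rate for the encoder parameters
        initial_lr2=0.5,     # larger learning rate for the head parameters
        model1=encoder,
        model2=head,
        params=DictConfig({"momentum": 0.9}),
    )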

Custom Optimizers

LARS

class eztorch.optimizers.LARS(params, lr=0, weight_decay=0, momentum=0.9, trust_coefficient=0.001)[source]

LARS optimizer; no rate scaling or weight decay is applied to parameters with 1 dimension or fewer (typically biases and normalization weights).

References:

  LARS: Large Batch Training of Convolutional Networks (https://arxiv.org/abs/1708.03888)

Parameters:
  • params (Iterable[Parameter]) – Parameters to optimize.

  • lr (float, optional) – Learning rate of the optimizer.

    Default: 0

  • weight_decay (float, optional) – Weight decay to apply.

    Default: 0

  • momentum (float, optional) – Momentum for optimization.

    Default: 0.9

  • trust_coefficient (float, optional) – LARS trust coefficient.

    Default: 0.001
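
A minimal sketch of using LARS directly with the signature documented above, in a standard PyTorch training step; the hyperparameter values are illustrative only:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    from eztorch.optimizers import LARS

    model = nn.Sequential(nn.Linear(128, 64), nn.BatchNorm1d(64), nn.Linear(64, 10))

    optimizer = LARS(
        model.parameters(),
        lr=1.6,                    # large-batch learning rate, illustrative
        weight_decay=1e-6,
        momentum=0.9,
        trust_coefficient=0.001,
    )

    x, y = torch.randn(32, 128), torch.randint(0, 10, (32,))
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()       # 1D parameters (biases, norm weights) are not scaled or decayed
    optimizer.zero_grad()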