Optimizers#

Factories#

eztorch.optimizers.optimizer_factory(name, initial_lr, model, batch_size=None, num_steps_per_epoch=None, layer_decay_lr=None, keys_without_decay=[], exclude_wd_norm=False, exclude_wd_bias=False, scaler=None, params={}, divide_wd_by_lr=False, scheduler=None, multiply_lr=1.0, multiply_parameters=[])[source]#

Optimizer factory to build optimizers and optionally an attached scheduler.

Parameters:
  • name – Name of the optimizer used to retrieve the optimizer constructor from the _OPTIMIZERS dict.

  • initial_lr – Initial learning rate.

  • model – Model to optimize.

  • batch_size (optional) – Batch size for the input of the model.
    Default: None

  • num_steps_per_epoch (optional) – Number of steps per epoch. Useful for some schedulers.
    Default: None

  • layer_decay_lr (optional) – Layer-wise learning rate decay factor.
    Default: None

  • keys_without_decay (optional) – Keys used to filter parameters out of weight decay.
    Default: []

  • exclude_wd_norm (optional) – If True, exclude normalization layers from weight decay regularization.
    Default: False

  • exclude_wd_bias (optional) – If True, exclude bias parameters from weight decay regularization.
    Default: False

  • scaler (optional) – Scaling rule for the initial learning rate.
    Default: None

  • params (optional) – Parameters for the optimizer constructor.
    Default: {}

  • divide_wd_by_lr (optional) – If True, divide the weight decay by the value of the learning rate.
    Default: False

  • scheduler (optional) – Scheduler config.
    Default: None

  • multiply_lr (optional) – Factor by which to multiply the learning rate. Applied to the scheduler as well.
    Default: 1.0

  • multiply_parameters (optional) – Parameters whose learning rate is multiplied by multiply_lr.
    Default: []

Returns:

The optimizer with its optional scheduler.
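
A minimal usage sketch, assuming "sgd" is a key registered in the _OPTIMIZERS dict and that, with scheduler=None (the default), the call returns just the optimizer:

    import torch.nn as nn

    from eztorch.optimizers import optimizer_factory

    model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

    # "sgd" is assumed to be registered in _OPTIMIZERS; momentum and weight
    # decay are forwarded to the optimizer constructor through `params`.
    optimizer = optimizer_factory(
        name="sgd",
        initial_lr=0.1,
        model=model,
        batch_size=256,
        num_steps_per_epoch=500,
        exclude_wd_norm=True,   # normalization layers get no weight decay
        exclude_wd_bias=True,   # bias parameters get no weight decay
        params={"momentum": 0.9, "weight_decay": 1e-4},
    )
    # Passing a scheduler config would also build the attached scheduler.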

eztorch.optimizers.optimizer_factory_two_groups(name, initial_lr1, initial_lr2, model1, model2, batch_size=None, num_steps_per_epoch=None, exclude_wd_norm=False, exclude_wd_bias=False, scaler=None, params={}, scheduler=None)[source]#

Optimizer factory to build an optimizer for two groups of parameters and optionally an attached scheduler.

Parameters:
  • name – Name of the optimizer used to retrieve the optimizer constructor from the _OPTIMIZERS dict.

  • initial_lr1 – Initial learning rate for model 1.

  • initial_lr2 – Initial learning rate for model 2.

  • model1 – Model 1 to optimize.

  • model2 – Model 2 to optimize.

  • batch_size (optional) – Batch size for the input of the models.
    Default: None

  • num_steps_per_epoch (optional) – Number of steps per epoch. Useful for some schedulers.
    Default: None

  • exclude_wd_norm (optional) – If True, exclude normalization layers from weight decay regularization.
    Default: False

  • exclude_wd_bias (optional) – If True, exclude bias parameters from weight decay regularization.
    Default: False

  • scaler (optional) – Scaling rule for the initial learning rate.
    Default: None

  • params (optional) – Parameters for the optimizer constructor.
    Default: {}

  • scheduler (optional) – Scheduler config.
    Default: None

Returns:

The optimizer with its optional scheduler.
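
A minimal usage sketch for two parameter groups; the backbone/head split and the "sgd" key are illustrative assumptions:

    import torch.nn as nn

    from eztorch.optimizers import optimizer_factory_two_groups

    backbone = nn.Linear(128, 64)  # hypothetical model 1
    head = nn.Linear(64, 10)       # hypothetical model 2

    # One optimizer, two parameter groups with their own learning rates:
    # a small rate for the backbone and a larger one for the head.
    optimizer = optimizer_factory_two_groups(
        name="sgd",        # assumed to be registered in _OPTIMIZERS
        initial_lr1=0.05,  # learning rate for model 1 (backbone)
        initial_lr2=0.5,   # learning rate for model 2 (head)
        model1=backbone,
        model2=head,
        params={"momentum": 0.9},
    )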

Custom Optimizers#

LARS#

class eztorch.optimizers.LARS(params, lr=0, weight_decay=0, momentum=0.9, trust_coefficient=0.001)[source]#

LARS optimizer that applies no rate scaling or weight decay to parameters that are at most one-dimensional (e.g., biases and normalization weights).

References:
  LARS: Large Batch Training of Convolutional Networks (You et al., 2017).

Parameters:
  • params – Parameters to optimize.

  • lr (optional) – Learning rate of the optimizer.
    Default: 0

  • weight_decay (optional) – Weight decay to apply.
    Default: 0

  • momentum (optional) – Momentum for optimization.
    Default: 0.9

  • trust_coefficient (optional) – LARS trust coefficient.
    Default: 0.001
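
A minimal usage sketch through the standard PyTorch optimizer interface. LARS rescales each multi-dimensional parameter's step by a trust ratio on the order of trust_coefficient * ||w|| / ||update|| (You et al., 2017):

    import torch
    import torch.nn as nn

    from eztorch.optimizers import LARS

    model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

    # Matches the documented constructor; parameters that are <= 1D
    # (biases, norm weights) receive no rate scaling or weight decay.
    optimizer = LARS(model.parameters(), lr=0.3, weight_decay=1e-6,
                     momentum=0.9, trust_coefficient=0.001)

    # One standard training step.
    x = torch.randn(32, 128)
    y = torch.randint(0, 10, (32,))
    loss = nn.functional.cross_entropy(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()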