Optimizers¶
Factories¶
- eztorch.optimizers.optimizer_factory(name, initial_lr, model, batch_size=None, num_steps_per_epoch=None, layer_decay_lr=None, keys_without_decay=[], exclude_wd_norm=False, exclude_wd_bias=False, scaler=None, params={}, divide_wd_by_lr=False, scheduler=None, multiply_lr=1.0, multiply_parameters=[])[source]¶
Optimizer factory to build optimizers and optionally an attached scheduler.
- Parameters:
  - name (str) – Name of the optimizer, used to retrieve the optimizer constructor from the _OPTIMIZERS dict.
  - initial_lr (float) – Initial learning rate.
  - model (Module) – Model to optimize.
  - batch_size (Optional[int], optional) – Batch size for the input of the model. Default: None
  - num_steps_per_epoch (Optional[int], optional) – Number of steps per epoch. Useful for some schedulers. Default: None
  - keys_without_decay (List[str], optional) – Keys to filter parameters for weight decay. Default: []
  - exclude_wd_norm (bool, optional) – If True, exclude normalization layers from weight decay regularization. Default: False
  - exclude_wd_bias (bool, optional) – If True, exclude bias parameters from weight decay regularization. Default: False
  - scaler (Optional[str], optional) – Scaling rule for the initial learning rate. Default: None
  - params (DictConfig, optional) – Parameters for the optimizer constructor. Default: {}
  - divide_wd_by_lr (bool, optional) – If True, divide the weight decay by the value of the learning rate. Default: False
  - scheduler (Optional[DictConfig], optional) – Scheduler config. Default: None
  - multiply_lr (float, optional) – Factor by which to multiply the learning rate. Also applied to the scheduler. Default: 1.0
- Return type:
  Tuple[Optimizer, Optional[_LRScheduler]]
- Returns:
  The optimizer with its optional scheduler.
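A minimal usage sketch, assuming "sgd" is a name registered in the _OPTIMIZERS dict and that params is forwarded to the underlying optimizer constructor (both are assumptions, not confirmed by this page):

```python
import torch.nn as nn
from omegaconf import OmegaConf
from eztorch.optimizers import optimizer_factory

model = nn.Linear(128, 10)

# "sgd" and the params keys below are assumed; use whatever keys your
# _OPTIMIZERS registry and optimizer constructor actually expect.
optimizer, scheduler = optimizer_factory(
    name="sgd",
    initial_lr=0.1,
    model=model,
    batch_size=256,
    num_steps_per_epoch=100,
    exclude_wd_norm=True,   # skip weight decay on normalization layers
    exclude_wd_bias=True,   # skip weight decay on bias parameters
    params=OmegaConf.create({"momentum": 0.9, "weight_decay": 1e-4}),
)

# scheduler is None when no scheduler config is passed.
```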
- eztorch.optimizers.optimizer_factory_two_groups(name, initial_lr1, initial_lr2, model1, model2, batch_size=None, num_steps_per_epoch=None, exclude_wd_norm=False, exclude_wd_bias=False, scaler=None, params={}, scheduler=None)[source]¶
Optimizer factory to build an optimizer for two groups of parameters and optionally an attached scheduler.
- Parameters:
  - name (str) – Name of the optimizer, used to retrieve the optimizer constructor from the _OPTIMIZERS dict.
  - initial_lr1 (float) – Initial learning rate for model 1.
  - initial_lr2 (float) – Initial learning rate for model 2.
  - model1 (Module) – Model 1 to optimize.
  - model2 (Module) – Model 2 to optimize.
  - batch_size (Optional[int], optional) – Batch size for the input of the model. Default: None
  - num_steps_per_epoch (Optional[int], optional) – Number of steps per epoch. Useful for some schedulers. Default: None
  - exclude_wd_norm (bool, optional) – If True, exclude normalization layers from weight decay regularization. Default: False
  - exclude_wd_bias (bool, optional) – If True, exclude bias parameters from weight decay regularization. Default: False
  - scaler (Optional[str], optional) – Scaling rule for the initial learning rate. Default: None
  - params (DictConfig, optional) – Parameters for the optimizer constructor. Default: {}
  - scheduler (Optional[DictConfig], optional) – Scheduler config for the model. Default: None
- Return type:
  Tuple[Optimizer, Optional[_LRScheduler]]
- Returns:
  The optimizer with its optional scheduler.
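A minimal sketch for two parameter groups, for example an encoder and a projection head trained with different learning rates; "sgd" is again an assumed registry key:

```python
import torch.nn as nn
from omegaconf import OmegaConf
from eztorch.optimizers import optimizer_factory_two_groups

encoder = nn.Linear(128, 64)
head = nn.Linear(64, 10)

optimizer, scheduler = optimizer_factory_two_groups(
    name="sgd",          # assumed key in _OPTIMIZERS
    initial_lr1=0.05,    # learning rate for model 1 (encoder)
    initial_lr2=0.5,     # learning rate for model 2 (head)
    model1=encoder,
    model2=head,
    params=OmegaConf.create({"momentum": 0.9}),
)
```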
Custom Optimizers¶
LARS¶
- class eztorch.optimizers.LARS(params, lr=0, weight_decay=0, momentum=0.9, trust_coefficient=0.001)[source]¶
LARS optimizer, no rate scaling or weight decay for parameters <= 1D.
- References: LARS
- Parameters:
  - params (Iterable[Parameter]) – Parameters to optimize.
  - lr (float, optional) – Learning rate of the optimizer. Default: 0
  - weight_decay (float, optional) – Weight decay to apply. Default: 0
  - momentum (float, optional) – Momentum for optimization. Default: 0.9
  - trust_coefficient (float, optional) – LARS trust coefficient. Default: 0.001