Trunks¶
Image¶
ResNet and ResNeXt¶
- eztorch.models.trunks.create_resnet(name, num_classes=1000, progress=True, pretrained=False, small_input=False, **kwargs)[source]¶
Build a ResNet from torchvision for images.
- Parameters:
  - name (str) – Name of the ResNet model (such as resnet18).
  - num_classes (int, optional) – If not 0, replace the last fully connected layer with a num_classes-output layer; if 0, replace it with an identity. Default: 1000
  - pretrained (bool, optional) – If True, returns a model pre-trained on ImageNet. Default: False
  - progress (bool, optional) – If True, displays a progress bar of the download to stderr. Default: True
  - small_input (bool, optional) – If True, replace the first conv2d for small images and replace the first maxpool by an identity. Default: False
  - **kwargs – Arguments specific to the torchvision constructors for ResNet.
- Return type:
  Module
- Returns:
  Basic ResNet.
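A minimal usage sketch (assuming eztorch and torch are importable; the input shape and the feature size in the final comment follow standard torchvision ResNet-18 behaviour and are assumptions, not guarantees of the wrapper):

```python
import torch

from eztorch.models.trunks import create_resnet

# Build a ResNet-18 trunk: num_classes=0 replaces the final fully connected
# layer with an identity, so the model outputs features instead of logits.
trunk = create_resnet("resnet18", num_classes=0, pretrained=False)

x = torch.randn(2, 3, 224, 224)  # (batch, channels, height, width)
features = trunk(x)
print(features.shape)  # expected: torch.Size([2, 512]) for resnet18
```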
Timm¶
Timm models are accessible through Eztorch to retrieve ViTs, EfficientNets, and more.
- eztorch.models.trunks.create_model_timm(model_name, pretrained=False, pretrained_cfg=None, checkpoint_path='', scriptable=None, exportable=None, no_jit=None, **kwargs)¶
Create a timm model.
- Parameters:
  - model_name (str) – Name of the model to instantiate.
  - pretrained (bool) – Load pretrained ImageNet-1k weights if True. Default: False
  - checkpoint_path (str) – Path of a checkpoint to load after the model is initialized. Default: ''
  - scriptable (bool) – Set layer config so that the model is jit scriptable (not working for all models yet). Default: None
  - exportable (bool) – Set layer config so that the model is traceable / ONNX exportable (not fully implemented/obeyed yet). Default: None
  - no_jit (bool) – Set layer config so that the model doesn't utilize jit scripted layers (so far activations only). Default: None
- Keyword Arguments:
  - drop_rate (float) – Dropout rate for training. Default: 0.0
  - global_pool (str) – Global pool type. Default: 'avg'
  - ** – Other kwargs are model specific.
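A minimal usage sketch (assuming timm is installed; the model name and keyword arguments below are illustrative, and any valid timm identifier can be passed instead):

```python
from eztorch.models.trunks import create_model_timm

# Instantiate a Vision Transformer through timm. drop_rate and global_pool are
# the documented keyword arguments; other kwargs are forwarded to the model.
model = create_model_timm(
    "vit_base_patch16_224",  # illustrative timm model name
    pretrained=False,
    drop_rate=0.0,
    global_pool="avg",
)
```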
Video¶
Pytorchvideo¶
If the Pytorchvideo library is installed, its models are accessible and can be retrieved through Eztorch.
Video model and Head wrapper¶
ResNet 3D with basic blocks¶
- eztorch.models.trunks.create_resnet3d_basic(*, input_channel=3, model_depth=50, model_num_class=400, dropout_rate=0.5, norm=<class 'torch.nn.modules.batchnorm.BatchNorm3d'>, activation=<class 'torch.nn.modules.activation.ReLU'>, stem_activation=<class 'torch.nn.modules.activation.ReLU'>, stem_dim_out=64, stem_conv_kernel_size=(1, 7, 7), stem_conv_stride=(1, 2, 2), stem_pool=<class 'torch.nn.modules.pooling.MaxPool3d'>, stem_pool_kernel_size=(1, 3, 3), stem_pool_stride=(1, 2, 2), stem=<function create_res_basic_stem>, stage1_pool=None, stage1_pool_kernel_size=(2, 1, 1), stage_conv_a_kernel_size=((1, 3, 3), (1, 3, 3), (3, 3, 3), (3, 3, 3)), stage_conv_b_kernel_size=((1, 3, 3), (1, 3, 3), (1, 3, 3), (1, 3, 3)), stage_spatial_h_stride=(1, 2, 2, 2), stage_spatial_w_stride=(1, 2, 2, 2), stage_temporal_stride=(1, 1, 1, 1), basicblock=<function create_basic_block>, head=<function create_res_basic_head>, head_pool=<class 'torch.nn.modules.pooling.AvgPool3d'>, head_pool_kernel_size=(4, 7, 7), head_output_size=(1, 1, 1), head_activation=None, head_output_with_global_average=True)[source]¶
Build ResNet style models for video recognition. ResNet has three parts: Stem, Stages and Head. Stem is the first Convolution layer (Conv1) with an optional pooling layer. Stages are grouped residual blocks. There are usually multiple stages and each stage may include multiple residual blocks. Head may include pooling, dropout, a fully-connected layer and global spatial temporal averaging. The three parts are assembled in the following order:
Input → Stem → Stage 1 → … → Stage N → Head
- Parameters:
  - input_channel (int, optional) – Number of channels for the input video clip. Default: 3
  - model_depth (int, optional) – The depth of the resnet. Options include: 18, 50, 101, 152. Default: 50
  - model_num_class (int, optional) – The number of classes for the video dataset. Default: 400
  - dropout_rate (float, optional) – Dropout rate. Default: 0.5
  - norm (Callable, optional) – A callable that constructs a normalization layer. Default: torch.nn.BatchNorm3d
  - activation (Callable, optional) – A callable that constructs an activation layer. Default: torch.nn.ReLU
  - stem_activation (Optional[Callable], optional) – A callable that constructs the activation layer of the stem. Default: torch.nn.ReLU
  - stem_dim_out (int, optional) – Output channel size of the stem. Default: 64
  - stem_conv_kernel_size (Tuple[int], optional) – Convolutional kernel size(s) of the stem. Default: (1, 7, 7)
  - stem_conv_stride (Tuple[int], optional) – Convolutional stride size(s) of the stem. Default: (1, 2, 2)
  - stem_pool (Optional[Callable], optional) – A callable that constructs the stem pooling layer. Default: torch.nn.MaxPool3d
  - stem_pool_kernel_size (Tuple[int], optional) – Pooling kernel size(s). Default: (1, 3, 3)
  - stem_pool_stride (Tuple[int], optional) – Pooling stride size(s). Default: (1, 2, 2)
  - stem (Optional[Callable], optional) – A callable that constructs the stem layer. Examples include: create_res_video_stem(). Default: create_res_basic_stem
  - stage_conv_a_kernel_size (Union[Tuple[int], Tuple[Tuple[int]]], optional) – Convolutional kernel size(s) for conv_a. Default: ((1, 3, 3), (1, 3, 3), (3, 3, 3), (3, 3, 3))
  - stage_conv_b_kernel_size (Union[Tuple[int], Tuple[Tuple[int]]], optional) – Convolutional kernel size(s) for conv_b. Default: ((1, 3, 3), (1, 3, 3), (1, 3, 3), (1, 3, 3))
  - stage_spatial_h_stride (Tuple[int], optional) – The spatial height stride for each stage. Default: (1, 2, 2, 2)
  - stage_spatial_w_stride (Tuple[int], optional) – The spatial width stride for each stage. Default: (1, 2, 2, 2)
  - stage_temporal_stride (Tuple[int], optional) – The temporal stride for each stage. Default: (1, 1, 1, 1)
  - basicblock (Union[Tuple[Callable], Callable], optional) – A callable that constructs the basic block layer. Examples include: create_basicblock_block(). Default: create_basic_block
  - head (Callable, optional) – A callable that constructs the ResNet-style head. Example: create_res_basic_head. Default: create_res_basic_head
  - head_pool (Callable, optional) – A callable that constructs the resnet head pooling layer. Default: torch.nn.AvgPool3d
  - head_pool_kernel_size (Tuple[int], optional) – The pooling kernel size. Default: (4, 7, 7)
  - head_output_size (Tuple[int], optional) – The size of the output tensor for the head. Default: (1, 1, 1)
  - head_activation (Callable, optional) – A callable that constructs an activation layer. Default: None
  - head_output_with_global_average (bool, optional) – If True, perform global averaging on the head output. Default: True
- Return type:
  Module
- Returns:
  Basic ResNet.
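A minimal usage sketch (assuming eztorch and torch are importable; the clip shape is chosen to fit the default (4, 7, 7) head pooling kernel and, together with the expected output size, is an assumption rather than a guarantee):

```python
import torch

from eztorch.models.trunks import create_resnet3d_basic

# Basic-block 3D ResNet with the default configuration and a 400-class head.
model = create_resnet3d_basic(model_num_class=400, dropout_rate=0.5)

# (batch, channels, frames, height, width); 224x224 crops with >= 4 frames
# match the default stem/stage strides and head pooling kernel.
clip = torch.randn(1, 3, 8, 224, 224)
logits = model(clip)
print(logits.shape)  # expected: torch.Size([1, 400])
```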
R2+1D¶
General R(2+1)D builder.
- eztorch.models.trunks.create_r2plus1d(*, input_channel=3, model_depth=50, model_num_class=400, dropout_rate=0.0, norm=<class 'torch.nn.modules.batchnorm.BatchNorm3d'>, norm_eps=1e-05, norm_momentum=0.1, activation=<class 'torch.nn.modules.activation.ReLU'>, stem_dim_out=64, stem_conv_kernel_size=(1, 7, 7), stem_conv_stride=(1, 2, 2), stage_conv_a_kernel_size=((1, 1, 1), (1, 1, 1), (1, 1, 1), (1, 1, 1)), stage_conv_b_kernel_size=((3, 3, 3), (3, 3, 3), (3, 3, 3), (3, 3, 3)), stage_conv_b_num_groups=(1, 1, 1, 1), stage_conv_b_dilation=((1, 1, 1), (1, 1, 1), (1, 1, 1), (1, 1, 1)), stage_spatial_stride=(2, 2, 2, 2), stage_temporal_stride=(1, 1, 2, 2), stage_bottleneck=(<function create_2plus1d_bottleneck_block>, <function create_2plus1d_bottleneck_block>, <function create_2plus1d_bottleneck_block>, <function create_2plus1d_bottleneck_block>), head=<function create_res_basic_head>, head_pool=<class 'torch.nn.modules.pooling.AvgPool3d'>, head_pool_kernel_size=(4, 7, 7), head_output_size=(1, 1, 1), head_activation=<class 'torch.nn.modules.activation.Softmax'>, head_output_with_global_average=True)[source]¶
Build the R(2+1)D network from “A Closer Look at Spatiotemporal Convolutions for Action Recognition”, Du Tran, Heng Wang, Lorenzo Torresani, Jamie Ray, Yann LeCun, Manohar Paluri, CVPR 2018.
R(2+1)D follows the ResNet style architecture including three parts: Stem, Stages and Head. The three parts are assembled in the following order:
Input → Stem → Stage 1 → … → Stage N → Head
- Parameters:
  - input_channel (int, optional) – Number of channels for the input video clip. Default: 3
  - model_depth (int, optional) – The depth of the resnet. Default: 50
  - model_num_class (int, optional) – The number of classes for the video dataset. Default: 400
  - dropout_rate (float, optional) – Dropout rate. Default: 0.0
  - norm (Callable, optional) – A callable that constructs a normalization layer. Default: torch.nn.BatchNorm3d
  - norm_eps (float, optional) – Normalization epsilon. Default: 1e-05
  - norm_momentum (float, optional) – Normalization momentum. Default: 0.1
  - activation (Callable, optional) – A callable that constructs an activation layer. Default: torch.nn.ReLU
  - stem_dim_out (int, optional) – Output channel size for the stem. Default: 64
  - stem_conv_kernel_size (Tuple[int], optional) – Convolutional kernel size(s) of the stem. Default: (1, 7, 7)
  - stem_conv_stride (Tuple[int], optional) – Convolutional stride size(s) of the stem. Default: (1, 2, 2)
  - stage_conv_a_kernel_size (Tuple[Tuple[int]], optional) – Convolutional kernel size(s) for conv_a. Default: ((1, 1, 1), (1, 1, 1), (1, 1, 1), (1, 1, 1))
  - stage_conv_b_kernel_size (Tuple[Tuple[int]], optional) – Convolutional kernel size(s) for conv_b. Default: ((3, 3, 3), (3, 3, 3), (3, 3, 3), (3, 3, 3))
  - stage_conv_b_num_groups (Tuple[int], optional) – Number of groups for groupwise convolution for conv_b: 1 for ResNet, larger than 1 for ResNeXt. Default: (1, 1, 1, 1)
  - stage_conv_b_dilation (Tuple[Tuple[int]], optional) – Dilation for the 3D convolution for conv_b. Default: ((1, 1, 1), (1, 1, 1), (1, 1, 1), (1, 1, 1))
  - stage_spatial_stride (Tuple[int], optional) – The spatial stride for each stage. Default: (2, 2, 2, 2)
  - stage_temporal_stride (Tuple[int], optional) – The temporal stride for each stage. Default: (1, 1, 2, 2)
  - stage_bottleneck (Tuple[Callable], optional) – A callable that constructs the bottleneck block layer for each stage. Examples include: create_bottleneck_block(), create_2plus1d_bottleneck_block(). Default: (create_2plus1d_bottleneck_block, create_2plus1d_bottleneck_block, create_2plus1d_bottleneck_block, create_2plus1d_bottleneck_block)
  - head_pool (Callable, optional) – A callable that constructs the resnet head pooling layer. Default: torch.nn.AvgPool3d
  - head_pool_kernel_size (Tuple[int], optional) – The pooling kernel size. Default: (4, 7, 7)
  - head_output_size (Tuple[int], optional) – The size of the output tensor for the head. Default: (1, 1, 1)
  - head_activation (Callable, optional) – A callable that constructs an activation layer. Default: torch.nn.Softmax
  - head_output_with_global_average (bool, optional) – If True, perform global averaging on the head output. Default: True
- Return type:
  Module
- Returns:
  Basic ResNet.
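A minimal usage sketch (assuming eztorch and torch are importable; the 16-frame 224x224 clip fits the default strides and head pooling kernel and is an assumption about typical usage, as is the expected output size):

```python
import torch

from eztorch.models.trunks import create_r2plus1d

# R(2+1)D with the default depth-50 configuration and a Softmax head.
model = create_r2plus1d(model_num_class=400)

clip = torch.randn(1, 3, 16, 224, 224)  # (batch, channels, frames, height, width)
probs = model(clip)
print(probs.shape)  # expected: torch.Size([1, 400])
```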
R(2+1)D-18, as often used in papers.
- eztorch.models.trunks.create_r2plus1d_18(downsample=True, num_classes=101, layers=[1, 1, 1, 1], progress=True, pretrained=False, stem=<class 'eztorch.models.trunks.r2plus1d_18.LargeR2Plus1dStem'>, **kwargs)[source]¶
Build R2+1D_18 from torchvision for video.
- Parameters:
  - num_classes (int, optional) – If not 0, replace the last fully connected layer with a num_classes-output layer; if 0, replace it with an identity. Default: 101
  - pretrained (bool, optional) – If True, returns a model pre-trained on ImageNet. Default: False
  - progress (bool, optional) – If True, displays a progress bar of the download to stderr. Default: True
  - layers (List[int], optional) – Number of layers per block. Default: [1, 1, 1, 1]
  - stem (Union[str, Module], optional) – Stem to use for the input. Default: LargeR2Plus1dStem
  - **kwargs – Arguments specific to the torchvision constructors for ResNet.
- Return type:
  Module
- Returns:
  Basic ResNet.
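A minimal usage sketch (assuming eztorch is importable; num_classes=0 is the documented way to obtain a feature trunk):

```python
from eztorch.models.trunks import create_r2plus1d_18

# R(2+1)D-18 trunk without a classification layer: num_classes=0 replaces the
# final fully connected layer with an identity.
trunk = create_r2plus1d_18(num_classes=0, pretrained=False)
```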
S3D¶
- eztorch.models.trunks.create_s3d(num_classes=101, gating=False, slow=False)[source]¶
Build the S3D network.
- Parameters:
  - num_classes (int, optional) – If not 0, replace the last fully connected layer with a num_classes-output layer; if 0, replace it with an identity. Default: 101
  - gating (bool, optional) – If True, initialize the S3D-G network. Default: False
  - slow (bool, optional) – If True, use the slow S3D. Default: False
- Return type:
  Module
- Returns:
  The instantiated S3D network.
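A minimal usage sketch (assuming eztorch is importable; enabling gating to obtain S3D-G follows the parameter description above):

```python
from eztorch.models.trunks import create_s3d

# S3D-G trunk: gating=True enables the gating variant and num_classes=0
# replaces the classifier with an identity so the network outputs features.
trunk = create_s3d(num_classes=0, gating=True)
```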
X3D¶
- eztorch.models.trunks.create_x3d(*, input_channel=3, input_clip_length=13, input_crop_size=160, model_num_class=400, dropout_rate=0.5, width_factor=2.0, depth_factor=2.2, norm=<class 'torch.nn.modules.batchnorm.BatchNorm3d'>, norm_eps=1e-05, norm_momentum=0.1, activation=<class 'torch.nn.modules.activation.ReLU'>, stem_dim_in=12, stem_conv_kernel_size=(5, 3, 3), stem_conv_stride=(1, 2, 2), stage_conv_kernel_size=((3, 3, 3), (3, 3, 3), (3, 3, 3), (3, 3, 3)), stage_spatial_stride=(2, 2, 2, 2), stage_temporal_stride=(1, 1, 1, 1), bottleneck=<function create_x3d_bottleneck_block>, bottleneck_factor=2.25, se_ratio=0.0625, inner_act=<class 'pytorchvideo.layers.swish.Swish'>, head=<function create_x3d_head>, head_dim_out=2048, head_pool_act=<class 'torch.nn.modules.activation.ReLU'>, head_bn_lin5_on=False, head_activation=<class 'torch.nn.modules.activation.Softmax'>, head_output_with_global_average=True)[source]¶
X3D model builder. It builds an X3D network backbone, which is a ResNet-style network.
Christoph Feichtenhofer. “X3D: Expanding Architectures for Efficient Video Recognition.” https://arxiv.org/abs/2004.04730
Input → Stem → Stage 1 → … → Stage N → Head
- Parameters:
  - input_channel (int, optional) – Number of channels for the input video clip. Default: 3
  - input_clip_length (int, optional) – Length of the input video clip. Value for different models: X3D-XS: 4; X3D-S: 13; X3D-M: 16; X3D-L: 16. Default: 13
  - input_crop_size (int, optional) – Spatial resolution of the input video clip. Value for different models: X3D-XS: 160; X3D-S: 160; X3D-M: 224; X3D-L: 312. Default: 160
  - model_num_class (int, optional) – The number of classes for the video dataset. Default: 400
  - dropout_rate (float, optional) – Dropout rate. Default: 0.5
  - width_factor (float, optional) – Width expansion factor. Default: 2.0
  - depth_factor (float, optional) – Depth expansion factor. Value for different models: X3D-XS: 2.2; X3D-S: 2.2; X3D-M: 2.2; X3D-L: 5.0. Default: 2.2
  - norm (Callable, optional) – A callable that constructs a normalization layer. Default: torch.nn.BatchNorm3d
  - norm_eps (float, optional) – Normalization epsilon. Default: 1e-05
  - norm_momentum (float, optional) – Normalization momentum. Default: 0.1
  - activation (Callable, optional) – A callable that constructs an activation layer. Default: torch.nn.ReLU
  - stem_dim_in (int, optional) – Input channel size for the stem before expansion. Default: 12
  - stem_conv_kernel_size (Tuple[int], optional) – Convolutional kernel size(s) of the stem. Default: (5, 3, 3)
  - stem_conv_stride (Tuple[int], optional) – Convolutional stride size(s) of the stem. Default: (1, 2, 2)
  - stage_conv_kernel_size (Tuple[Tuple[int]], optional) – Convolutional kernel size(s) for conv_b. Default: ((3, 3, 3), (3, 3, 3), (3, 3, 3), (3, 3, 3))
  - stage_spatial_stride (Tuple[int], optional) – The spatial stride for each stage. Default: (2, 2, 2, 2)
  - stage_temporal_stride (Tuple[int], optional) – The temporal stride for each stage. Default: (1, 1, 1, 1)
  - bottleneck_factor (float, optional) – Bottleneck expansion factor for the 3x3x3 conv. Default: 2.25
  - se_ratio (float, optional) – If > 0, apply SE to the 3x3x3 conv, with the SE channel dimensionality being se_ratio times the 3x3x3 conv dim. Default: 0.0625
  - inner_act (Callable, optional) – Whether to use Swish activation for act_b or not. Default: pytorchvideo.layers.swish.Swish
  - head_dim_out (int, optional) – Output channel size of the X3D head. Default: 2048
  - head_pool_act (Callable, optional) – A callable that constructs the ResNet pool activation layer, such as ReLU. Default: torch.nn.ReLU
  - head_bn_lin5_on (bool, optional) – If True, perform normalization on the features before the classifier. Default: False
  - head_activation (Callable, optional) – A callable that constructs an activation layer. Default: torch.nn.Softmax
  - head_output_with_global_average (bool, optional) – If True, perform global averaging on the head output. Default: True
- Return type:
  Module
- Returns:
  The X3D network.
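A minimal usage sketch (assuming eztorch, torch and pytorchvideo are importable; the clip shape matches the default X3D-S-style input_clip_length=13 and input_crop_size=160, and the expected output size is an assumption):

```python
import torch

from eztorch.models.trunks import create_x3d

# X3D with the default X3D-S-style configuration (13 frames at 160x160).
model = create_x3d(input_clip_length=13, input_crop_size=160, model_num_class=400)

clip = torch.randn(1, 3, 13, 160, 160)  # (batch, channels, frames, height, width)
out = model(clip)
print(out.shape)  # expected: torch.Size([1, 400])
```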