VGG (Networks Using Repeating Elements)
What is VGG?
The general pattern of a VGG block is several consecutive 3x3 convolutional layers with padding 1, followed by a 2x2 max pooling layer with stride 2. We can write a function that builds such a VGG block, and the resulting block can then be reused when defining other models.
Key points of this section
- Learn how to build network modules with a function
- Learn how to use VGG blocks
Import the required packages
import sys
import time
import torch
import torchvision
from torch import nn

sys.path.append(".")  # make the local d2lzh_pytorch package importable before importing it
from d2lzh_pytorch import d2lzh_pytorch as d2l
Define the VGG block function
def VGG_block(num_convs, in_channels, out_channels):
    blk = []
    for i in range(num_convs):
        # Only the first convolution maps in_channels to out_channels;
        # the remaining convolutions keep the channel count at out_channels.
        if i == 0:
            blk.append(nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1))
        else:
            blk.append(nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1))
        blk.append(nn.ReLU())
    # The pooling layer halves the height and width.
    blk.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*blk)
We first build a list blk that collects the individual layers, then unpack it with * inside nn.Sequential() and return an nn.Sequential instance. The convolutional layers keep the height and width of the input unchanged, while the pooling layer halves them.
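As a quick check (a minimal sketch; the 1x1x32x32 input below is just an arbitrary example), we can push a dummy tensor through a block and confirm that the channel count becomes out_channels while the height and width are halved:

X = torch.randn(1, 1, 32, 32)                       # dummy single-channel 32x32 input
blk = VGG_block(2, in_channels=1, out_channels=64)  # two convolutions, 1 -> 64 channels
print(blk(X).shape)                                 # torch.Size([1, 64, 16, 16])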
The VGG network
A VGG network consists of several consecutive VGG blocks, each of which may use a different number of convolutional layers, input channels, and output channels, followed by a fully connected part. Defining a VGG network therefore requires the parameters of each VGG block, the structure of the fully connected layers, and the dropout rate of each fully connected layer.
Parameters of each VGG block
The parameters of the VGG blocks are stored in a tuple named conv_arch. It consists of several length-3 tuples, each of the form (num_convs, in_channels, out_channels); each inner tuple defines the parameters of one VGG block.
Fully connected layer structure
We also need a tuple that defines the structure of the fully connected part, named fc_arch, of the form (num_features, num_hidden_1, num_hidden_2, ..., num_output).
Dropout rate of each fully connected layer
A tuple named dropout_list defines the dropout rate of each fully connected layer. Since the output layer does not use dropout, dropout_list holds one rate for every fully connected layer except the output layer, so its length is 2 less than that of fc_arch. Its form is (dropout_1, dropout_2, ...).
Based on these three parameters, we can define the function VGG_net that builds a VGG network.
def VGG_net(conv_arch: tuple, fc_arch: tuple, dropout_list: tuple):
    net = nn.Sequential()
    fc_len = len(fc_arch)
    # fc_arch also counts the input feature size and the output size,
    # so the number of hidden-to-hidden fully connected layers is fc_len - 3.
    fc_hidden_num = fc_len - 3
    num_feature = fc_arch[0]
    num_outputs = fc_arch[-1]
    # Convolutional part: stack the VGG blocks.
    for i, (num_convs, in_channels, out_channels) in enumerate(conv_arch):
        net.add_module(
            "vgg_block_" + str(i + 1), VGG_block(num_convs, in_channels, out_channels)
        )
    # Flatten the feature maps into a vector before the fully connected part.
    net.add_module("flatten", nn.Flatten())
    # First fully connected layer: flattened features -> first hidden layer.
    input_layer = nn.Sequential()
    input_layer.add_module("fc_1", nn.Linear(num_feature, fc_arch[1]))
    input_layer.add_module("fc_1_relu", nn.ReLU())
    input_layer.add_module("fc_1_dropout", nn.Dropout(dropout_list[0]))
    net.add_module("input_layer", input_layer)
    # Remaining hidden fully connected layers.
    for j in range(fc_hidden_num):
        hidden_layer = nn.Sequential()
        hidden_layer.add_module("fc_" + str(j + 2), nn.Linear(fc_arch[j + 1], fc_arch[j + 2]))
        hidden_layer.add_module("fc_" + str(j + 2) + "_relu", nn.ReLU())
        hidden_layer.add_module("fc_" + str(j + 2) + "_dropout", nn.Dropout(dropout_list[j + 1]))
        net.add_module("hidden_layer_" + str(j + 1), hidden_layer)
    # Output layer: last hidden layer -> number of classes (no dropout here).
    net.add_module("output_layer", nn.Linear(fc_arch[-2], num_outputs))
    return net
Build the VGG model
conv_arch = ((1, 1, 64), (1, 64, 128), (2, 128, 256), (2, 256, 512), (2, 512, 512))
# For a single-channel 224 x 224 input (e.g. Fashion-MNIST resized to 224),
# the five blocks halve the spatial size five times: 224 / 2**5 = 7,
# so the flattened feature size entering the fully connected part is 512 * 7 * 7.
fc_arch = (512 * 7 * 7, 4096, 4096, 10)
dropout_list = (0.6, 0.8)
print(VGG_net(conv_arch, fc_arch, dropout_list))
Output:
Sequential(
  (vgg_block_1): Sequential(
    (0): Conv2d(1, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (vgg_block_2): Sequential(
    (0): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (vgg_block_3): Sequential(
    (0): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU()
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (vgg_block_4): Sequential(
    (0): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU()
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (vgg_block_5): Sequential(
    (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU()
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (input_layer): Sequential(
    (fc_1): Linear(in_features=25088, out_features=4096, bias=True)
    (fc_1_relu): ReLU()
    (fc_1_dropout): Dropout(p=0.6, inplace=False)
  )
  (hidden_layer_1): Sequential(
    (fc_2): Linear(in_features=4096, out_features=4096, bias=True)
    (fc_2_relu): ReLU()
    (fc_2_dropout): Dropout(p=0.8, inplace=False)
  )
  (output_layer): Linear(in_features=4096, out_features=10, bias=True)
)
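As a final sanity check (a minimal sketch; it assumes a single-channel 224x224 input such as Fashion-MNIST resized to 224, matching the fc_arch above), we can run a dummy batch through the network and confirm that the output has one score per class:

net = VGG_net(conv_arch, fc_arch, dropout_list)
X = torch.randn(1, 1, 224, 224)   # dummy batch with one 1 x 224 x 224 image
print(net(X).shape)               # torch.Size([1, 10])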