Weight Decay and L2-Norm Regularization
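Regularization means adding a chosen regularization term to the loss function. Take the mean squared error loss of a linear model with two features as an example: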
$$loss = \frac{1}{n}\Big(\sum_{i=1}^{n}\big(w_1x_1^{(i)}+w_2x_2^{(i)}+b-\hat y^{(i)}\big)^2 + \lambda\|w\|^2\Big)$$

where $\hat y^{(i)}$ is the label of the $i$-th sample and

$$\|w\|^2 = \frac{1}{2}\left(w_1^2 + w_2^2\right)$$

(the factor $\frac12$ is a convention that makes the derivative come out clean).
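Here $\lambda\|w\|^2$ is the L2-norm regularization term. Once it is added to the loss, gradient descent favors parameters with smaller magnitudes. This shrinks the function set of a complex model, i.e. it reduces the model's flexibility, which can partly mitigate overfitting.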
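The loss above can be written out directly. The following is a minimal pure-Python sketch (the function name `regularized_loss` and the sample values are hypothetical, chosen for illustration) that follows the formula term by term, including the $\frac12$ convention and the $\frac1n$ that also applies to the penalty. It shows the effect described above: without the penalty an exact fit with larger weights wins, while with $\lambda = 1$ a slightly worse fit with smaller weights has the lower loss.

```python
def regularized_loss(w, b, xs, ys, lam):
    """loss = (1/n) * (sum of squared errors + lam * ||w||^2), with ||w||^2 = 0.5 * (w1^2 + w2^2)."""
    n = len(xs)
    w1, w2 = w
    squared_error = sum((w1 * x1 + w2 * x2 + b - y) ** 2 for (x1, x2), y in zip(xs, ys))
    penalty = 0.5 * (w1 ** 2 + w2 ** 2)  # the article's ||w||^2 convention
    return (squared_error + lam * penalty) / n


xs = [(1.0, 2.0), (3.0, -1.0)]  # hypothetical samples (x1, x2)
ys = [5.0, 1.0]                  # labels, fitted exactly by w = (1, 2), b = 0

big = (1.0, 2.0)    # perfect fit, larger weights
small = (0.9, 1.8)  # slightly worse fit, smaller weights

print(regularized_loss(big, 0.0, xs, ys, lam=0.0))  # 0.0
print(regularized_loss(big, 0.0, xs, ys, lam=1.0))  # 1.25
# Without the penalty the perfect fit wins; with lam = 1 the smaller weights win
print(regularized_loss(small, 0.0, xs, ys, lam=0.0) > regularized_loss(big, 0.0, xs, ys, lam=0.0))  # True
print(regularized_loss(small, 0.0, xs, ys, lam=1.0) < regularized_loss(big, 0.0, xs, ys, lam=1.0))  # True
```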
L2-norm regularization is in fact weight decay. Why?
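Differentiate the regularized loss with respect to each parameter, for example $w_1$:

$$\frac{\partial loss}{\partial w_1} = \frac{2}{n}\sum_{i=1}^{n}\big(w_1x_1^{(i)}+w_2x_2^{(i)}+b-\hat y^{(i)}\big)x_1^{(i)} + \frac{\lambda}{n}w_1$$

With learning rate $\eta$, one gradient descent step is

$$w_1 \leftarrow w_1 - \eta\frac{\partial loss}{\partial w_1} = \Big(1 - \frac{\eta\lambda}{n}\Big)w_1 - \eta\cdot\frac{2}{n}\sum_{i=1}^{n}\big(w_1x_1^{(i)}+w_2x_2^{(i)}+b-\hat y^{(i)}\big)x_1^{(i)}$$

In every step the parameter is first multiplied by a coefficient between 0 and 1 (assuming $0 < \eta\lambda < n$), and then the ordinary, unregularized gradient step is subtracted. That is why it is called weight decay.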
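To check this equivalence numerically, here is a minimal pure-Python sketch (function names and data values are hypothetical). It takes one gradient step on $w_1$ in two ways: directly on the regularized loss as defined at the top (penalty $\lambda\|w\|^2$ inside the $\frac1n$, so it contributes $\frac{\lambda}{n}w_1$ to the gradient), and in the "shrink by $1-\eta\lambda/n$, then subtract the data gradient" form. Both give the same value up to floating-point rounding.

```python
def data_grad_w1(w1, w2, b, xs, ys):
    """Gradient of the squared-error part w.r.t. w1: (2/n) * sum((w1*x1 + w2*x2 + b - y) * x1)."""
    n = len(xs)
    return 2.0 / n * sum((w1 * x1 + w2 * x2 + b - y) * x1 for (x1, x2), y in zip(xs, ys))


def step_on_regularized_loss(w1, w2, b, xs, ys, lam, lr):
    """One plain gradient descent step on the full loss; the penalty contributes (lam / n) * w1."""
    n = len(xs)
    grad = data_grad_w1(w1, w2, b, xs, ys) + lam / n * w1
    return w1 - lr * grad


def step_with_weight_decay(w1, w2, b, xs, ys, lam, lr):
    """The same step written as: shrink w1 by (1 - lr * lam / n), then subtract the data gradient."""
    n = len(xs)
    return (1 - lr * lam / n) * w1 - lr * data_grad_w1(w1, w2, b, xs, ys)


xs = [(1.0, 2.0), (3.0, -1.0)]  # hypothetical samples (x1, x2)
ys = [4.0, 2.0]                  # hypothetical labels
via_gradient = step_on_regularized_loss(1.0, 2.0, 0.0, xs, ys, lam=0.5, lr=0.1)
via_decay = step_with_weight_decay(1.0, 2.0, 0.0, xs, ys, lam=0.5, lr=0.1)
print(via_gradient, via_decay)  # both approximately 1.175
```

The shrink factor here is $1 - 0.1 \cdot 0.5 / 2 = 0.975$; the data gradient is $-2$, so both forms land on $0.975 + 0.2 = 1.175$.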
Next, we run a high-dimensional linear regression experiment.
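Suppose the true equation is (these are the values used in the simulation code):

$$y = 0.5 + \sum_{i=1}^{200} 0.01\,x_i + \epsilon,\qquad \epsilon \sim \mathcal{N}(0,\ 0.01^2)$$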
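There are 200 features but only 10 training samples and 10 test samples, far fewer samples than features, which makes overfitting easy.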
Simulating the dataset
import torch

# 10 training and 10 test samples, 200 features
num_train, num_test = 10, 10
num_features = 200
# true parameters: every weight is 0.01, the bias is 0.5
true_w = torch.ones((num_features, 1), dtype=torch.float32) * 0.01
true_b = torch.tensor(0.5)
samples = torch.normal(0, 1, (num_train + num_test, num_features))
noise = torch.normal(0, 0.01, (num_train + num_test, 1))
labels = samples.matmul(true_w) + true_b + noise
train_samples, train_labels = samples[:num_train], labels[:num_train]
test_samples, test_labels = samples[num_train:], labels[num_train:]
Defining the loss function with the regularization term
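def loss_function(predict, label, w, lambd):
    # squared error of every sample
    loss = (predict - label) ** 2
    # mean over the samples plus the L2 penalty (mean of w**2 over all weights)
    loss = loss.mean() + lambd * (w ** 2).mean()
    return loss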
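Note that `loss_function` scales its penalty differently from the formula at the top of the article: it averages $w_j^2$ over the $d = 200$ weights, whereas the formula's penalty works out to $\frac{\lambda}{n}\cdot\frac12\sum_j w_j^2$. Matching the two penalty terms, with $n$ = `num_train` = 10, gives the following short derivation:

```latex
\text{lambd}\cdot\frac{1}{d}\sum_{j=1}^{d} w_j^2 \;=\; \frac{\lambda}{n}\cdot\frac{1}{2}\sum_{j=1}^{d} w_j^2
\quad\Longleftrightarrow\quad
\lambda = \frac{2n}{d}\,\text{lambd} = \frac{2\cdot 10}{200}\,\text{lambd} = 0.1\,\text{lambd}
```

So a given `lambd` in the code corresponds to a $\lambda$ ten times smaller in the article's formula. The data terms agree, since `loss.mean()` is exactly $\frac1n\sum_i$ of the squared errors.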
Plotting helper
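import matplotlib.pyplot as plt

def semilogy(x_val, y_val, x_label, y_label, x2_val=None, y2_val=None, legend=None):
    # log-scale y axis so that losses of very different magnitudes stay readable
    plt.figure(figsize=(3, 3))
    plt.xlabel(x_label)
    plt.ylabel(y_label)
    plt.semilogy(x_val, y_val)
    # optionally overlay a second curve (e.g. the test loss) and add a legend
    if x2_val is not None and y2_val is not None:
        plt.semilogy(x2_val, y2_val)
        plt.legend(legend)
    plt.show()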
Fitting and plotting
def fit_and_plot(train_samples, train_labels, test_samples, test_labels, num_epoch, lambd):
    # randomly initialized weights and a zero bias, both trainable
    w = torch.normal(0, 1, (train_samples.shape[-1], 1), requires_grad=True)
    b = torch.tensor(0., requires_grad=True)
    optimizer = torch.optim.Adam([w, b], lr=0.05)
    train_loss = []
    test_loss = []
    for epoch in range(num_epoch):
        # one training step on the regularized loss
        predict = train_samples.matmul(w) + b
        epoch_train_loss = loss_function(predict, train_labels, w, lambd)
        optimizer.zero_grad()
        epoch_train_loss.backward()
        optimizer.step()
        # test loss without building a graph; lambd=0 so it is the plain squared error
        # and runs with different lambd are directly comparable
        with torch.no_grad():
            test_predict = test_samples.matmul(w) + b
            epoch_test_loss = loss_function(test_predict, test_labels, w, 0)
        train_loss.append(epoch_train_loss.item())
        test_loss.append(epoch_test_loss.item())
    semilogy(range(1, num_epoch + 1), train_loss, 'epoch', 'loss',
             range(1, num_epoch + 1), test_loss, ['train', 'test'])
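Comparing the runs with and without the penalty term, the model trained with regularization does reach a lower loss on the test set.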
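The mechanism behind this result can be illustrated with a stdlib-only miniature. This is a hypothetical sketch, not the experiment above: it uses plain full-batch gradient descent instead of Adam, no bias term, and 50 features instead of 200 to keep it fast, and the helper names `make_data`, `train`, `mse` and `l2_norm` are made up for illustration. Starting from $w = 0$, the weight-decay run ends with a smaller weight norm and a higher training error than the unregularized run, i.e. it fits the few training samples less aggressively; it also prints the test error for inspection.

```python
import random


def make_data(n, d, seed):
    # Hypothetical toy data in the spirit of the article's experiment, but smaller and
    # without a bias term: y = sum(0.01 * x_j) + noise, with more features than samples.
    rng = random.Random(seed)
    xs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    ys = [sum(0.01 * v for v in x) + rng.gauss(0.0, 0.01) for x in xs]
    return xs, ys


def train(xs, ys, lam, lr=0.005, epochs=500):
    """Full-batch gradient descent from w = 0 on mean((x.w - y)^2) + (lam / 2) * ||w||^2."""
    n, d = len(xs), len(xs[0])
    w = [0.0] * d
    for _ in range(epochs):
        residuals = [sum(wj * xj for wj, xj in zip(w, x)) - y for x, y in zip(xs, ys)]
        grad = [2.0 / n * sum(r * x[j] for r, x in zip(residuals, xs)) for j in range(d)]
        # weight-decay form of the update: shrink w by (1 - lr * lam), then take the data-gradient step
        w = [(1.0 - lr * lam) * wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def mse(w, xs, ys):
    return sum((sum(wj * xj for wj, xj in zip(w, x)) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def l2_norm(w):
    return sum(wj * wj for wj in w) ** 0.5


train_x, train_y = make_data(n=10, d=50, seed=0)
test_x, test_y = make_data(n=10, d=50, seed=1)
w_plain = train(train_x, train_y, lam=0.0)
w_decay = train(train_x, train_y, lam=3.0)
for name, w in [("no decay", w_plain), ("decay", w_decay)]:
    print(name, "norm:", l2_norm(w),
          "train mse:", mse(w, train_x, train_y), "test mse:", mse(w, test_x, test_y))
```

Here the penalty is written as $\frac{\lambda}{2}\|w\|^2$ without the $\frac1n$, so the shrink factor is simply $1 - \eta\lambda$; this is a different normalization from the article's formula, chosen only to keep the sketch short.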