The YOLOv1 Loss Function Explained

Updated: 2024-10-11 21:20:05

1. The overall loss function
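The figure that showed the formula with its colored boxes is missing from this copy. For reference, the loss as written in the YOLOv1 paper is reproduced below. The mapping to colors is taken from the descriptions in section 2, not from the figure: the two coordinate lines are the blue box, the confidence term for cells that contain an object is the red box, the no-object confidence term is the orange box, and the class term is the purple box.

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}}
      \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
 &+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}}
      \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
 &+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2 \\
 &+ \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2 \\
 &+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}} \sum_{c \in \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
```

Here S = 7, B = 2, and C is the box confidence (written I0, I1 in the vector layout in section 2). One difference to keep in mind: the code in section 3 adds the confidence loss of the non-responsible box in an object cell (not_contain_loss) with weight 1, and applies l_noobj only to cells without any object.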

2. The loss function in detail

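First, one concept. A grid cell is one position of the final feature map (7*7*30), and it is the unit that makes predictions. In v1 there are 49 such cells, each a 1*30 vector. The 30 values are [x0, y0, w0, h0, I0, x1, y1, w1, h1, I1, C0, C1, C2, ..., C19]: the first 10 hold two boxes together with their confidences (I0 and I1), and the last 20 are class probabilities. Every loss term below works on some of these per-cell attributes.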
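The layout of one cell vector can be made concrete with a small plain-Python sketch. The helper name `split_cell` and the dictionary keys are made up for illustration; only the index positions come from the description above.

```python
S, B, C = 7, 2, 20  # grid size, boxes per cell, number of classes (values from the text)

def split_cell(vec):
    """Split one 30-value cell vector into its two boxes and its class scores."""
    if len(vec) != B * 5 + C:
        raise ValueError("expected %d values, got %d" % (B * 5 + C, len(vec)))
    boxes = [
        {
            "x": vec[5 * b],
            "y": vec[5 * b + 1],
            "w": vec[5 * b + 2],
            "h": vec[5 * b + 3],
            "conf": vec[5 * b + 4],  # I0 for b = 0, I1 for b = 1
        }
        for b in range(B)
    ]
    class_probs = list(vec[B * 5:])  # C0 .. C19
    return boxes, class_probs

# Use the index itself as the value so the positions are easy to read off.
cell = [float(i) for i in range(30)]
boxes, class_probs = split_cell(cell)
print(boxes[0]["conf"], boxes[1]["conf"], len(class_probs))  # 4.0 9.0 20
```

This is why the later code keeps referring to positions 4 and 9: they are the two confidence values of a cell.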

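2.1 Loss when an object is labeled and predicted

The label says there is an object at this position and the prediction says so too ==> compute the response loss plus the box coordinate terms. These correspond to the red box and the blue box. "Predicted" here means the one of the cell's two boxes that has the higher IoU with the labeled box.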

# boolean masks over box_target ([2K,5]); they live on the same device as box_target
coo_response_mask = torch.zeros_like(box_target, dtype=torch.bool)
coo_not_response_mask = torch.zeros_like(box_target, dtype=torch.bool)
for i in range(0, box_target.size(0), 2):
    # the two predicted boxes of one cell, converted from (cx, cy, w, h) to corners
    box1 = box_pred[i:i+2]
    box1_xyxy = torch.zeros_like(box1)
    box1_xyxy[:, :2] = box1[:, :2] - 0.5 * box1[:, 2:4]
    box1_xyxy[:, 2:4] = box1[:, :2] + 0.5 * box1[:, 2:4]
    # the labeled box of the same cell
    box2 = box_target[i].view(-1, 5)
    box2_xyxy = torch.zeros_like(box2)
    box2_xyxy[:, :2] = box2[:, :2] - 0.5 * box2[:, 2:4]
    box2_xyxy[:, 2:4] = box2[:, :2] + 0.5 * box2[:, 2:4]
    iou = self.compute_iou(box1_xyxy[:, :4], box2_xyxy[:, :4])  # [2,1]
    max_iou, max_index = iou.max(0)
    coo_response_mask[i + max_index] = True          # box with the larger IoU
    coo_not_response_mask[i + 1 - max_index] = True  # the other box
box_pred_response = box_pred[coo_response_mask].view(-1, 5)
box_target_response = box_target[coo_response_mask].view(-1, 5)
contain_loss = F.mse_loss(box_pred_response[:, 4], box_target_response[:, 4], reduction='sum')
loc_loss = F.mse_loss(box_pred_response[:, :2], box_target_response[:, :2], reduction='sum') + F.mse_loss(torch.sqrt(box_pred_response[:, 2:4]), torch.sqrt(box_target_response[:, 2:4]), reduction='sum')

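contain_loss is the squared error between the confidence of the box predicted as containing the object and the ground-truth confidence. Only these two values are compared, and the term belongs to the red box. (In short, it is the confidence error.)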
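How one of a cell's two predicted boxes is chosen as the responding one can be sketched without tensors. This is a plain-Python illustration of the IoU comparison, assuming boxes are given as (cx, cy, w, h) like in the code above; the function names and the sample numbers are made up for the example.

```python
def to_corners(box):
    """Convert (cx, cy, w, h) to (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)

def iou(a, b):
    """Intersection over union of two corner-format boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Clip at 0 so that boxes that do not overlap get zero intersection.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def pick_responsible(pred_boxes, target_box):
    """Return the index of the predicted box with the highest IoU, and all IoUs."""
    target = to_corners(target_box)
    ious = [iou(to_corners(p), target) for p in pred_boxes]
    best = max(range(len(ious)), key=ious.__getitem__)
    return best, ious

preds = [(0.5, 0.5, 0.2, 0.2), (0.5, 0.5, 0.4, 0.4)]  # the cell's two predictions
target = (0.5, 0.5, 0.4, 0.5)                         # the labeled box
best, ious = pick_responsible(preds, target)
print(best, ious)  # box 1 wins with IoU of about 0.8 against about 0.2
```

The winning index plays the role of max_index: that box goes into coo_response_mask and the other one into coo_not_response_mask.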

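loc_loss computes the content of the blue box: the squared error of the center (x, y) plus the squared error of the square roots of the width and height.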
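The square root on w and h is there so that the same absolute deviation costs more on a small box than on a large one. A short arithmetic check in plain Python shows the effect; the box sizes are arbitrary sample values and `size_term` is a name chosen for this sketch.

```python
import math

def size_term(pred_w, pred_h, true_w, true_h):
    """Width/height part of loc_loss for one box, using square roots."""
    return ((math.sqrt(pred_w) - math.sqrt(true_w)) ** 2
            + (math.sqrt(pred_h) - math.sqrt(true_h)) ** 2)

def plain_term(pred_w, pred_h, true_w, true_h):
    """Same error without the square root, for comparison."""
    return (pred_w - true_w) ** 2 + (pred_h - true_h) ** 2

# Both predictions are off by 0.05 in width and height.
small = size_term(0.15, 0.15, 0.10, 0.10)   # small box
large = size_term(0.85, 0.85, 0.80, 0.80)   # large box
plain_small = plain_term(0.15, 0.15, 0.10, 0.10)
plain_large = plain_term(0.85, 0.85, 0.80, 0.80)

print(small > large)   # True: the small box is penalized more
print(abs(plain_small - plain_large) < 1e-12)  # True: without sqrt both cost the same
```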

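2.2 Loss when an object is labeled but not predicted

The label says there is an object at this position but the prediction says there is none ==> compute the not-response loss. This is the other box of an object cell, the one with the lower IoU. The corresponding code is below.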

box_pred_not_response = box_pred[coo_not_response_mask].view(-1, 5)
box_target_not_response = box_target[coo_not_response_mask].view(-1, 5)
box_target_not_response[:, 4] = 0  # the non-responsible box should output confidence 0
not_contain_loss = F.mse_loss(box_pred_not_response[:, 4], box_target_not_response[:, 4], reduction='sum')

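not_contain_loss is the squared error between the confidence of the box predicted as not containing the object and its target confidence. Again only two values are compared, and the term also belongs to the red box.

So the red box is computed in two parts. Keep this in mind.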
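The two parts of the red box can be checked by hand on a single object cell. The sketch below is plain Python with made-up confidence values; it assumes a target confidence of 1 for the responsible box and 0 for the other one, as in the code of section 3.

```python
def confidence_losses(confs, responsible):
    """Confidence loss of one object cell, split the same way as the red box.

    confs: predicted confidences of the cell's boxes (I0, I1)
    responsible: index of the box with the highest IoU
    """
    contain = (confs[responsible] - 1.0) ** 2
    not_contain = sum((c - 0.0) ** 2
                      for k, c in enumerate(confs) if k != responsible)
    return contain, not_contain

# Box 1 has the higher IoU and predicts 0.8, box 0 predicts 0.3.
contain, not_contain = confidence_losses([0.3, 0.8], responsible=1)
print(round(contain, 4), round(not_contain, 4))  # 0.04 0.09
```

contain pushes the responsible box's confidence toward 1, while not_contain pushes the other box's confidence toward 0.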

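2.3 Loss when no object is labeled but one is predicted

The label says there is no object at this position but the prediction says there is one ==> compute the no-object loss. Only the confidence values at indices 4 and 9 enter this loss. The corresponding code is below.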

noo_pred = pred_tensor[noo_mask].view(-1, 30)
noo_target = target_tensor[noo_mask].view(-1, 30)
noo_pred_mask = torch.zeros_like(noo_pred, dtype=torch.bool)
# select indices 4 and 9, the two confidence values of each cell
noo_pred_mask[:, 4] = True
noo_pred_mask[:, 9] = True
noo_pred_c = noo_pred[noo_pred_mask]
noo_target_c = noo_target[noo_pred_mask]
nooobj_loss = F.mse_loss(noo_pred_c, noo_target_c, reduction='sum')

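noo_mask marks, for every grid cell, whether the ground truth has no object there.

noo_pred holds the prediction vectors of the cells that noo_mask selects, that is, the cells that really contain no object. The values at indices 4 and 9 are the predicted confidences.

noo_target holds the ground-truth vectors of the same cells. Their values at indices 4 and 9 are 0.

nooobj_loss uses only indices 4 and 9 of these 30-value vectors; the other positions are ignored. It corresponds to the orange box in the figure. The goal of this term is to push the predicted confidences toward 0, which is exactly what the loss should do for empty cells.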
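The same selection can be written out in plain Python to make it visible which numbers actually enter nooobj_loss. This is a sketch on two hand-made cells rather than the tensor code; `noobj_loss` and `CONF_IDX` are names chosen for the example.

```python
CONF_IDX = (4, 9)  # positions of the two confidence values in a 30-value cell

def noobj_loss(pred_cells, target_cells):
    """Sum of squared confidence errors over cells whose label has no object."""
    total = 0.0
    for pred, target in zip(pred_cells, target_cells):
        if target[4] != 0:
            continue  # the label has an object here, so noo_mask would skip this cell
        total += sum((pred[k] - target[k]) ** 2 for k in CONF_IDX)
    return total

empty_target = [0.0] * 30           # cell without an object
object_target = [0.0] * 30          # cell with an object: confidences set to 1
object_target[4] = object_target[9] = 1.0

pred_empty = [0.5] * 30             # predicts confidence 0.5 for both boxes
pred_object = [0.9] * 30            # ignored by this term

loss = noobj_loss([pred_empty, pred_object], [empty_target, object_target])
print(loss)  # 0.5, i.e. 0.5**2 + 0.5**2 from the empty cell only
```

Only the two confidence values of the empty cell contribute; its coordinates and class scores, and the whole object cell, are left out.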

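2.4 No object labeled, none predicted

The label says there is no object at this position and the prediction agrees ==> no loss (nothing is computed).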

2.5 Class loss

The class loss is computed as follows:

class_loss = F.mse_loss(class_pred, class_target, reduction='sum')

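class_loss is the classification term. It is the sum of squared errors over the last 20 values of the cell vector, computed only for cells that contain a labeled object. It corresponds to the purple box in the figure.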
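For a single object cell the class term is just a sum of 20 squared differences. A minimal plain-Python sketch, assuming a one-hot label and invented prediction values:

```python
def class_loss(pred_probs, target_probs):
    """Sum of squared errors over the 20 class scores of one cell."""
    return sum((p - t) ** 2 for p, t in zip(pred_probs, target_probs))

target = [0.0] * 20
target[7] = 1.0        # the labeled class (index chosen arbitrarily)

pred = [0.0] * 20
pred[7] = 0.5          # half of the probability on the correct class
pred[3] = 0.5          # half on a wrong class

loss = class_loss(pred, target)
print(loss)  # 0.5: 0.25 for the missing mass on class 7 plus 0.25 for class 3
```

Note that YOLOv1 treats classification as regression with squared error here, not as cross-entropy.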

That completes the v1 loss function.

3. Complete code of the loss function

import torch
import torch.nn as nn
import torch.nn.functional as F


class yoloLoss(nn.Module):
    '''
    A loss layer that torch.nn does not provide, written as a module to keep the code modular.
    nn.Module wraps a layer: its structure, its parameters, and helper operations.
    A subclass has to implement __init__() and forward().
    '''
    def __init__(self, S, B, l_coord, l_noobj):
        super(yoloLoss, self).__init__()
        self.S = S  # 7: the image is divided into a 7x7 grid
        self.B = B  # 2: each cell predicts two boxes
        self.l_coord = l_coord  # 5: lambda_coord, puts more weight on the 8 coordinate values
        self.l_noobj = l_noobj  # 0.5: weight of the confidence loss for boxes without an object

    def compute_iou(self, box1, box2):
        '''
        Compute the intersection over union of two sets of boxes, each box is [x1,y1,x2,y2].
        Args:
          box1: (tensor) bounding boxes, sized [N,4].
          box2: (tensor) bounding boxes, sized [M,4].
        Return:
          (tensor) iou, sized [N,M].
        '''
        N = box1.size(0)
        M = box2.size(0)

        lt = torch.max(
            box1[:, :2].unsqueeze(1).expand(N, M, 2),  # [N,2] -> [N,1,2] -> [N,M,2]
            box2[:, :2].unsqueeze(0).expand(N, M, 2),  # [M,2] -> [1,M,2] -> [N,M,2]
        )
        rb = torch.min(
            box1[:, 2:].unsqueeze(1).expand(N, M, 2),  # [N,2] -> [N,1,2] -> [N,M,2]
            box2[:, 2:].unsqueeze(0).expand(N, M, 2),  # [M,2] -> [1,M,2] -> [N,M,2]
        )

        # Clip at 0 so that boxes that do not overlap get zero intersection.
        wh = (rb - lt).clamp(min=0)  # [N,M,2]
        inter = wh[:, :, 0] * wh[:, :, 1]  # [N,M]

        area1 = (box1[:, 2] - box1[:, 0]) * (box1[:, 3] - box1[:, 1])  # [N,]
        area2 = (box2[:, 2] - box2[:, 0]) * (box2[:, 3] - box2[:, 1])  # [M,]
        area1 = area1.unsqueeze(1).expand_as(inter)  # [N,] -> [N,1] -> [N,M]
        area2 = area2.unsqueeze(0).expand_as(inter)  # [M,] -> [1,M] -> [N,M]

        iou = inter / (area1 + area2 - inter)
        return iou

    def forward(self, pred_tensor, target_tensor):
        '''
        pred_tensor: (tensor) size(batchsize,S,S,Bx5+20=30) [x,y,w,h,c]
        target_tensor: (tensor) size(batchsize,S,S,30)
        Mr. Li's reading:
        labeled yes, predicted yes -> response loss
        labeled yes, predicted no  -> not-response loss
        labeled no,  predicted no  -> no loss (not computed)
        labeled no,  predicted yes -> no-object loss, only on the confidences at indices 4 and 9
        '''
        # 1. Find the cells that have a labeled object (coo_mask) and those that do not (noo_mask)
        N = pred_tensor.size(0)  # batch size
        # index 4 is the confidence (object or background): > 0 selects cells with an object
        coo_mask = target_tensor[:, :, :, 4] > 0
        # == 0 selects cells without an object
        noo_mask = target_tensor[:, :, :, 4] == 0
        # unsqueeze(-1) adds a last dimension and expand_as repeats it,
        # so both masks get the shape of target_tensor, e.g. [32,7,7,30].
        # coo_mask is mostly False, True marks cells that really contain an object;
        # noo_mask is mostly True, True marks cells that really contain none.
        coo_mask = coo_mask.unsqueeze(-1).expand_as(target_tensor)
        noo_mask = noo_mask.unsqueeze(-1).expand_as(target_tensor)

        # 2. Use coo_mask to pull the predictions of object cells out of the feature map.
        # coo_pred has shape (K,30), K = number of object cells in the batch, e.g. [72,30]
        coo_pred = pred_tensor[coo_mask].view(-1, 30)
        # The first 10 values are two boxes (x,y,w,h,c); reshape to (2K,5), e.g. [144,5].
        # contiguous() makes the sliced tensor contiguous so that view() works.
        box_pred = coo_pred[:, :10].contiguous().view(-1, 5)  # [x1,y1,w1,h1,c1], [x2,y2,w2,h2,c2]
        class_pred = coo_pred[:, 10:]  # the last 20 values: class scores of each cell

        # 3. Do the same for the labels
        coo_target = target_tensor[coo_mask].view(-1, 30)
        box_target = coo_target[:, :10].contiguous().view(-1, 5)
        class_target = coo_target[:, 10:]

        # 4. Build the individual loss terms

        # ---------- labeled no, predicted yes (orange box) ----------
        # Predictions of the cells that really contain no object, shape (M,30), e.g. [1496,30].
        # One row is one cell.
        noo_pred = pred_tensor[noo_mask].view(-1, 30)
        noo_target = target_tensor[noo_mask].view(-1, 30)  # e.g. [1496,30]
        noo_pred_mask = torch.zeros_like(noo_pred, dtype=torch.bool)  # all False to start
        # select indices 4 and 9, the two confidence values of each cell
        noo_pred_mask[:, 4] = True
        noo_pred_mask[:, 9] = True
        # Predicted confidences of the empty cells, two per row, e.g. 2992 values
        noo_pred_c = noo_pred[noo_pred_mask]
        # Their targets, all 0 because the cells are empty
        noo_target_c = noo_target[noo_pred_mask]
        # reduction='sum' returns the summed squared error as a scalar.
        # Minimizing it drives noo_pred_c toward 0: empty cells should report no object.
        nooobj_loss = F.mse_loss(noo_pred_c, noo_target_c, reduction='sum')

        # ---------- labeled yes: split the two boxes of each object cell ----------
        coo_response_mask = torch.zeros_like(box_target, dtype=torch.bool)
        coo_not_response_mask = torch.zeros_like(box_target, dtype=torch.bool)
        # Of the 2 boxes per cell, pick the one with the best IoU
        for i in range(0, box_target.size(0), 2):
            # the two predicted boxes
            box1 = box_pred[i:i + 2]
            box1_xyxy = torch.zeros_like(box1)
            box1_xyxy[:, :2] = box1[:, :2] - 0.5 * box1[:, 2:4]   # top-left corner
            box1_xyxy[:, 2:4] = box1[:, :2] + 0.5 * box1[:, 2:4]  # bottom-right corner
            # the one labeled box
            box2 = box_target[i].view(-1, 5)
            box2_xyxy = torch.zeros_like(box2)
            box2_xyxy[:, :2] = box2[:, :2] - 0.5 * box2[:, 2:4]
            box2_xyxy[:, 2:4] = box2[:, :2] + 0.5 * box2[:, 2:4]
            iou = self.compute_iou(box1_xyxy[:, :4], box2_xyxy[:, :4])  # [2,1]
            max_iou, max_index = iou.max(0)
            coo_response_mask[i + max_index] = True          # True for the box with the larger IoU
            coo_not_response_mask[i + 1 - max_index] = True  # True for the other box

        # ---------- labeled yes, predicted yes ----------
        # 1. response loss: coordinate loss on (x, y, sqrt(w), sqrt(h)), see the loss formula in the paper
        # box_pred [144,5], coo_response_mask [144,5], box_pred_response [72,5]
        # The box with the best IoU is the one responsible for detecting the object.
        box_pred_response = box_pred[coo_response_mask].view(-1, 5)      # predictions of the best boxes
        box_target_response = box_target[coo_response_mask].view(-1, 5)  # their ground truth
        # Confidence error. box_target_response[:,4] is 1, so the predicted confidence is pushed up.
        contain_loss = F.mse_loss(box_pred_response[:, 4], box_target_response[:, 4], reduction='sum')
        # Localization error
        loc_loss = F.mse_loss(box_pred_response[:, :2], box_target_response[:, :2], reduction='sum') \
            + F.mse_loss(torch.sqrt(box_pred_response[:, 2:4]), torch.sqrt(box_target_response[:, 2:4]), reduction='sum')

        # ---------- labeled yes, predicted no ----------
        # 2. not-response loss: the box of an object cell that is not responsible
        box_pred_not_response = box_pred[coo_not_response_mask].view(-1, 5)
        box_target_not_response = box_target[coo_not_response_mask].view(-1, 5)
        box_target_not_response[:, 4] = 0
        # Confidence error: the confidence of the non-responsible box is pushed toward 0.
        not_contain_loss = F.mse_loss(box_pred_not_response[:, 4], box_target_not_response[:, 4], reduction='sum')

        # ---------- class loss of object cells ----------
        # 3. class loss over the cells that really contain an object
        class_loss = F.mse_loss(class_pred, class_target, reduction='sum')

        # ---------- total loss ----------
        # Divide by N to get the average loss per image
        return (self.l_coord * loc_loss + contain_loss + not_contain_loss + self.l_noobj * nooobj_loss + class_loss) / N
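How the five terms are weighted in the final return statement can be seen with a few numbers. The sketch below repeats only that last line in plain Python; the default weights 5 and 0.5 are the values named in the constructor comments, and the input values are invented.

```python
def total_loss(loc, contain, not_contain, noobj, cls, batch_size,
               l_coord=5.0, l_noobj=0.5):
    """Weighted sum of the five loss terms, averaged over the batch."""
    return (l_coord * loc + contain + not_contain + l_noobj * noobj + cls) / batch_size

value = total_loss(loc=0.5, contain=0.25, not_contain=0.25,
                   noobj=2.0, cls=1.0, batch_size=2)
print(value)  # (2.5 + 0.25 + 0.25 + 1.0 + 1.0) / 2 = 2.5
```

The localization error is scaled up by 5 and the confidence error of empty cells is scaled down by 0.5, so the many empty cells do not dominate the few cells that contain objects.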

Published: 2024-03-10 16:54:01

Article link: https://www.elefans.com/category/jswz/34/1728520.html