Fun with Instance Segmentation

Updated: 2024-10-25 07:21:27


Fun with Instance Segmentation: Mask R-CNN

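Preface
What is Mask R-CNN?
Advantages of Mask R-CNN
github
Mask R-CNN implementation approach
Obtaining proposal boxes
Decoding the proposal boxes
RoI Align layer
Building the classifier model
Building the mask model
Loss computation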

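Preface

It has been a few days since my last blog post. Today I will walk through a paper I read a while ago, Mask R-CNN, and go over its code along the way.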

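What is Mask R-CNN?

Mask R-CNN was proposed by Kaiming He and colleagues. It is a two-stage algorithm built on Faster R-CNN that performs not only object detection but also high-quality instance segmentation. The main idea is to add a mask-prediction branch alongside the existing detection head of Faster R-CNN.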

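Advantages of Mask R-CNN

1. It uses a Feature Pyramid Network (FPN). In earlier detectors such as Fast R-CNN, RoIs are taken only from the last feature layer. That works for large objects, but accuracy on small objects suffers: after repeated convolution and pooling, very little of a small object's information is left in the last layer. An RoI is mapped onto a feature map by dividing its image coordinates by the stride, so a small box becomes tiny, or vanishes altogether, on the feature map. FPN was introduced to handle this multi-scale detection problem. It exploits the CNN's natural layer hierarchy, fusing the high-resolution shallow layers with the semantically strong deep layers so that every level carries strong semantic features. [Figure: the widely reused FPN diagram]

2. It uses RoIAlign. Suppose the original image contains a region proposal of size 665×665. Mapped onto a stride-32 feature map, its side becomes 665/32 = 20.78, i.e. 20.78×20.78. Unlike RoI Pooling, RoIAlign does not round this to an integer; it keeps the floating-point value and samples the feature map with bilinear interpolation. The reason is that if the RoI Pooling output size is 7×7 and the RPN produces an 8×8 RoI, input and output pixels cannot correspond one to one: first, the output cells cover different amounts of information (some are 1-to-1, others 1-to-2); second, their coordinates no longer line up with the input.
3. It adds a mask branch and decouples mask prediction from class prediction: the mask branch only does segmentation, and class prediction is left to a separate branch. This differs from the original FCN, which has to predict the class of each mask at the same time as the mask itself.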
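The 665/32 example in point 2 can be made concrete with a small sketch. It is only an illustration in plain Python, not code from the repository: `bilinear_sample` is a hypothetical helper, and the 2×2 feature map is made-up data. It shows how much of the box is lost when the side length is rounded down, and how a fractional location can be read from a feature map by bilinear interpolation.

```python
import math


def bilinear_sample(feature, y, x):
    """Sample a 2-D feature map (a list of rows) at a fractional (y, x) location."""
    h, w = len(feature), len(feature[0])
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    # Clamp the neighbouring cell so sampling on the border stays in range
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


stride = 32
roi_size = 665
exact = roi_size / stride                    # RoIAlign keeps this float
quantized = roi_size // stride               # RoI Pooling rounds it down
lost_pixels = (exact - quantized) * stride   # error in input-image pixels

# Made-up 2x2 feature map, sampled at the centre of its four cells
feature = [[0.0, 1.0],
           [2.0, 3.0]]
center_value = bilinear_sample(feature, 0.5, 0.5)
```

Rounding 20.78 down to 20 discards the equivalent of 25 input pixels along each side, which matters a great deal for a pixel-level mask.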

github

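Mask R-CNN implementation approach

1. Backbone network
The paper uses ResNet + FPN as the feature extraction network and reports state-of-the-art results with it. The code here uses ResNet-101, which is built from two kinds of blocks: the conv block, whose input and output sizes differ, and the identity block, whose input and output sizes are the same.

Stage 1 (C1) uses one plain convolution and one max pooling; height and width are reduced 4×, and the channel count becomes 64.
Stage 2 (C2) uses one conv_block and two identity_blocks; height and width are reduced 4×, and the channel count becomes 256.
Stage 3 (C3) uses one conv_block and three identity_blocks; height and width are reduced 8×, and the channel count becomes 512.
Stage 4 (C4) uses one conv_block and 22 identity_blocks; height and width are reduced 16×, and the channel count becomes 1024.
Stage 5 (C5) uses one conv_block and two identity_blocks; height and width are reduced 32×, and the channel count becomes 2048.
The feature pyramid then combines C2, C3, C4 and C5 top-down, by upsampling and addition, into P2, P3, P4 and P5; P6 is obtained by subsampling P5.
ResNet-101 code (the left half of the diagram):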

from keras.layers import ZeroPadding2D, Conv2D, MaxPooling2D, BatchNormalization, Activation, Add


def identity_block(input_tensor, kernel_size, filters, stage, block,
                   use_bias=True, train_bn=True):
    nb_filter1, nb_filter2, nb_filter3 = filters
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    x = Conv2D(nb_filter1, (1, 1), name=conv_name_base + '2a',
               use_bias=use_bias)(input_tensor)
    x = BatchNormalization(name=bn_name_base + '2a')(x, training=train_bn)
    x = Activation('relu')(x)

    x = Conv2D(nb_filter2, (kernel_size, kernel_size), padding='same',
               name=conv_name_base + '2b', use_bias=use_bias)(x)
    x = BatchNormalization(name=bn_name_base + '2b')(x, training=train_bn)
    x = Activation('relu')(x)

    x = Conv2D(nb_filter3, (1, 1), name=conv_name_base + '2c',
               use_bias=use_bias)(x)
    x = BatchNormalization(name=bn_name_base + '2c')(x, training=train_bn)

    x = Add()([x, input_tensor])
    x = Activation('relu', name='res' + str(stage) + block + '_out')(x)
    return x


def conv_block(input_tensor, kernel_size, filters, stage, block,
               strides=(2, 2), use_bias=True, train_bn=True):
    nb_filter1, nb_filter2, nb_filter3 = filters
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    x = Conv2D(nb_filter1, (1, 1), strides=strides,
               name=conv_name_base + '2a', use_bias=use_bias)(input_tensor)
    x = BatchNormalization(name=bn_name_base + '2a')(x, training=train_bn)
    x = Activation('relu')(x)

    x = Conv2D(nb_filter2, (kernel_size, kernel_size), padding='same',
               name=conv_name_base + '2b', use_bias=use_bias)(x)
    x = BatchNormalization(name=bn_name_base + '2b')(x, training=train_bn)
    x = Activation('relu')(x)

    x = Conv2D(nb_filter3, (1, 1), name=conv_name_base + '2c',
               use_bias=use_bias)(x)
    x = BatchNormalization(name=bn_name_base + '2c')(x, training=train_bn)

    shortcut = Conv2D(nb_filter3, (1, 1), strides=strides,
                      name=conv_name_base + '1', use_bias=use_bias)(input_tensor)
    shortcut = BatchNormalization(name=bn_name_base + '1')(shortcut, training=train_bn)

    x = Add()([x, shortcut])
    x = Activation('relu', name='res' + str(stage) + block + '_out')(x)
    return x


def get_resnet(input_image, stage5=False, train_bn=True):
    # Stage 1
    x = ZeroPadding2D((3, 3))(input_image)
    x = Conv2D(64, (7, 7), strides=(2, 2), name='conv1', use_bias=True)(x)
    x = BatchNormalization(name='bn_conv1')(x, training=train_bn)
    x = Activation('relu')(x)
    # Height/4,Width/4,64
    C1 = x = MaxPooling2D((3, 3), strides=(2, 2), padding="same")(x)

    # Stage 2
    x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1), train_bn=train_bn)
    x = identity_block(x, 3, [64, 64, 256], stage=2, block='b', train_bn=train_bn)
    # Height/4,Width/4,256
    C2 = x = identity_block(x, 3, [64, 64, 256], stage=2, block='c', train_bn=train_bn)

    # Stage 3
    x = conv_block(x, 3, [128, 128, 512], stage=3, block='a', train_bn=train_bn)
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='b', train_bn=train_bn)
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='c', train_bn=train_bn)
    # Height/8,Width/8,512
    C3 = x = identity_block(x, 3, [128, 128, 512], stage=3, block='d', train_bn=train_bn)

    # Stage 4
    x = conv_block(x, 3, [256, 256, 1024], stage=4, block='a', train_bn=train_bn)
    block_count = 22
    for i in range(block_count):
        x = identity_block(x, 3, [256, 256, 1024], stage=4, block=chr(98 + i), train_bn=train_bn)
    # Height/16,Width/16,1024
    C4 = x

    # Stage 5
    if stage5:
        x = conv_block(x, 3, [512, 512, 2048], stage=5, block='a', train_bn=train_bn)
        x = identity_block(x, 3, [512, 512, 2048], stage=5, block='b', train_bn=train_bn)
        # Height/32,Width/32,2048
        C5 = x = identity_block(x, 3, [512, 512, 2048], stage=5, block='c', train_bn=train_bn)
    else:
        C5 = None
    return [C1, C2, C3, C4, C5]
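To check the stage sizes listed above against a concrete input, here is a small sketch in plain Python. The helper name `resnet101_stage_shapes` and the 1024×1024 input are assumptions chosen for illustration; the strides and channel counts are the ones from the shape comments in `get_resnet`.

```python
def resnet101_stage_shapes(height, width):
    """Return the (height, width, channels) of C1..C5 for a given input size.

    Assumes height and width are multiples of 32, so the divisions are exact.
    """
    stages = [("C1", 4, 64), ("C2", 4, 256), ("C3", 8, 512),
              ("C4", 16, 1024), ("C5", 32, 2048)]
    return {name: (height // stride, width // stride, channels)
            for name, stride, channels in stages}


shapes = resnet101_stage_shapes(1024, 1024)

# Bottleneck blocks per stage: one conv_block plus the identity_blocks
blocks_per_stage = {2: 1 + 2, 3: 1 + 3, 4: 1 + 22, 5: 1 + 2}
# Each block has three convolutions; adding conv1 and the final fully
# connected layer of the classification network gives the "101" in the name
layer_count = 3 * sum(blocks_per_stage.values()) + 2
```

For a 1024×1024 image this gives a 256×256×256 C2 and a 32×32×2048 C5, and the block counts add up to the 101 layers the network is named after.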

Feature pyramid code (the right half of the diagram):

from keras.layers import Input, UpSampling2D


def get_predict_model(config):
    h, w = config.IMAGE_SHAPE[:2]
    if h / 2**6 != int(h / 2**6) or w / 2**6 != int(w / 2**6):
        raise Exception("Image size must be dividable by 2 at least 6 times "
                        "to avoid fractions when downscaling and upscaling."
                        "For example, use 256, 320, 384, 448, 512, ... etc. ")

    # The input image's height and width must be multiples of 2**6
    input_image = Input(shape=[None, None, config.IMAGE_SHAPE[2]], name="input_image")
    # meta carries some required information about the image
    input_image_meta = Input(shape=[config.IMAGE_META_SIZE], name="input_image_meta")
    # The anchors passed in
    input_anchors = Input(shape=[None, 4], name="input_anchors")

    # ResNet layers at different levels of downsampling
    _, C2, C3, C4, C5 = get_resnet(input_image, stage5=True, train_bn=config.TRAIN_BN)

    # Assemble the feature pyramid
    # P5: height and width halved 5 times
    # Height/32,Width/32,256
    P5 = Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c5p5')(C5)
    # P4: height and width halved 4 times
    # Height/16,Width/16,256
    P4 = Add(name="fpn_p4add")([
        UpSampling2D(size=(2, 2), name="fpn_p5upsampled")(P5),
        Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c4p4')(C4)])
    # P3: height and width halved 3 times
    # Height/8,Width/8,256
    P3 = Add(name="fpn_p3add")([
        UpSampling2D(size=(2, 2), name="fpn_p4upsampled")(P4),
        Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c3p3')(C3)])
    # P2: height and width halved 2 times
    # Height/4,Width/4,256
    P2 = Add(name="fpn_p2add")([
        UpSampling2D(size=(2, 2), name="fpn_p3upsampled")(P3),
        Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c2p2')(C2)])

    # Apply one 256-channel convolution to each level; P2, P3, P4 and P5 now have the same channel count
    # Height/4,Width/4,256
    P2 = Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p2")(P2)
    # Height/8,Width/8,256
    P3 = Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p3")(P3)
    # Height/16,Width/16,256
    P4 = Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p4")(P4)
    # Height/32,Width/32,256
    P5 = Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p5")(P5)
    # The proposal network also uses a P6 level to generate proposals
    # Height/64,Width/64,256
    P6 = MaxPooling2D(pool_size=(1, 1), strides=2, name="fpn_p6")(P5)
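The top-down merge in the code above (upsample the coarser level by 2, then add the lateral 1×1 convolution of the finer level) can be traced by hand on tiny single-channel maps. The sketch below is plain Python with made-up values; `upsample_nearest_2x`, `add_maps` and `is_valid_image_size` are hypothetical helpers. It assumes nearest-neighbour upsampling, which is the default interpolation of Keras `UpSampling2D`.

```python
def upsample_nearest_2x(fmap):
    """Double height and width by repeating every value (nearest neighbour)."""
    out = []
    for row in fmap:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_maps(a, b):
    """Element-wise sum of two maps of identical shape."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def is_valid_image_size(h, w):
    """Same condition as the check at the top of get_predict_model."""
    return h % 64 == 0 and w % 64 == 0


# Made-up data: a 2x2 "P5" and the 4x4 lateral output computed from "C4"
p5 = [[1.0, 2.0],
      [3.0, 4.0]]
c4_lateral = [[0.5] * 4 for _ in range(4)]

p4 = add_maps(upsample_nearest_2x(p5), c4_lateral)
```

Each value of the coarse map is spread over a 2×2 patch and then combined with the finer lateral features, which is how the deep semantic signal reaches the high-resolution levels. The divisibility check exists because six halvings (down to P6) must all produce integer sizes.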

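Obtaining proposal boxes

When predicting proposals, the RPN (region proposal network) is applied to each of P2, P3, P4, P5 and P6 to obtain, for every anchor, its adjustment parameters and whether it contains an object. It first applies a 3×3 convolution with 512 channels, then two 1×1 convolutions that predict the anchor adjustment parameters and the objectness respectively. anchors_per_location x 4 holds the adjustment parameters of each anchor, and anchors_per_location x 2 holds whether each anchor contains an object. This prediction is still only a rough one.
The code is as follows: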

def rpn_graph(feature_map, anchors_per_location):
    shared = Conv2D(512, (3, 3), padding='same', activation='relu',
                    name='rpn_conv_shared')(feature_map)

    x = Conv2D(2 * anchors_per_location, (1, 1), padding='valid',
               activation='linear', name='rpn_class_raw')(shared)
    # batch_size,num_anchors,2
    # Background/foreground class of each anchor
    rpn_class_logits = Reshape([-1, 2])(x)
    rpn_probs = Activation("softmax", name="rpn_class_xxx")(rpn_class_logits)

    x = Conv2D(anchors_per_location * 4, (1, 1), padding="valid",
               activation='linear', name='rpn_bbox_pred')(shared)
    # batch_size,num_anchors,4
    # Adjustment parameters of each anchor
    rpn_bbox = Reshape([-1, 4])(x)

    return [rpn_class_logits, rpn_probs, rpn_bbox]
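To get a feel for how many rows the `Reshape([-1, 2])` and `Reshape([-1, 4])` outputs contain, the sketch below counts anchors per pyramid level. It is an illustration only: `rpn_output_sizes` is a hypothetical helper, and the 1024×1024 input and `anchors_per_location=3` are assumed values, not settings taken from this post.

```python
def rpn_output_sizes(image_size, strides=(4, 8, 16, 32, 64), anchors_per_location=3):
    """Number of anchors the RPN scores on each of P2..P6 for a square image."""
    per_level = []
    for stride in strides:
        side = image_size // stride  # feature map side length at this level
        per_level.append(side * side * anchors_per_location)
    return per_level


per_level = rpn_output_sizes(1024)
total_anchors = sum(per_level)
```

Under these assumptions the RPN scores several hundred thousand anchors per image, most of them on P2, which is why the next step keeps only the top-scoring ones before decoding and non-maximum suppression.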

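Decoding the proposal boxes

After prediction, the predicted boxes must be decoded so that they move close to the ground-truth boxes.
[Figure: the widely reused diagram of the decoding process]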

class ProposalLayer(Layer):
    def __init__(self, proposal_count, nms_threshold, config=None, **kwargs):
        super(ProposalLayer, self).__init__(**kwargs)
        self.config = config
        self.proposal_count = proposal_count
        self.nms_threshold = nms_threshold

    # inputs: [rpn_class, rpn_bbox, anchors]
    def call(self, inputs):
        # Whether each anchor contains an object: [batch, num_anchors]
        scores = inputs[0][:, :, 1]
        # Adjustment parameters of each anchor: [batch, num_anchors, 4]
        deltas = inputs[1]
        # [0.1 0.1 0.2 0.2], rescales the deltas
        deltas = deltas * np.reshape(self.config.RPN_BBOX_STD_DEV, [1, 1, 4])
        # Anchors
        anchors = inputs[2]

        # Keep the top-scoring boxes (the first 6000)
        pre_nms_limit = tf.minimum(self.config.PRE_NMS_LIMIT, tf.shape(anchors)[1])
        # Indices of those boxes
        ix = tf.nn.top_k(scores, pre_nms_limit, sorted=True,
                         name="top_anchors").indices
        # Scores of those boxes
        scores = utils.batch_slice([scores, ix], lambda x, y: tf.gather(x, y),
                                   self.config.IMAGES_PER_GPU)
        # Adjustment parameters of those boxes
        deltas = utils.batch_slice([deltas, ix], lambda x, y: tf.gather(x, y),
                                   self.config.IMAGES_PER_GPU)
        # Anchors corresponding to those boxes
        pre_nms_anchors = utils.batch_slice([anchors, ix], lambda a, x: tf.gather(a, x),
                                            self.config.IMAGES_PER_GPU,
                                            names=["pre_nms_anchors"])

        # [batch, N, (y1, x1, y2, x2)]
        # Decode the anchors
        boxes = utils.batch_slice([pre_nms_anchors, deltas],
                                  lambda x, y: apply_box_deltas_graph(x, y),
                                  self.config.IMAGES_PER_GPU,
                                  names=["refined_anchors"])

        # [batch, N, (y1, x1, y2, x2)]
        # Keep the boxes inside the image
        window = np.array([0, 0, 1, 1], dtype=np.float32)
        boxes = utils.batch_slice(boxes,
                                  lambda x: clip_boxes_graph(x, window),
                                  self.config.IMAGES_PER_GPU,
                                  names=["refined_anchors_clipped"])

        # Non-maximum suppression
        def nms(boxes, scores):
            indices = tf.image.non_max_suppression(
                boxes, scores, self.proposal_count,
                self.nms_threshold, name="rpn_non_max_suppression")
            proposals = tf.gather(boxes, indices)
            # If there are fewer boxes than the configured number of proposals,
            # pad with zeros
            padding = tf.maximum(self.proposal_count - tf.shape(proposals)[0], 0)
            proposals = tf.pad(proposals, [(0, padding), (0, 0)])
            return proposals

        proposals = utils.batch_slice([boxes, scores], nms,
                                      self.config.IMAGES_PER_GPU)
        return proposals

    def compute_output_shape(self, input_shape):
        return (None, self.proposal_count, 4)
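The layer above calls `apply_box_deltas_graph` and `clip_boxes_graph`, which are not shown in this post. The sketch below is a plain-Python guess at what that decoding does for a single box, assuming the usual (dy, dx, log(dh), log(dw)) parameterisation mentioned in the loss docstrings; the function names and the sample numbers are illustrative, not taken from the repository.

```python
import math


def apply_box_deltas(box, deltas):
    """Shift and rescale one (y1, x1, y2, x2) box by (dy, dx, log(dh), log(dw))."""
    y1, x1, y2, x2 = box
    dy, dx, dh, dw = deltas
    height, width = y2 - y1, x2 - x1
    # Move the centre by a fraction of the box size
    center_y = y1 + 0.5 * height + dy * height
    center_x = x1 + 0.5 * width + dx * width
    # Scale the size exponentially, so it always stays positive
    height *= math.exp(dh)
    width *= math.exp(dw)
    return (center_y - 0.5 * height, center_x - 0.5 * width,
            center_y + 0.5 * height, center_x + 0.5 * width)


def clip_box(box, window=(0.0, 0.0, 1.0, 1.0)):
    """Clamp a box to the window, here the normalised image [0, 1] x [0, 1]."""
    wy1, wx1, wy2, wx2 = window
    y1, x1, y2, x2 = box
    return (min(max(y1, wy1), wy2), min(max(x1, wx1), wx2),
            min(max(y2, wy1), wy2), min(max(x2, wx1), wx2))


# Made-up anchor and raw network output, rescaled by [0.1, 0.1, 0.2, 0.2]
anchor = (0.2, 0.2, 0.6, 0.6)
raw = (1.0, -1.0, 0.0, 0.0)
std_dev = (0.1, 0.1, 0.2, 0.2)
deltas = tuple(r * s for r, s in zip(raw, std_dev))

decoded = apply_box_deltas(anchor, deltas)
clipped = clip_box((-0.1, 0.2, 1.3, 0.9))
```

Here the anchor moves down and to the left by a tenth of its size while keeping its shape, and a box that sticks out of the image is cut back to the [0, 1] window.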

RoI Align layer

The RoI Align layer crops a fixed-size feature map for each decoded box.
In the lower branch, the feature map is resized to 7×7×256.

In the upper (mask) branch, the feature map is resized to 14×14×256.

class PyramidROIAlign(Layer):
    def __init__(self, pool_shape, **kwargs):
        super(PyramidROIAlign, self).__init__(**kwargs)
        self.pool_shape = tuple(pool_shape)

    def call(self, inputs):
        # Positions of the proposal boxes
        boxes = inputs[0]
        # image_meta carries some required information about the image
        image_meta = inputs[1]
        # All feature levels, each [batch, height, width, channels]
        feature_maps = inputs[2:]

        y1, x1, y2, x2 = tf.split(boxes, 4, axis=2)
        h = y2 - y1
        w = x2 - x1

        # Size of the input image
        image_shape = parse_image_meta_graph(image_meta)['image_shape'][0]

        # Use the size of each proposal to find the feature level it belongs to
        image_area = tf.cast(image_shape[0] * image_shape[1], tf.float32)
        roi_level = log2_graph(tf.sqrt(h * w) / (224.0 / tf.sqrt(image_area)))
        roi_level = tf.minimum(5, tf.maximum(2, 4 + tf.cast(tf.round(roi_level), tf.int32)))
        # batch_size, box_num
        roi_level = tf.squeeze(roi_level, 2)

        # Loop through levels and apply ROI pooling to each. P2 to P5.
        pooled = []
        box_to_level = []
        # Crop from each of P2-P5 in turn
        for i, level in enumerate(range(2, 6)):
            # Boxes assigned to this feature level
            ix = tf.where(tf.equal(roi_level, level))
            level_boxes = tf.gather_nd(boxes, ix)
            box_to_level.append(ix)

            # Image that each of these boxes belongs to
            box_indices = tf.cast(ix[:, 0], tf.int32)

            # Stop gradients from flowing through the box coordinates
            level_boxes = tf.stop_gradient(level_boxes)
            box_indices = tf.stop_gradient(box_indices)

            # Result: [batch * num_boxes, pool_height, pool_width, channels]
            pooled.append(tf.image.crop_and_resize(
                feature_maps[i], level_boxes, box_indices, self.pool_shape,
                method="bilinear"))

        pooled = tf.concat(pooled, axis=0)

        # Stack the box order together with the image each box belongs to
        box_to_level = tf.concat(box_to_level, axis=0)
        box_range = tf.expand_dims(tf.range(tf.shape(box_to_level)[0]), 1)
        box_to_level = tf.concat([tf.cast(box_to_level, tf.int32), box_range],
                                 axis=1)

        # box_to_level[:, 0] is the index of the image
        # box_to_level[:, 1] is the index of the box within that image
        sorting_tensor = box_to_level[:, 0] * 100000 + box_to_level[:, 1]
        # Sort so that boxes from the same image are grouped together
        ix = tf.nn.top_k(sorting_tensor, k=tf.shape(box_to_level)[0]).indices[::-1]
        # Indices into pooled, in sorted order
        ix = tf.gather(box_to_level[:, 2], ix)
        pooled = tf.gather(pooled, ix)

        # Reshape back to the original layout, that is
        # Shape: [batch, num_rois, POOL_SIZE, POOL_SIZE, channels]
        shape = tf.concat([tf.shape(boxes)[:2], tf.shape(pooled)[1:]], axis=0)
        pooled = tf.reshape(pooled, shape)
        return pooled

    def compute_output_shape(self, input_shape):
        return input_shape[0][:2] + self.pool_shape + (input_shape[2][-1], )
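The level-assignment formula in `call` is easier to follow with numbers. The sketch below restates it in plain Python for a single normalised box; `roi_level` as a standalone function and the 1024×1024 image are assumptions made for the example.

```python
import math


def roi_level(box, image_height, image_width):
    """Pick the pyramid level (2..5) for one normalised (y1, x1, y2, x2) box."""
    y1, x1, y2, x2 = box
    h, w = y2 - y1, x2 - x1
    image_area = float(image_height * image_width)
    # A box of 224x224 pixels maps to level 4; halving the side drops one level
    level = math.log2(math.sqrt(h * w) / (224.0 / math.sqrt(image_area)))
    return min(5, max(2, 4 + int(round(level))))


size = 1024
level_224 = roi_level((0.0, 0.0, 224 / size, 224 / size), size, size)
level_112 = roi_level((0.0, 0.0, 112 / size, 112 / size), size, size)
level_448 = roi_level((0.0, 0.0, 448 / size, 448 / size), size, size)
level_full = roi_level((0.0, 0.0, 1.0, 1.0), size, size)
level_tiny = roi_level((0.0, 0.0, 0.01, 0.01), size, size)
```

Small boxes are therefore cropped from the high-resolution P2 and large boxes from the coarse P5, which is exactly the multi-scale behaviour FPN was introduced for.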

Building the classifier model

The predictions of this model adjust the proposal boxes to produce the final predicted boxes.
Code:

def fpn_classifier_graph(rois, feature_maps, image_meta,
                         pool_size, num_classes, train_bn=True,
                         fc_layers_size=1024):
    # ROI Pooling: crop the feature levels with the proposal boxes
    # Shape: [batch, num_rois, POOL_SIZE, POOL_SIZE, channels]
    x = PyramidROIAlign([pool_size, pool_size],
                        name="roi_align_classifier")([rois, image_meta] + feature_maps)

    # Shape: [batch, num_rois, 1, 1, fc_layers_size]; the two convolutions act as two fully connected layers
    x = TimeDistributed(Conv2D(fc_layers_size, (pool_size, pool_size), padding="valid"),
                        name="mrcnn_class_conv1")(x)
    x = TimeDistributed(BatchNormalization(), name='mrcnn_class_bn1')(x, training=train_bn)
    x = Activation('relu')(x)
    # Shape: [batch, num_rois, 1, 1, fc_layers_size]
    x = TimeDistributed(Conv2D(fc_layers_size, (1, 1)),
                        name="mrcnn_class_conv2")(x)
    x = TimeDistributed(BatchNormalization(), name='mrcnn_class_bn2')(x, training=train_bn)
    x = Activation('relu')(x)

    # Shape: [batch, num_rois, fc_layers_size]
    shared = Lambda(lambda x: K.squeeze(K.squeeze(x, 3), 2),
                    name="pool_squeeze")(x)

    # Classifier head
    # Predicts the class of the object inside each proposal box
    mrcnn_class_logits = TimeDistributed(Dense(num_classes),
                                         name='mrcnn_class_logits')(shared)
    mrcnn_probs = TimeDistributed(Activation("softmax"),
                                  name="mrcnn_class")(mrcnn_class_logits)

    # BBox head
    # Predicts the adjustment applied to each proposal box
    # [batch, num_rois, NUM_CLASSES * (dy, dx, log(dh), log(dw))]
    x = TimeDistributed(Dense(num_classes * 4, activation='linear'),
                        name='mrcnn_bbox_fc')(shared)
    # Reshape to [batch, num_rois, NUM_CLASSES, (dy, dx, log(dh), log(dw))]
    mrcnn_bbox = Reshape((-1, num_classes, 4), name="mrcnn_bbox")(x)

    return mrcnn_class_logits, mrcnn_probs, mrcnn_bbox
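The classifier returns one probability vector and one set of box deltas per class for every RoI. How those outputs are consumed afterwards is not shown in this post; the sketch below is one plausible reading in plain Python, where the predicted class is the arg-max of the softmax and only the deltas of that class are kept. `softmax`, `pick_class_and_deltas` and the sample numbers are illustrative assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pick_class_and_deltas(class_logits, class_deltas):
    """Return (class_id, probability, deltas) for the most likely class of one RoI.

    class_logits: list of num_classes values
    class_deltas: list of num_classes entries, each (dy, dx, log(dh), log(dw))
    """
    probs = softmax(class_logits)
    class_id = max(range(len(probs)), key=lambda i: probs[i])
    return class_id, probs[class_id], class_deltas[class_id]


# Made-up outputs for one RoI with three classes (class 0 is background)
logits = [1.0, 3.0, 2.0]
deltas_per_class = [(0.0, 0.0, 0.0, 0.0),
                    (0.1, 0.2, 0.0, 0.0),
                    (0.3, 0.3, 0.1, 0.1)]

class_id, score, deltas = pick_class_and_deltas(logits, deltas_per_class)
```

Predicting a separate set of deltas per class is what the `Reshape((-1, num_classes, 4))` in the box head makes possible: the box refinement can depend on what the object is.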

Building the mask model

def build_fpn_mask_graph(rois, feature_maps, image_meta,
                         pool_size, num_classes, train_bn=True):
    # ROI Pooling: crop the feature levels with the proposal boxes
    # Shape: [batch, num_rois, MASK_POOL_SIZE, MASK_POOL_SIZE, channels]
    x = PyramidROIAlign([pool_size, pool_size],
                        name="roi_align_mask")([rois, image_meta] + feature_maps)

    # Shape: [batch, num_rois, MASK_POOL_SIZE, MASK_POOL_SIZE, channels]
    x = TimeDistributed(Conv2D(256, (3, 3), padding="same"),
                        name="mrcnn_mask_conv1")(x)
    x = TimeDistributed(BatchNormalization(),
                        name='mrcnn_mask_bn1')(x, training=train_bn)
    x = Activation('relu')(x)

    # Shape: [batch, num_rois, MASK_POOL_SIZE, MASK_POOL_SIZE, channels]
    x = TimeDistributed(Conv2D(256, (3, 3), padding="same"),
                        name="mrcnn_mask_conv2")(x)
    x = TimeDistributed(BatchNormalization(),
                        name='mrcnn_mask_bn2')(x, training=train_bn)
    x = Activation('relu')(x)

    # Shape: [batch, num_rois, MASK_POOL_SIZE, MASK_POOL_SIZE, channels]
    x = TimeDistributed(Conv2D(256, (3, 3), padding="same"),
                        name="mrcnn_mask_conv3")(x)
    x = TimeDistributed(BatchNormalization(),
                        name='mrcnn_mask_bn3')(x, training=train_bn)
    x = Activation('relu')(x)

    # Shape: [batch, num_rois, MASK_POOL_SIZE, MASK_POOL_SIZE, channels]
    x = TimeDistributed(Conv2D(256, (3, 3), padding="same"),
                        name="mrcnn_mask_conv4")(x)
    x = TimeDistributed(BatchNormalization(),
                        name='mrcnn_mask_bn4')(x, training=train_bn)
    x = Activation('relu')(x)

    # Shape: [batch, num_rois, 2xMASK_POOL_SIZE, 2xMASK_POOL_SIZE, channels]
    x = TimeDistributed(Conv2DTranspose(256, (2, 2), strides=2, activation="relu"),
                        name="mrcnn_mask_deconv")(x)
    # After the transposed convolution, a 1x1 convolution sets the channel count
    # to num_classes, one mask per class
    x = TimeDistributed(Conv2D(num_classes, (1, 1), strides=1, activation="sigmoid"),
                        name="mrcnn_mask")(x)
    return x
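Two details of the mask head can be checked with a few lines of plain Python: the output size of the stride-2 transposed convolution, and how a binary mask could be read from the per-class sigmoid output. Both helpers are hypothetical, the 0.5 threshold is an assumption, and for brevity the masks are stored class-first here, whereas the network output has the class as the last axis.

```python
def deconv_output_size(size, kernel=2, stride=2):
    """Output side length of a transposed convolution without padding."""
    return (size - 1) * stride + kernel


def select_binary_mask(masks_per_class, class_id, threshold=0.5):
    """Keep only the mask of the predicted class and binarise it."""
    chosen = masks_per_class[class_id]
    return [[1 if p >= threshold else 0 for p in row] for row in chosen]


# 14x14 RoI features become a 28x28 mask
mask_side = deconv_output_size(14)

# Made-up sigmoid outputs for one RoI: two classes, each a 1x2 mask
masks = [[[0.1, 0.9]],
         [[0.6, 0.4]]]
binary = select_binary_mask(masks, class_id=1)
```

Because every class has its own sigmoid mask, the classes do not compete with each other pixel by pixel; the class decision comes from the classifier branch, and the mask branch only has to answer "object or not" for that class.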

Loss computation

Because of the added mask branch, the loss for each RoI is L = L_cls + L_box + L_mask.

The code is as follows:


import tensorflow as tf
import keras.backend as K


def batch_pack_graph(x, counts, num_rows):
    """Picks different number of values from each row
    in x depending on the values in counts.
    """
    outputs = []
    for i in range(num_rows):
        outputs.append(x[i, :counts[i]])
    return tf.concat(outputs, axis=0)


def smooth_l1_loss(y_true, y_pred):
    """Implements Smooth-L1 loss.
    y_true and y_pred are typically: [N, 4], but could be any shape.
    """
    diff = K.abs(y_true - y_pred)
    less_than_one = K.cast(K.less(diff, 1.0), "float32")
    loss = (less_than_one * 0.5 * diff**2) + (1 - less_than_one) * (diff - 0.5)
    return loss


def rpn_class_loss_graph(rpn_match, rpn_class_logits):
    """RPN anchor classifier loss.
    rpn_match: [batch, anchors, 1]. Anchor match type. 1=positive,
               -1=negative, 0=neutral anchor.
    rpn_class_logits: [batch, anchors, 2]. RPN classifier logits for BG/FG.
    """
    # Squeeze last dim to simplify
    rpn_match = tf.squeeze(rpn_match, -1)
    # Get anchor classes. Convert the -1/+1 match to 0/1 values.
    anchor_class = K.cast(K.equal(rpn_match, 1), tf.int32)
    # Positive and Negative anchors contribute to the loss,
    # but neutral anchors (match value = 0) don't.
    indices = tf.where(K.not_equal(rpn_match, 0))
    # Pick rows that contribute to the loss and filter out the rest.
    rpn_class_logits = tf.gather_nd(rpn_class_logits, indices)
    anchor_class = tf.gather_nd(anchor_class, indices)
    # Cross entropy loss
    loss = K.sparse_categorical_crossentropy(target=anchor_class,
                                             output=rpn_class_logits,
                                             from_logits=True)
    loss = K.switch(tf.size(loss) > 0, K.mean(loss), tf.constant(0.0))
    return loss


def rpn_bbox_loss_graph(config, target_bbox, rpn_match, rpn_bbox):
    """Return the RPN bounding box loss graph.
    config: the model config object.
    target_bbox: [batch, max positive anchors, (dy, dx, log(dh), log(dw))].
        Uses 0 padding to fill in unused bbox deltas.
    rpn_match: [batch, anchors, 1]. Anchor match type. 1=positive,
               -1=negative, 0=neutral anchor.
    rpn_bbox: [batch, anchors, (dy, dx, log(dh), log(dw))]
    """
    # Positive anchors contribute to the loss, but negative and
    # neutral anchors (match value of 0 or -1) don't.
    rpn_match = K.squeeze(rpn_match, -1)
    indices = tf.where(K.equal(rpn_match, 1))

    # Pick bbox deltas that contribute to the loss
    rpn_bbox = tf.gather_nd(rpn_bbox, indices)

    # Trim target bounding box deltas to the same length as rpn_bbox.
    batch_counts = K.sum(K.cast(K.equal(rpn_match, 1), tf.int32), axis=1)
    target_bbox = batch_pack_graph(target_bbox, batch_counts,
                                   config.IMAGES_PER_GPU)

    loss = smooth_l1_loss(target_bbox, rpn_bbox)

    loss = K.switch(tf.size(loss) > 0, K.mean(loss), tf.constant(0.0))
    return loss


def mrcnn_class_loss_graph(target_class_ids, pred_class_logits,
                           active_class_ids):
    """Loss for the classifier head of Mask RCNN.
    target_class_ids: [batch, num_rois]. Integer class IDs. Uses zero
        padding to fill in the array.
    pred_class_logits: [batch, num_rois, num_classes]
    active_class_ids: [batch, num_classes]. Has a value of 1 for
        classes that are in the dataset of the image, and 0
        for classes that are not in the dataset.
    """
    # During model building, Keras calls this function with
    # target_class_ids of type float32. Unclear why. Cast it
    # to int to get around it.
    target_class_ids = tf.cast(target_class_ids, 'int64')

    # Find predictions of classes that are not in the dataset.
    pred_class_ids = tf.argmax(pred_class_logits, axis=2)
    # TODO: Update this line to work with batch > 1. Right now it assumes all
    #       images in a batch have the same active_class_ids
    pred_active = tf.gather(active_class_ids[0], pred_class_ids)

    # Loss
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=target_class_ids, logits=pred_class_logits)

    # Erase losses of predictions of classes that are not in the active
    # classes of the image.
    loss = loss * pred_active

    # Compute loss mean. Use only predictions that contribute
    # to the loss to get a correct mean.
    loss = tf.reduce_sum(loss) / tf.reduce_sum(pred_active)
    return loss


def mrcnn_bbox_loss_graph(target_bbox, target_class_ids, pred_bbox):
    """Loss for Mask R-CNN bounding box refinement.
    target_bbox: [batch, num_rois, (dy, dx, log(dh), log(dw))]
    target_class_ids: [batch, num_rois]. Integer class IDs.
    pred_bbox: [batch, num_rois, num_classes, (dy, dx, log(dh), log(dw))]
    """
    # Reshape to merge batch and roi dimensions for simplicity.
    target_class_ids = K.reshape(target_class_ids, (-1,))
    target_bbox = K.reshape(target_bbox, (-1, 4))
    pred_bbox = K.reshape(pred_bbox, (-1, K.int_shape(pred_bbox)[2], 4))

    # Only positive ROIs contribute to the loss. And only
    # the right class_id of each ROI. Get their indices.
    positive_roi_ix = tf.where(target_class_ids > 0)[:, 0]
    positive_roi_class_ids = tf.cast(
        tf.gather(target_class_ids, positive_roi_ix), tf.int64)
    indices = tf.stack([positive_roi_ix, positive_roi_class_ids], axis=1)

    # Gather the deltas (predicted and true) that contribute to loss
    target_bbox = tf.gather(target_bbox, positive_roi_ix)
    pred_bbox = tf.gather_nd(pred_bbox, indices)

    # Smooth-L1 Loss
    loss = K.switch(tf.size(target_bbox) > 0,
                    smooth_l1_loss(y_true=target_bbox, y_pred=pred_bbox),
                    tf.constant(0.0))
    loss = K.mean(loss)
    return loss


def mrcnn_mask_loss_graph(target_masks, target_class_ids, pred_masks):
    """Mask binary cross-entropy loss for the masks head.
    target_masks: [batch, num_rois, height, width].
        A float32 tensor of values 0 or 1. Uses zero padding to fill array.
    target_class_ids: [batch, num_rois]. Integer class IDs. Zero padded.
    pred_masks: [batch, proposals, height, width, num_classes] float32 tensor
                with values from 0 to 1.
    """
    # Reshape for simplicity. Merge first two dimensions into one.
    target_class_ids = K.reshape(target_class_ids, (-1,))
    mask_shape = tf.shape(target_masks)
    target_masks = K.reshape(target_masks, (-1, mask_shape[2], mask_shape[3]))
    pred_shape = tf.shape(pred_masks)
    pred_masks = K.reshape(pred_masks,
                           (-1, pred_shape[2], pred_shape[3], pred_shape[4]))
    # Permute predicted masks to [N, num_classes, height, width]
    pred_masks = tf.transpose(pred_masks, [0, 3, 1, 2])

    # Only positive ROIs contribute to the loss. And only
    # the class specific mask of each ROI.
    positive_ix = tf.where(target_class_ids > 0)[:, 0]
    positive_class_ids = tf.cast(
        tf.gather(target_class_ids, positive_ix), tf.int64)
    indices = tf.stack([positive_ix, positive_class_ids], axis=1)

    # Gather the masks (predicted and true) that contribute to loss
    y_true = tf.gather(target_masks, positive_ix)
    y_pred = tf.gather_nd(pred_masks, indices)

    # Compute binary cross entropy. If no positive ROIs, then return 0.
    # shape: [batch, roi, num_classes]
    loss = K.switch(tf.size(y_true) > 0,
                    K.binary_crossentropy(target=y_true, output=y_pred),
                    tf.constant(0.0))
    loss = K.mean(loss)
    return loss
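The two element-wise losses used above, Smooth-L1 for boxes and binary cross-entropy for masks, can be checked by hand with a plain-Python sketch. The functions below mirror the formulas in `smooth_l1_loss` and in the mask loss but work on flat lists; the `eps` clipping value and the sample inputs are assumptions made for the example.

```python
import math


def smooth_l1(y_true, y_pred):
    """Element-wise Smooth-L1: quadratic below an error of 1, linear above."""
    losses = []
    for t, p in zip(y_true, y_pred):
        diff = abs(t - p)
        losses.append(0.5 * diff ** 2 if diff < 1.0 else diff - 0.5)
    return losses


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over a flat list of pixels."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(y_true)


# Errors of 0.5, 1.0 and 3.0 on three box coordinates
box_losses = smooth_l1([0.0, 0.0, 0.0], [0.5, 1.0, 3.0])

# Two mask pixels, both predicted at 0.5
mask_loss = binary_crossentropy([1.0, 0.0], [0.5, 0.5])
```

The linear tail of Smooth-L1 keeps a badly wrong box from dominating the gradient, and the per-pixel binary cross-entropy is applied only to the mask channel of the ground-truth class, as the `tf.gather_nd(pred_masks, indices)` line shows.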


Published: 2024-03-23 17:58:19

Article link: https://www.elefans.com/category/jswz/34/1741117.html