A walkthrough of the linemod C++ code

Updated: 2024-10-25 22:31:12


Lines 233-589: the 2D ColorGradient modality

Line 1374 onward: the high-level Detector API

Partly based on =distribute.pc_relevant.none-task-blog-baidujs_baidulandingword-1&spm=1001.2101.3001.4242

Source code: .cpp

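Feature: x, y, orientation. One pixel position together with its quantized gradient orientation.

Candidate: a struct inside the quantized pyramid. It holds one feature and its score, and defines a comparison so that features with a higher score sort first.

ColorGradient: derived from Modality. It computes quantized gradient orientations from a color image.

Modality has two subclasses: the 2D ColorGradient and the 3D depth normal.

The main job of a Modality is its public function process, which computes the quantized image pyramid of an input image and returns a pointer to that pyramid.

Ptr here is a smart pointer template, so it can point to any type.

Template: stores a vector of features (several pixels on one template), the template's height and width, and the pyramid level it belongs to.

Template pyramid: a typedef for a vector of templates. Each template is in turn built around a vector of features.

Templatemap: a map from a string to template pyramids. The key looks like the class_id passed to addTemplate; my first guess was that it separates the template pyramids of the 2D and 3D modalities.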
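To make the relationships between these types concrete, here is a small Python sketch. It only mirrors the notes above and is not the C++ definition: the field names (label, tl_x, tl_y) and the type aliases are my own assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Feature:
    x: int
    y: int
    label: int  # quantized orientation index (assumed name)


@dataclass
class Candidate:
    f: Feature
    score: float

    def __lt__(self, other: "Candidate") -> bool:
        # Higher score sorts first, as described for Candidate above.
        return self.score > other.score


@dataclass
class Template:
    width: int = 0
    height: int = 0
    pyramid_level: int = 0
    tl_x: int = 0  # top-left offset, assumed name
    tl_y: int = 0
    features: List[Feature] = field(default_factory=list)


# A template pyramid is a list of templates, one per pyramid level.
TemplatePyramid = List[Template]
# The templates map goes from a string key to a list of template pyramids.
TemplatesMap = Dict[str, List[TemplatePyramid]]

candidates = [
    Candidate(Feature(0, 0, 1), 1.0),
    Candidate(Feature(1, 1, 2), 5.0),
    Candidate(Feature(2, 2, 3), 3.0),
]
ranked = sorted(candidates)  # sorted() is stable and uses __lt__
```

Sorting with this comparison puts the strongest candidate at the front, which is what the later feature selection relies on.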

1. Training

1.1 Rotate and scale the selected template image

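import numpy as np

# Number of rotation and scale samples; each range is (start, stop, step).
rot_num = max(int((rot_deg_range[1]-rot_deg_range[0]) / rot_deg_range[2]), 1)
scale_num = max(int((scale_range[1]-scale_range[0]) / scale_range[2]), 1)
# np.linspace includes both endpoints, so with more than one sample the
# actual spacing is slightly larger than the requested step.
for rot_deg in np.linspace(rot_deg_range[0], rot_deg_range[1], rot_num):
    for scale in np.linspace(scale_range[0], scale_range[1], scale_num):
        templ_img, M = linemod.rotateTemplate(img, rot_deg, scale)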

1.2 Add the rotated and scaled template image to the Detector

is_add_success = linemod.addTemplate(templ_img, "0", info)

template_id, rect = self.detector.addTemplate([img], class_id, mask)

1.2.1 Compute a quantized pyramid from the rotated and scaled template image

Ptr<ColorGradientPyramid> qp = modality->process(source, object_mask);

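This is computed by the 2D ColorGradient modality.

process directly constructs a ColorGradientPyramid object and returns a pointer to it.
The ColorGradientPyramid constructor calls update(), which internally runs
quantizedOrientations(src, magnitude, angle, weak_threshold);
This first applies a Gaussian blur, then calls Sobel in the horizontal and vertical directions,
and calls phase to compute the gradient direction.

It then calls hysteresisGradient, whose main output is quantized_angle.
The procedure: the continuous gradient direction is first split into 16 bins and then quantized to 8 directions.
quant_r[c] &= 7; keeps only the low three bits, which is the same as taking the integer modulo 8.
This is safe because each of the 16 bins spans 22.5 degrees, so bins k and k+8 are exactly 180 degrees apart and count as the same orientation. For example, a direction between 180 and 190 degrees is treated like one between 0 and 10 degrees.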
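The folding from 16 bins to 8 can be checked with a few lines of Python. This is a sketch of the idea only: quantize_angle is a hypothetical helper, and rounding to the nearest bin is my assumption about how the angle is converted to an integer.

```python
def quantize_angle(angle_deg: float) -> int:
    """Map an angle in degrees to one of 8 orientation bins."""
    # 16 bins of 22.5 degrees each (rounding to nearest is an assumption).
    bin16 = int(round(angle_deg * 16.0 / 360.0))
    # Keep the low three bits: bins k and k + 8 collapse to the same value.
    return bin16 & 7


pairs = [(0.0, 180.0), (45.0, 225.0), (90.0, 270.0)]
folded = [(quantize_angle(a), quantize_angle(b)) for a, b in pairs]
```

Each pair in folded holds two equal values, which shows that opposite directions land in the same bin.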

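Next, for every pixel whose gradient magnitude exceeds a threshold, a histogram of quantized orientations is built over its 3*3 neighborhood.
The orientation whose vote count reaches the threshold becomes the final quantized orientation.

1.2.2 Walk through every level of the quantized-orientation pyramid, compute feature points, and store them in the template pyramid

The extract operation: for each level of the quantized orientation pyramid (qp), extract feature points and store them in the template passed in.

ColorGradientPyramid::extractTemplate(Template &templ)
The output of this function should be templ.features, i.e. the extracted feature points.
The mask is eroded first. The mask is only used during extraction, not when the gradient maps are computed.

Magnitude is the gradient magnitude computed earlier in quantizedOrientations (the sum of squared gradients). In this function it also serves as the score.

The function loops over Magnitude. For each pixel whose magnitude_valid value is greater than 0, it checks the neighborhood (nms kernel = 5): if any neighbor has a larger gradient magnitude, is_max becomes false. If is_max is still true after the check, the magnitude_valid values of all neighbors are set to 0. In other words, within a region of strong gradients only the strongest point in the middle is considered. Every such pixel whose angle is not 0 and whose score (magnitude) is above thres is added as a candidate (x, y, score, label(angle)) to the candidates set of the current template. If there are fewer candidates than num_feature, the function returns false and extraction fails.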
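The non-maximum suppression described above can be sketched in pure Python on nested lists. This is a simplified illustration under my own assumptions (no mask, tuples instead of Candidate structs, the hypothetical name extract_candidates), not the C++ routine itself.

```python
def extract_candidates(magnitude, angle, threshold, nms_kernel=5):
    """Return [(x, y, score, label)] for local maxima above threshold."""
    h, w = len(magnitude), len(magnitude[0])
    k = nms_kernel // 2
    valid = [[1] * w for _ in range(h)]
    candidates = []
    for r in range(k, h - k):
        for c in range(k, w - k):
            score = 0
            if valid[r][c]:
                score = magnitude[r][c]
                neighbours = [(rr, cc)
                              for rr in range(r - k, r + k + 1)
                              for cc in range(c - k, c + k + 1)
                              if (rr, cc) != (r, c)]
                if any(magnitude[rr][cc] > score for rr, cc in neighbours):
                    score = 0  # a stronger neighbour exists, not a maximum
                else:
                    # Local maximum: suppress the whole neighbourhood.
                    for rr, cc in neighbours:
                        valid[rr][cc] = 0
            if score > threshold and angle[r][c] != 0:
                candidates.append((c, r, score, angle[r][c]))
    return candidates


mag = [[0] * 7 for _ in range(7)]
mag[3][3], mag[3][4] = 10, 8      # two strong neighbours, only one survives
ang = [[1] * 7 for _ in range(7)]
found = extract_candidates(mag, ang, threshold=5)
```

In this toy input the pixel with magnitude 8 is suppressed by its stronger neighbour, so only one candidate remains.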

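The candidates are stably sorted by score, and selectScatteredFeatures then picks a well-spread subset of them, which is stored in templ.features. The rule: go through the candidates from highest to lowest score; a candidate that lies too close to any already selected feature is dropped, otherwise it is added to features.

One detail in selectScatteredFeatures: if too many features are found (more than num_features), distance_thres is raised so that fewer survive, until the number of features falls below num_features; then distance_thres is lowered again. As a result the number of features ends up only slightly larger than num_features. .cpp#L151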
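The selection rule and the threshold adjustment can be sketched as follows. This is my own simplified reading of the description above: the function name, the tuple layout (x, y, score), the step of 1.0 and the min_distance floor are all assumptions.

```python
def far_enough(c, features, dist_sq):
    """True if c is at least sqrt(dist_sq) away from every chosen feature."""
    return all((c[0] - f[0]) ** 2 + (c[1] - f[1]) ** 2 >= dist_sq
               for f in features)


def select_scattered(candidates, num_features, distance, min_distance=3.0):
    """candidates: [(x, y, score)] sorted by score, highest first."""
    # Phase 1: raise the distance until one full pass keeps too few features.
    while True:
        features = []
        for c in candidates:
            if far_enough(c, features, distance * distance):
                features.append(c)
        if len(features) < num_features or len(features) <= 1:
            break
        distance += 1.0
    # Phase 2: relax the distance step by step, adding to what we have.
    while len(features) < num_features and distance - 1.0 >= min_distance:
        distance -= 1.0
        for c in candidates:
            if far_enough(c, features, distance * distance):
                features.append(c)
    return features


# 20 candidates on a line, score decreasing with x.
cands = [(x, 0, 100 - x) for x in range(20)]
feats = select_scattered(cands, num_features=5, distance=2.0)
```

Because phase 2 only relaxes the distance one step at a time, the result overshoots num_features by little, which matches the remark above.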

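Here the quantized pyramid wraps this functionality around a set of candidate points.

QuantizedPyramid has two main functions:

1. virtual bool cv::linemod::QuantizedPyramid::extractTemplate(Template & templ) const extracts the most discriminative features (pixels) at the current pyramid level and writes them into templ. (Moving to the next pyramid level is done with pyrDown().)

2. cv::linemod::QuantizedPyramid::quantize(Mat & dst) const produces the quantized image at the current pyramid level.

As noted earlier, a template pyramid is a vector of templates, and each Template stores a vector of features (several pixels on one template), the template's height and width, and its pyramid level.

1.2.3 Crop the pyramid of one template and append it to template_pyramids, the collection of all template pyramids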

cropTemplates(tp);

template_pyramids.push_back(tp);

return template_id;

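The crop operation

The function first loops over every template and finds the minimum and maximum feature coordinates. Note that coordinates from higher pyramid levels are scaled up according to their level.
This yields four extreme coordinates. Note that they are shared by all levels.

It then loops over the templates again and sets templ.width, templ.height, templ.tl_x and templ.tl_y,
and uses templ.tl_x and templ.tl_y to correct the feature coordinates, so that the x, y of every feature is an offset relative to tl_x, tl_y.

It returns Rect(min_x, min_y, max_x - min_x, max_y - min_y),
but the caller does not use this return value.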
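The two passes of the crop step can be sketched like this. It is an illustration only: templates are plain dicts, the name crop_templates is hypothetical, and any alignment of min_x and min_y that the real code may perform is left out.

```python
def crop_templates(templates):
    """templates: list of dicts with 'pyramid_level' and 'features'.

    Each feature is (x, y, label). Returns (x, y, width, height) of the
    bounding box in level-0 coordinates and updates the dicts in place.
    """
    xs, ys = [], []
    # Pass 1: bounding box over all levels, scaled up to level 0.
    for t in templates:
        lvl = t["pyramid_level"]
        for x, y, _ in t["features"]:
            xs.append(x << lvl)
            ys.append(y << lvl)
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    # Pass 2: per-level size, top-left corner and relative coordinates.
    for t in templates:
        lvl = t["pyramid_level"]
        t["width"] = (max_x - min_x) >> lvl
        t["height"] = (max_y - min_y) >> lvl
        t["tl_x"] = min_x >> lvl
        t["tl_y"] = min_y >> lvl
        t["features"] = [(x - t["tl_x"], y - t["tl_y"], label)
                         for x, y, label in t["features"]]
    return (min_x, min_y, max_x - min_x, max_y - min_y)


pyramid = [
    {"pyramid_level": 0, "features": [(10, 20, 1), (30, 40, 2)]},
    {"pyramid_level": 1, "features": [(6, 11, 1), (14, 19, 2)]},
]
rect = crop_templates(pyramid)
```

After the call, the features of both levels are expressed relative to the shared bounding box, each in the resolution of its own level.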

template_pyramids is made up of the pyramids of all templates at the different angles and scales.

template_id here is the current length of template_pyramids.

2. Testing

matches, _ = self.detector.match(sources=[img], threshold=threshold)

2.1 Compute the quantized orientation image and the response maps of the input image

The input image is processed with modality.process to obtain its orientation image.

quantizers.push_back(modality->process(source, mask));

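2.1.1 The process operation

Calling process on the input image -> construct a ColorGradientPyramid -> update() -> quantizedOrientations computes angle, angle_ori and magnitude.

QuantizedOrientations: 1. Apply a Gaussian blur. 2. Compute gradients with Sobel; magnitude = sobel_x^2 + sobel_y^2. 3. Use the phase function to compute the angle from sobel_x and sobel_y. 4. From the initial angle and the magnitude, compute the quantized angle in hysteresisGradient.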
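Steps 2 and 3 can be illustrated on a single pixel with plain Python. This is only a sketch: the Gaussian blur is skipped, the 3*3 Sobel kernel is written out by hand, and atan2 stands in for the phase call, so small numerical differences from the real pipeline are to be expected.

```python
import math


def sobel_at(img, r, c):
    """3x3 Sobel derivatives (gx, gy) at an interior pixel of a 2D list."""
    gx = ((img[r - 1][c + 1] + 2 * img[r][c + 1] + img[r + 1][c + 1])
          - (img[r - 1][c - 1] + 2 * img[r][c - 1] + img[r + 1][c - 1]))
    gy = ((img[r + 1][c - 1] + 2 * img[r + 1][c] + img[r + 1][c + 1])
          - (img[r - 1][c - 1] + 2 * img[r - 1][c] + img[r - 1][c + 1]))
    return gx, gy


def magnitude_and_angle(img, r, c):
    """Squared gradient magnitude and direction in degrees, in [0, 360)."""
    gx, gy = sobel_at(img, r, c)
    magnitude = gx * gx + gy * gy           # sobel_x^2 + sobel_y^2, no sqrt
    angle = math.degrees(math.atan2(gy, gx)) % 360.0
    return magnitude, angle


vertical_edge = [[0, 0, 255]] * 3           # intensity rises to the right
horizontal_edge = [[0, 0, 0], [0, 0, 0], [255, 255, 255]]
m_v, a_v = magnitude_and_angle(vertical_edge, 1, 1)
m_h, a_h = magnitude_and_angle(horizontal_edge, 1, 1)
```

The vertical edge gives a gradient along x and the horizontal edge a gradient along y, with the same squared magnitude in both cases.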

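hysteresisGradient first converts angle_ori (angle) to integers 0~15, sets the angle along the one-pixel image border to 0, and then ANDs the 0~15 values with 7 (i.e. takes them modulo 8). For each pixel of the quantized angle image it then checks whether one orientation occurs at least 5 times in the surrounding 3*3 region. If so, the pixel is assigned that orientation, otherwise 0. The result is stored in the pyramid's angle. The last step of the function uses a left shift (<<) to turn the index 0~7 into an 8-bit one-hot code, 00000001 to 10000000.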
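The voting and the one-hot encoding can be sketched as below. This is a pure-Python illustration under my assumptions: the function name is hypothetical, the vote threshold is passed as a parameter with a default of 5, and only pixels with a magnitude above the given threshold take part, as described in the training section.

```python
def hysteresis_vote(quantized, magnitude, threshold, neighbor_threshold=5):
    """quantized: 2D list of bins 0..7. Returns one-hot codes (0 = none)."""
    h, w = len(quantized), len(quantized[0])
    out = [[0] * w for _ in range(h)]       # border pixels stay 0
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            if magnitude[r][c] <= threshold:
                continue
            # Histogram of the 8 orientations over the 3x3 neighbourhood.
            hist = [0] * 8
            for rr in range(r - 1, r + 2):
                for cc in range(c - 1, c + 2):
                    hist[quantized[rr][cc]] += 1
            votes = max(hist)
            index = hist.index(votes)
            if votes >= neighbor_threshold:
                out[r][c] = 1 << index      # index 0..7 -> one-hot byte
    return out


strong = [[100] * 3 for _ in range(3)]
agree = [[2, 2, 2], [2, 2, 1], [0, 3, 2]]       # orientation 2 appears 6 times
disagree = [[0, 1, 2], [3, 4, 5], [6, 7, 0]]    # no orientation dominates
code_agree = hysteresis_vote(agree, strong, threshold=10)[1][1]
code_disagree = hysteresis_vote(disagree, strong, threshold=10)[1][1]
```

A pixel in a consistent neighbourhood gets a single set bit, while a pixel in a noisy neighbourhood is zeroed out.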

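The image then goes through quantize, spread, and the computation of response maps for the 8 orientations, and the results are placed in linear memories. In lm_pyramid the first index is the pyramid level and the second is the response map of one of the 8 orientations. The variables to be filled are passed into the functions as parameters.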

.cpp#L206

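2.1.2 The spread operation

This basically follows the paper: the original image is shifted within the range [0,T]*[0,T], and the shifted copies (9 of them in the case I looked at) are combined pixel by pixel with OR. The result is the spread orientation image, still encoded with one 0/1 bit per orientation.

What puzzles me is why the OR operation is written as three separate loops when all three do the same thing.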
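The shift-and-OR idea can be sketched in a few lines. This is an illustration under my assumptions: the shifts run over 0..T-1 in both directions, pixels shifted in from outside the image are ignored, and the name spread is used for a plain Python function rather than the C++ one.

```python
def spread(src, T):
    """OR together T*T shifted copies of a 2D list of orientation bytes."""
    h, w = len(src), len(src[0])
    dst = [[0] * w for _ in range(h)]
    for dr in range(T):
        for dc in range(T):
            # dst(r, c) collects the bits found at src(r + dr, c + dc).
            for r in range(h - dr):
                for c in range(w - dc):
                    dst[r][c] |= src[r + dr][c + dc]
    return dst


image = [[0] * 4 for _ in range(4)]
image[2][2] = 0b00000100          # a single pixel with orientation 2
image[2][3] = 0b00000001          # its right neighbour has orientation 0
spread_image = spread(image, 2)
```

After spreading, a pixel can carry several orientation bits at once, which is what makes the later matching tolerant to small shifts.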

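2.1.3 Compute the response maps

The response maps form a list with one map per orientation. lsb4 holds, for each pixel, the code of the lower four orientations (whether each of them occurs at that pixel), and msb4 holds the upper four (each half is a value 0~15, binary up to 1111). Then, for each orientation, the code loops over the flattened pixel array. Each pixel carries eight orientation bits, split into two halves of four orientations, so each half has 16 possible states. A look-up table (LUT) gives, for each half, the maximum similarity (the cos value) between the orientations present and the current orientation, and the response is the maximum over the two halves. Two thoughts: 1. Splitting into four parts would mean taking the maximum over four parts, and the look-up table would shrink to 2^2 * 4 * 8 = 128 entries (4 states per part, four parts, 8 orientations). Would that be faster? 2. The LUT is not computed from actual cos values: simi = 3 (/4) when two orientations differ by one bin and simi = 0 when they differ by more. So one could shift the pixel's 8-bit orientation code right by the current orientation index and test whether the lowest bit is 1. If it is, the response is 4/4; otherwise check the two neighbouring orientations.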
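Both variants, the two-half LUT and the direct bit test from thought 2, can be compared in a short sketch. The similarity values 4, 3 and 0 come from the notes above; treating orientations 0 and 7 as neighbours and all function names are my assumptions.

```python
def similarity(i, j):
    """4 for the same orientation, 3 for adjacent ones, 0 otherwise."""
    d = abs(i - j)
    d = min(d, 8 - d)              # assumed wrap-around between 0 and 7
    return 4 if d == 0 else 3 if d == 1 else 0


def build_lut():
    """lut[ori][half][nibble]: best similarity within one 4-bit half."""
    lut = [[[0] * 16 for _ in range(2)] for _ in range(8)]
    for ori in range(8):
        for half in range(2):
            for nibble in range(16):
                best = 0
                for bit in range(4):
                    if nibble & (1 << bit):
                        best = max(best, similarity(ori, half * 4 + bit))
                lut[ori][half][nibble] = best
    return lut


LUT = build_lut()


def response_lut(pixel, ori):
    """Max over the lower (lsb4) and upper (msb4) halves of the byte."""
    return max(LUT[ori][0][pixel & 15], LUT[ori][1][pixel >> 4])


def response_direct(pixel, ori):
    """Same result from bit tests, without any table."""
    if (pixel >> ori) & 1:
        return 4
    if (pixel >> ((ori + 1) % 8)) & 1 or (pixel >> ((ori - 1) % 8)) & 1:
        return 3
    return 0


agree = all(response_lut(p, o) == response_direct(p, o)
            for p in range(256) for o in range(8))
```

The two functions agree on all 256 * 8 inputs under these assumptions, so the choice between them is about speed, not about the result.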

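2.1.4 The linearize operation

The response maps start out as 8*h*w and become 8*(T*T) * H/T * W/T in memory. For the response map of one orientation this amounts to sampling it with stride T, which yields T*T sub-images.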
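The stride-T sampling can be written down directly. This sketch assumes that H and W are multiples of T and that the sub-images are ordered by their starting row and then starting column; the function name is hypothetical.

```python
def linearize(response, T):
    """Split one H x W response map into T*T rows of (H/T)*(W/T) values."""
    H, W = len(response), len(response[0])
    linear = []
    for r_start in range(T):
        for c_start in range(T):
            # One sub-image: every T-th pixel starting at (r_start, c_start).
            row = [response[r][c]
                   for r in range(r_start, H, T)
                   for c in range(c_start, W, T)]
            linear.append(row)
    return linear


resp = [[r * 4 + c for c in range(4)] for r in range(4)]   # values 0..15
lin = linearize(resp, 2)
```

With a 4*4 map and T = 2 this gives four rows of four values each, one row per phase of the sampling grid.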

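2.2 matchClass: match the 8 orientation response maps of the input image against the template pyramids

The outer loop runs over the template ids. Similarity is then computed starting from the lowest pyramid level, and the similarity map is scanned for scores above thres. lowest_T = T_at_level.back() should correspond to the scaling factor of the different pyramid levels. But why are there only the values (1,4)?

After matching at the lowest pyramid level, the code moves to the higher level. It takes each match out of the candidates produced by the previous level and computes a local similarity. The local similarity is no longer thresholded; instead the point with the highest score is found and inserted as the candidate, and the next level again computes similarity around that maximum.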
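The refinement step can be sketched generically. Everything here is hypothetical: score_at stands in for the local similarity, the factor 2 is the scale between neighbouring levels mentioned in these notes, and the window radius is an arbitrary choice.

```python
def refine(candidate, score_at, radius=2):
    """Project a coarse match (x, y) to the next level and pick the best
    scoring position inside a small window around it."""
    cx, cy = candidate[0] * 2, candidate[1] * 2
    best = None
    for y in range(cy - radius, cy + radius + 1):
        for x in range(cx - radius, cx + radius + 1):
            s = score_at(x, y)
            if best is None or s > best[2]:
                best = (x, y, s)
    return best


def toy_score(x, y):
    """Hypothetical score with its peak at (21, 41)."""
    return -((x - 21) ** 2 + (y - 41) ** 2)


refined = refine((10, 20), toy_score)
```

Only a (2 * radius + 1)^2 window is evaluated per candidate, which is where the speed-up over a full search at the finer level comes from.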

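The upscaling factor here is only 2, but isn't the parameter T (1,4)? The layout of lowest_lm[0] is 8*(T^2) *h/T *w/T (with the T of each level).

2.2.1 Details of the similarity computation

The first-level similarity is the similarity between the lowest level of one template's pyramid and the lowest level of the input image's response map pyramid (8*(T^2) *h/T *w/T). Sliding the template over the input image produces the similarity map. The width of this 2D similarity matrix is the width of the input image divided by T, and different pyramid levels have different scale factors. span_x is not taken into account here because the current image is effectively a shrunken version of the original and the matching position has to be found again on the next level, so the span_x information at the border cannot be used.

The similarity score at each sliding position is computed from the feature points of the template and the corresponding points of the response map. The implementation loops over the features first. For one feature it finds the response value at the feature's position when the template is aligned with the top-left corner of the input image (accesslinearmemory determines the position of lm_ptr). Moving this pointer forward by one element means moving the output similarity one step to the right or down, while the corresponding point on the response map moves T pixels to the right (T is the sampling step of the pyramid level). Each feature therefore contributes one similarity sub-image, and the sum over all features is the similarity map between the template and the input image at the current pyramid level. The range over which a feature moves is template_positions; should template_positions be as close as possible to dst.ptr?

There is no actual down-sampling step. Down-sampling is realized by moving the template with stride T, so similarity is computed only every T pixels. The benefit of the pyramid is that scanning the similarity map shrinks from h*w to h*w/4, which shortens the traversal.

The input of accesslinearmemory is the lowest level of the input image's response map pyramid (8*(T^2) *h/T *w/T). For a feature (x, y, ori) it finds the response of the original input image at that position for that orientation. In detail: the response maps have already been reshaped to 8*T^2*(H/T * W/T). The feature's orientation selects the first sub-array. The original image is effectively divided into H/T * W/T cells (a grid), each of size T*T. So the position inside the cell is found first (for example the bottom-right corner of a 2*2 cell, via the remainder modulo T), and then the cell itself (x divided by T, rounded down). Since the matrix really has size T^2*(H/T * W/T), it is enough to add the row pointer and the column offset. So at the lowest level there is no actual down-sampling step?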
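The lookup and the per-feature accumulation can be sketched together. This is my own simplified version: the linear memory is rebuilt inline with stride-T sampling, H and W are assumed to be multiples of T, the function names are hypothetical, and positions where the template would run past the right border simply read the next row, so only positions that fit inside the image are meaningful.

```python
def linearize(resp, T):
    """T*T rows, each the stride-T sub-image of one H x W response map."""
    H, W = len(resp), len(resp[0])
    return [[resp[r][c] for r in range(r0, H, T) for c in range(c0, W, T)]
            for r0 in range(T) for c0 in range(T)]


def access_linear_memory(lin, x, y, T, W):
    """Return (row, offset) so that row[offset] is the response at (x, y)."""
    row = lin[(y % T) * T + (x % T)]            # position inside the T x T cell
    offset = (y // T) * (W // T) + (x // T)     # which cell
    return row, offset


def similarity_map(lin_by_ori, features, T, W, H):
    """Sum the shifted response rows of all features, (x, y, ori) each."""
    w, h = W // T, H // T
    dst = [0] * (w * h)
    for x, y, ori in features:
        row, offset = access_linear_memory(lin_by_ori[ori], x, y, T, W)
        # Entry i of dst is the template placed at (i % w * T, i // w * T).
        for i in range(len(dst) - offset):
            dst[i] += row[offset + i]
    return dst


W = H = 8
T = 2
resp0 = [[y * W + x for x in range(W)] for y in range(H)]
lin_by_ori = {0: linearize(resp0, T)}
feats = [(1, 1, 0), (2, 3, 0)]
sim = similarity_map(lin_by_ori, feats, T, W, H)
```

For the template position one cell right and one cell down, the accumulated value equals the sum of the responses at each feature position shifted by (T, T), which is what a brute-force evaluation would give.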

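Once the first-level similarity map (H/T * W/T) has been determined, it is scanned for all points above thres. The score here is effectively the average score over all features. The offset setting looks questionable to me: with T=4, c=1, r=0, shouldn't the top-left corner of the template on the input image be at (T, 0)? Do x and y here refer to positions on the original input image? If so, why are they multiplied by 2 further down?

The candidate points selected at the previous level are refined on the next level. The coordinates of the previous level are divided by …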


Published: 2024-02-12 22:43:30

Article link: https://www.elefans.com/category/jswz/34/1689699.html