
Abstract

The problem of tracking multiple objects in a video sequence poses several challenging tasks. For tracking-by-detection, these include object re-identification, motion prediction and dealing with occlusions. We present a tracker (without bells and whistles) that accomplishes tracking without specifically targeting any of these tasks, in particular, we perform no training or optimization on tracking data. To this end, we exploit the bounding box regression of an object detector to predict the position of an object in the next frame, thereby converting a detector into a Tracktor. We demonstrate the potential of Tracktor and provide a new state-of-the-art on three multi-object tracking benchmarks by extending it with a straightforward re-identification and camera motion compensation.
We then perform an analysis on the performance and failure cases of several state-of-the-art tracking methods in comparison to our Tracktor. Surprisingly, none of the dedicated tracking methods are considerably better in dealing with complex tracking scenarios, namely, small and occluded objects or missing detections. However, our approach tackles most of the easy tracking scenarios. Therefore, we motivate our approach as a new tracking paradigm and point out promising future research directions. Overall, Tracktor yields superior tracking performance than any current tracking method and our analysis exposes remaining and unsolved tracking challenges to inspire future research directions.

Scene understanding from video remains one of the big challenges of computer vision. Humans are often the center of attention in a scene, which leads to the fundamental problem of detecting and tracking them in a video. Tracking-by-detection has emerged as the preferred paradigm to solve the problem of tracking multiple objects as it simplifies the task by breaking it into two steps: (i) detecting object locations independently in each frame, (ii) forming tracks by linking corresponding detections across time. The linking step, or data association, is a challenging task on its own, due to missing and spurious detections, occlusions, and target interactions in crowded environments. To address these issues, research in this area has produced increasingly complex models achieving only marginally better results, e.g., multiple object tracking accuracy has only improved 2.4% in the last two years on the MOT16 [45] benchmark.

In this paper, we push tracking-by-detection to the limit by using only an object detection method to perform tracking. We show that one can achieve state-of-the-art tracking results by training a neural network only on the task of detection. As indicated by the blue arrows in Figure 1, the regressor of an object detector such as Faster-RCNN [52] is sufficient to construct object trajectories in a multitude of challenging tracking scenarios. This raises an interesting question that we discuss in this paper: If a detector can solve most of the tracking problems, what are the real situations where a dedicated tracking algorithm is necessary? We hope our work and the presented Tracktor allow researchers to focus on the still unsolved critical challenges of multi-object tracking.

Figure 1. Tracktor performs multi-object tracking with nothing but an object detector, using two main processing steps, shown in blue and red. For a given frame $t$, the detector's bounding box regression first moves the boxes $\mathcal{B}^k_{t-1}$ of tracks already active in frame $t-1$ to their new positions in frame $t$. The classification scores $s^k_t$ of the regressed boxes are used to deactivate tracks that are likely occluded. Second, the object detector (or a given set of public detections) provides the detections $\mathcal{D}_t$ for frame $t$. Finally, a detection is considered to start a new track only if its IoU with every already tracked box is sufficiently small.

This paper presents four main contributions:

  • We introduce the Tracktor which tackles multi-object tracking by exploiting the regression head of a detector to perform temporal realignment of object bounding boxes.
  • We present two simple extensions to Tracktor, a re-identification Siamese network and a motion model. The resulting tracker yields state-of-the-art performance in three challenging multi-object tracking benchmarks.
  • We conduct a detailed analysis on failure cases and challenging tracking scenarios, and show none of the dedicated tracking methods perform substantially better than our regression approach.
  • We propose our method as a new tracking paradigm which exploits the detector and allows researchers to focus on the remaining complex tracking challenges. This includes an extensive study on promising future research directions.

Related Work

Several computer vision tasks such as surveillance, activity recognition or autonomous driving rely on object trajectories as input. Despite the vast literature on multi-object tracking [42, 38], it still remains a challenging problem, especially in crowded environments where occlusions and false detections are common. Most state-of-the-art works follow the tracking-by-detection paradigm which heavily relies on the performance of the underlying detection method.

Recently, neural network based detectors have clearly outperformed all other methods for detection [33, 52, 50]. The family of detectors that evolved to Faster-RCNN [52], and further detectors such as SDP [63], rely on object proposals which are passed to an object classification and a bounding box regression head of a neural network. The latter refines bounding boxes to fit tightly around the object. In this paper, we show that one can rethink the use of this regressor for tracking purposes.

Tracking as a graph problem. The data association problem deals with keeping the identity of the tracked objects given the available detections. This can be done on a frame by frame basis for online applications [5, 15, 48] or track-by-track [3]. Since video analysis can be done offline, batch methods are preferred since they are more robust to occlusions. A common formalism is to represent the problem as a graph, where each detection is a node, and edges indicate a possible link. The data association can then be formulated as maximum flow [4] or, equivalently, minimum cost problem with either fixed costs based on distance [26, 49, 66], including motion models [39], or learned costs [36]. Alternative formulations typically lead to more involved optimization problems, including minimum cliques [65], general-purpose solvers like MCMC [64] or multi-cuts [59]. A recent trend is to design ever more complex models which include other vision input such as reconstruction for multi-camera sequences [40, 60], activity recognition [12], segmentation [46], keypoint trajectories [10] or joint detection [59]. In general, the significantly higher computational costs do not translate to significantly higher accuracy. In fact, in this work, we show that we can outperform all graph-based trackers significantly while keeping the tracker online. Even within a graphical model optimization, one needs to define a measure to identify whether two bounding boxes belong to the same person or not. This can be done by analyzing either the appearance of the pedestrian, or its motion.

Appearance models and re-identification. Discriminating and re-identifying (reID) objects by appearance is in particular a problem in crowded scenes with many object-object occlusions. In the exhaustive literature that uses appearance models or reID methods to improve multi-object tracking, color-based models are very common [31]. However, these are not always reliable for pedestrian tracking, since people can wear very similar clothes, and color statistics are often contaminated by background pixels and illumination changes. The authors of [34] borrow ideas from person re-identification and adapt them to “re-identify” targets during tracking. In [62], a CRF model is learned to better distinguish pedestrians with similar appearance. Both appearance and short-term motion in the form of optical flow can be used as input to a Siamese neural network to decide whether two boxes belong to the same track or not [35]. Recently, [54] showed the importance of learned reID features for multi-object tracking. We confirm this view in our experiments.

Motion models and trajectory prediction. Several works resort to motion to discriminate between pedestrians, especially in highly crowded scenes. The most common assumption is the one of constant velocity (CVA) [11, 2], but pedestrian motion gets more complex in crowded scenarios for which researchers have turned to the more expressive Social Force Model [57, 48, 61, 39]. Such a model can also be learned from data [36]. Deep Learning has been extensively used to learn social etiquette in crowded scenarios for trajectory prediction [39, 1, 55]. [67] use single object tracking trained networks to create tracklets for further postprocessing into trajectories. Recently, [7, 51] proposed to use reinforcement learning to predict the position of an object in the next frame. While [7] focuses on single object tracking, the authors of [51] train a multi-object pedestrian tracker composed of a bounding box predictor and a decision network for collaborative decision making between tracked objects.

Video object detection. Multi-object tracking without frame-to-frame identity prediction is a subproblem usually referred to as video object detection. In order to improve detections, many methods exploit spatio-temporal consistencies of object positions. Both [28] and [27] generate multi-frame bounding box tuplet proposals and extract detection scores and features with a CNN and LSTM, respectively. Recently, the authors of [47] improve object detections by applying optical flow to propagate scores between frames. Eventually, [18] proposes to solve the tracking and detection problem jointly. They propose a network which processes two consecutive frames and exploits tracking ground truth data to improve detection regression, thereby generating two-frame tracklets. With a subsequent offline method, these tracklets are combined to multi-frame tracks. However, we show that our regression tracker is not only online, but superior in dealing with object occlusions. In particular, we do not only temporally align detections, but preserve their identity.

A Detector Is All You Need

We propose to convert a detector into a Tracktor performing multiple object tracking. Several CNN-based detection algorithms [52, 63] contain some form of bounding box refinement through regression. We propose an exploitation of such a regressor for the task of tracking. This has two key advantages: (i) we do not require any tracking specific training, and (ii) we do not perform any complex optimization at test time, hence our tracker is online. Furthermore, we show that our method achieves state-of-the-art performance on several challenging tracking scenarios.

Object detector

The core element of our tracking pipeline is a regression-based detector. In our case, we train a Faster R-CNN [52] with ResNet-101 [22] and Feature Pyramid Networks (FPN) [41] on the MOT17Det [45] pedestrian detection dataset.

To perform object detection, Faster R-CNN applies a Region Proposal Network to generate a multitude of bounding box proposals for each potential object. Feature maps for each proposal are extracted via Region of Interest (RoI) pooling [21], and passed to the classification and regression heads. The classification head assigns an object score to the proposal, in our case, it evaluates the likelihood of the proposal showing a pedestrian. The regression head refines the bounding box location tightly around an object. The detector yields the final set of object detections by applying non-maximum-suppression (NMS) to the refined bounding box proposals. Our presented method exploits the aforementioned ability to regress and classify bounding boxes to perform multi-object tracking.

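To make the detection step concrete, here is a minimal inference sketch using torchvision's off-the-shelf Faster R-CNN; the ResNet-50-FPN weights and the score threshold are stand-ins for the paper's ResNet-101-FPN model trained on MOT17Det.

```python
import torch
import torchvision

# Off-the-shelf Faster R-CNN with an FPN backbone. The paper trains a
# ResNet-101 + FPN variant on MOT17Det; this COCO-pretrained model is a stand-in.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

def detect_pedestrians(frame, score_thresh=0.5):
    """Run the detector on one frame (a CHW float tensor scaled to [0, 1])."""
    with torch.no_grad():
        # Internally: RPN proposals -> RoI pooling -> classification and
        # regression heads -> per-class NMS on the refined boxes.
        out = model([frame])[0]
    keep = (out["labels"] == 1) & (out["scores"] >= score_thresh)  # COCO class 1 = person
    return out["boxes"][keep], out["scores"][keep]  # boxes in (x1, y1, x2, y2) pixels
```

Note that torchvision returns boxes as corner coordinates, whereas the paper's notation below uses $(x, y, w, h)$; the conversion is straightforward.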

Tracktor

The challenge of multi-object tracking is to extract the spatial and temporal positions, i.e., trajectories, of $k$ objects given a frame by frame video sequence. Such a trajectory is defined as a list of ordered object bounding boxes $\mathcal{T}_k = \{\mathbf{b}^k_{t_1}, \mathbf{b}^k_{t_2}, \dots\}$, where a bounding box is defined by its coordinates $\mathbf{b}^k_t = (x, y, w, h)$, and $t$ represents a frame of the video. We denote the set of object bounding boxes in frame $t$ with $\mathcal{B}_t = \{\mathbf{b}^{k_1}_t, \mathbf{b}^{k_2}_t, \dots\}$. Note that each $\mathcal{T}_k$ or $\mathcal{B}_t$ can contain fewer elements than the total number of frames or trajectories in a sequence, respectively. At $t = 0$, our tracker initializes tracks from the first set of detections $\mathcal{D}_0 = \{\mathbf{d}^1_0, \mathbf{d}^2_0, \dots\} = \mathcal{B}_0$. In Figure 1, we illustrate the two subsequent processing steps (the nuts and bolts of our method) for a given frame $t$ with $t > 0$, namely, the bounding box regression and track initialization.

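As a minimal illustration of this bookkeeping (names and structure are our own, not the paper's), a trajectory $\mathcal{T}_k$ can be stored as a map from frame index to box, initialized from the first detections $\mathcal{D}_0$:

```python
from dataclasses import dataclass, field

@dataclass
class Track:
    track_id: int
    boxes: dict = field(default_factory=dict)   # frame index t -> box (x, y, w, h)
    active: bool = True

def initialize_tracks(detections_0):
    """Start one track per detection in the first frame (t = 0)."""
    return [Track(track_id=k, boxes={0: box}) for k, box in enumerate(detections_0)]
```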

Bounding box regression. The first step, denoted with blue arrows, exploits the bounding box regression to extend active trajectories to the current frame $t$. This is achieved by regressing the bounding box $\mathbf{b}^k_{t-1}$ of frame $t-1$ to the object's new position $\mathbf{b}^k_t$ at frame $t$. In the case of Faster R-CNN, this corresponds to applying RoI pooling on the features of the current frame but with the previous bounding box coordinates. Our assumption is that the target has moved only slightly between frames, which is usually ensured by high frame rates (see Section B.5 of the supplementary for a frame rate robustness evaluation of Tracktor). The identity is automatically transferred from the previous to the regressed bounding box, effectively creating a trajectory. This is repeated for all subsequent frames.
After the bounding box regression, our tracker considers two cases for killing (deactivating) a trajectory: (i) an object leaving the frame or occluded by a non-object is killed if its new classification score $s^k_t$ is below $\sigma_{active}$, and (ii) occlusions between objects are handled by applying non-maximum suppression (NMS) to all remaining $\mathcal{B}_t$ and their corresponding scores with an Intersection over Union (IoU) threshold $\lambda_{active}$.

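A sketch of this per-frame regression and deactivation step. The function `regress_and_score` is an assumed detector interface (RoI pooling of the previous boxes on the current frame's features, followed by the regression and classification heads), and the two threshold values are placeholders; only the names $\sigma_{active}$ and $\lambda_{active}$ come from the paper.

```python
import torch
from torchvision.ops import nms

SIGMA_ACTIVE = 0.5    # classification score threshold sigma_active (assumed value)
LAMBDA_ACTIVE = 0.6   # IoU threshold lambda_active for inter-object NMS (assumed value)

def regression_step(frame, prev_boxes, regress_and_score):
    """Extend tracks from frame t-1 to frame t.

    prev_boxes: (N, 4) tensor of active boxes b_{t-1} in (x1, y1, x2, y2) format.
    regress_and_score: assumed detector interface that RoI-pools prev_boxes on the
        current frame's features and returns (regressed_boxes, classification_scores).
    Returns the indices of tracks kept alive and their regressed boxes at frame t.
    """
    new_boxes, scores = regress_and_score(frame, prev_boxes)

    # (i) deactivate tracks whose new classification score drops below sigma_active
    alive = scores >= SIGMA_ACTIVE
    idx = torch.nonzero(alive, as_tuple=False).squeeze(1)

    # (ii) NMS among the surviving boxes resolves object-object occlusions
    keep = nms(new_boxes[idx], scores[idx], LAMBDA_ACTIVE)
    idx = idx[keep]
    return idx, new_boxes[idx]
```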

Bounding box initialization. In order to account for new targets, the object detector also provides the detections $\mathcal{D}_t$ for the entire frame $t$. This second step, indicated in Figure 1 with red arrows, is analogous to the first initialization at $t = 0$. But a detection from $\mathcal{D}_t$ starts a trajectory only if the IoU with any of the already active trajectories $\mathbf{b}^k_t$ is smaller than $\lambda_{new}$. That is, we consider a detection for a new trajectory only if it is covering a potentially new object that is not explained by any trajectory. It should be noted again that our Tracktor does not require any tracking specific training or optimization and solely relies on an object detection method. This allows us to directly benefit from improved object detection methods and, most importantly, enables a comparatively cheap transfer to different tracking datasets or scenarios in which no ground truth tracking but only detection data is available.

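The initialization gate of this second step can be written with torchvision's pairwise IoU; the value of $\lambda_{new}$ below is an assumed placeholder.

```python
import torch
from torchvision.ops import box_iou

LAMBDA_NEW = 0.3  # IoU threshold lambda_new for starting new tracks (assumed value)

def new_track_candidates(detections, active_boxes):
    """Return detections whose IoU with every active track box stays below lambda_new.

    detections, active_boxes: (M, 4) and (N, 4) tensors in (x1, y1, x2, y2) format.
    """
    if active_boxes.numel() == 0:
        return detections
    ious = box_iou(detections, active_boxes)            # (M, N) pairwise IoU matrix
    unexplained = ious.max(dim=1).values < LAMBDA_NEW   # not covered by any trajectory
    return detections[unexplained]
```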

Tracking extensions

In this section, we present two straightforward extensions to our vanilla Tracktor: a motion model and a re-identification algorithm. Both are aimed at improving identity preservation across frames and are common examples of techniques used to enhance, e.g., graph-based tracking methods [39, 62, 35].

Motion model. Our previous assumption that the position of an object changes only slightly from frame to frame does not hold in two scenarios: large camera motion and low video frame rates. In extreme cases, the bounding boxes from frame $t-1$ might not contain the tracked object in frame $t$ at all. Therefore, we apply two types of motion models that will improve the bounding box position in future frames. For sequences with a moving camera, we apply a straightforward camera motion compensation (CMC) by aligning frames via image registration using the Enhanced Correlation Coefficient (ECC) maximization as introduced in [16]. For sequences with comparatively low frame rates, we apply a constant velocity assumption (CVA) for all objects as in [11, 2].

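A minimal sketch of the ECC-based camera motion compensation, assuming OpenCV 4.x; the warp direction and parameter choices reflect common CMC implementations rather than exact values from the paper.

```python
import cv2
import numpy as np

def camera_motion_compensation(prev_gray, curr_gray, boxes):
    """Warp boxes from frame t-1 into frame t via ECC image alignment.

    prev_gray, curr_gray: uint8 grayscale frames.
    boxes: (N, 4) array of (x1, y1, x2, y2) boxes in frame t-1 coordinates.
    """
    warp = np.eye(2, 3, dtype=np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 100, 1e-5)
    # ECC maximization estimates a Euclidean warp between the two frames.
    _, warp = cv2.findTransformECC(prev_gray, curr_gray, warp,
                                   cv2.MOTION_EUCLIDEAN, criteria, None, 5)
    warped = boxes.astype(np.float32).copy()
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        p1 = warp @ np.array([x1, y1, 1.0], dtype=np.float32)
        p2 = warp @ np.array([x2, y2, 1.0], dtype=np.float32)
        warped[i] = [p1[0], p1[1], p2[0], p2[1]]
    return warped
```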

Re-identification. In order to keep our tracker online, we suggest a short-term re-identification (reID) based on appearance vectors generated by a Siamese neural network [6, 25, 54]. To that end, we store killed (deactivated) tracks in their non-regressed version $\mathbf{b}^k_{t-1}$ for a fixed number of $F_{reID}$ frames. We then compare the distance in the embedding space of the deactivated with the newly detected tracks and re-identify via a threshold. The embedding space distance is computed by a Siamese CNN and appearance feature vectors for each of the bounding boxes. It should be noted that the reID network is indeed trained on tracking ground truth data. To minimize the risk of false reIDs, we only consider pairs of deactivated and new bounding boxes with a sufficiently large IoU. The motion model is continuously applied to the deactivated tracks.

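A sketch of the short-term reID matching; `embed` stands for the assumed Siamese appearance network mapping an image crop to a feature vector, `dead_embeddings` are the stored vectors of deactivated tracks, and both thresholds are placeholder values.

```python
import torch
from torchvision.ops import box_iou

REID_DIST_THRESH = 2.0   # maximum embedding distance for a re-identification (assumed)
REID_IOU_GATE = 0.2      # minimum IoU between old and new box to consider a match (assumed)

def reidentify(new_boxes, new_crops, dead_boxes, dead_embeddings, embed):
    """Greedily match new detections to recently deactivated tracks."""
    matches = []
    if len(dead_boxes) == 0 or len(new_boxes) == 0:
        return matches
    new_emb = torch.stack([embed(crop) for crop in new_crops])   # (M, d)
    dist = torch.cdist(new_emb, dead_embeddings)                 # (M, K) pairwise distances
    iou = box_iou(new_boxes, dead_boxes)                         # (M, K) spatial gate
    dist[iou < REID_IOU_GATE] = float("inf")
    for m in range(dist.shape[0]):
        k = int(torch.argmin(dist[m]))
        if dist[m, k] <= REID_DIST_THRESH:
            matches.append((m, k))        # detection m resumes deactivated track k
            dist[:, k] = float("inf")     # each dead track is revived at most once
    return matches
```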

Experiments

Ablation study


Ablation study: remove individual components or features from a model or algorithm and observe how their removal affects the results. In other words, when several ideas are proposed at the same time to improve a model, an ablation study is the controlled-variable experiment that verifies each idea is effective on its own.


Benchmark evaluation

Analysis

The superior performance of our tracker without any tracking specific training or optimization demands a more thorough analysis. Without sophisticated tracking methods, it is not expected to excel in crowded and occluded, but rather only in benevolent, tracking scenarios. This begs the question of whether more common tracking methods fail to specifically address these complex scenarios as well. Our experiments and the subsequent analysis ought to demonstrate the strengths of our approach for easy tracking scenarios and motivate future research to focus on remaining complex tracking problems. In particular, we question the common execution of tracking-by-detection and suggest a new tracking paradigm. The subsequent analysis is conducted on the MOT17 training data and we compare all top performing methods with publicly shared data.

Tracking challenges

For a better understanding of our tracker, we want to analyse challenging tracking scenarios and compare its strengths and weaknesses to other trackers. To this end, we summarize their fundamental characteristics in Table 3.

Object visibility. Intuitively, we expect diminished tracking performance for object-object or object-non-object occlusions, i.e., for targets with diminished visibility. In Figure 2, we compare the ratio of successfully tracked bounding boxes with respect to their visibility. The transparent red bar indicates the occurrences of ground truth bounding boxes for each visibility, and illustrates the proportionate impact on the overall performance of the trackers. Our method achieves superior performance even for partially occluded bounding boxes with visibilities as low as 0.3. Neither the identity preserving aspects of MHT DAM and MOTDT17 [9] nor the offline interpolation capabilities of MHT DAM and jCC seem to successfully tackle highly occluded objects. The high MOTA values in Table 2 are largely due to the unbalanced distribution of ground truth visibilities. As expected, our extended version only achieves minor improvements over our vanilla Tracktor.

Object size. In view of the large fraction of visible but not tracked objects in Figure 2, we argue that the trackability of an object is not only dependent on its visibility, but also its size. Therefore, we conduct the same comparison as for the visibility but for the size of an object. In the first row of Figure 3, we assume the height of a pedestrian to be proportional to its size and compare on all three MOT17 public detection sets. All methods performed similarly well for object heights larger than 250 pixels. To demonstrate their shortcomings even for highly visible objects, we only compare objects with a visibility larger than 0.9. As expected, the trackability of an object decreases drastically with its size across all three detection sets. Our tracker shows its strength in compensating for insufficient DPM and Faster R-CNN detections for all object sizes. All methods except MOTDT17 benefit from the additional small detections provided by SDP. For our tracker this is largely due to the Feature Pyramid Network extension of our Faster-RCNN detector. However, the learned appearance model and reID of the online MOTDT17 method seem generally vulnerable to small detections. Appearance models generally suffer from small object sizes and few observed pixels. In conclusion, except from our compensation of inferior detections none of the trackers exhibit a notably better performance with respect to varying object sizes.

Robustness to detections. The performance of tracking-by-detection methods with respect to visibility and size is inherently limited by the robustness of the underlying detection method. However, as observed for the object size, trackers differ in their ability to cope with, or benefit from, varying quality of detections. In the second row of Figure 3, we quantify this ability in terms of detection gaps and their coverage by the tracker. We define a detection gap as part of a ground truth trajectory that was at least once detected, and compare coverage of each gap vs. the gap length. Intuitively, long gaps are harder to compensate for, as the online or offline tracker has to perform a longer hallucination or interpolation, respectively. We indicated the occurrences of gap lengths over the respective set of detections in transparent red. For DPM and Faster R-CNN detections, two solutions lead to notable gap coverage: (i) offline interpolation such as in jCC, or (ii) motion prediction with Kalman filter and reID as in MOTDT. Compared to the graph-based jCC method, the online MOTDT17 method excels at covering particularly long gaps. However, none of these dedicated tracking methods yields similar robustness to our frame by frame regression tracker, which achieves far superior coverage. This holds especially true for long detection gaps with more than 15 frames. Offline methods benefit the most from improved SDP detections and neither our nor the MOTDT17 tracker convinces with a notable gap length robustness.

Identity preservation. The results of our Tracktor++ summarized in Table 2 indicate an identity preservation performance in terms of IDF1 and identity switches comparable with dedicated tracking methods. This is achieved without any offline graph optimization as in jCC [30] or eHAF [58]. In particular, MOTDT17, which applies a sophisticated appearance model and reID, is not substantially superior to our regression tracker and its comparatively simple extensions. However, our method excels in reducing the number of false positives in MOT17 as well as MOT16. In addition, we have shown that our Tracktor is capable of incorporating additional identity preserving extensions.

What an ideal tracker would look like (Oracle Tracktors)

We have shown that none of the dedicated tracking methods specifically targets challenging tracking scenarios, i.e., objects under heavy occlusions or small objects. We therefore want to motivate our Tracktor as a new tracking paradigm. To this end, we analyse our performance twofold: (i) the impact of the object detector on the killing policy and bounding box regression, (ii) identify performance upper bounds for potential extensions to our Tracktor. In Table 4, we present several oracle trackers by replacing parts of our algorithm with ground truth information. If not mentioned otherwise, all other tracking aspects are handled by our vanilla Tracktor. Their analysis should provide researchers with useful insights regarding the most promising research directions and extensions of our Tracktor.

Detector oracles. To simulate a potentially perfect object detector, we introduce two oracles:

  • Oracle-Kill: Instead of killing with NMS or classification score we use ground truth information.
  • Oracle-REG: Instead of regression, we place the bounding boxes at their ground truth position.

Both oracles yield substantial improvements with respect to MOTA and FP. However, killing by ground truth instead of score deteriorates identity preservation as the regression struggles with otherwise unseen bounding boxes.

Extension oracles. It should be noted that Tracktor++ with non-perfect extensions already compensates for some of the detector's insufficiencies. The reID and motion model (MM) oracles simulate potential additional performance gains. In order to remain online, these exclude any form of hindsight tracking-gap interpolation.

  • Oracle-MM: A motion model places each bounding box at the center of the ground truth in the next frame.
  • Oracle-reID: Re-identification is performed with ground truth identities.

As expected, both oracles improve IDF1 and identity switches substantially. The combined Oracle-MM-reID represents the extension upper bound of Tracktor++.

Omniscient oracle. Oracle-ALL performs ground truth killing, regression and reID. We consider its top MOTA of 72.2%, in combination with a high IDF1 and virtually no false positives, as the absolute upper bound of Tracktor with a Faster R-CNN and FPN object detector.

The substantial performance gains from Oracle-MM indicate the potential of extending Tracktor with a sophisticated motion model. In particular, Oracle-MM-reIDINTER suggests a predictive motion model which hallucinates the position of an object through long occlusions. Such a motion model avoids offline post processing and additional false positives from wrong linear occlusion paths caused by long detection gaps and camera movement.

Towards a new tracking paradigm

To conclude our analysis we propose two approaches on how to utilize Tracktor as a starting point for future research directions:

Tracktor with extensions. Apply Tracktor to a given set of detections and extend it with tracking specific methods. Scenarios with large and highly visible objects will be covered by the frame to frame bounding box regression. For the remaining, it seems most promising to implement a hallucinating motion model, taking into account the individual movements of objects. In addition, such a motion predictor reduces the necessity for an advanced killing policy.

Tracklet generation. Analogous to tracking-by-detection, we propose a tracking-by-tracklet approach. Indeed, many algorithms already use tracklets as input [24, 65], as they are richer in information for computing motion or appearance models. However, usually a specific tracking method is used to create these tracklets. We advocate the exploitation of the detector itself, not only to create sparse detections, but frame to frame tracklets. The remaining complex tracking cases ought to be tackled by a subsequent tracking method.

In this work, we have formally defined those hard cases, analyzing the situations in which not only our method but other dedicated tracking solutions fail. And by doing so, we question the current focus of research in multi-object tracking, in particular, the missing confrontation with challenging tracking scenarios.

Conclusion

We have shown that the bounding box regressor of a trained Faster-RCNN detector is enough to solve most tracking scenarios present in current benchmarks. A detector converted to Tracktor needs no specific training on tracking ground truth data and is able to work in an online fashion. In addition, we have shown that our Tracktor is extendable with re-identification and camera motion compensation, providing a substantial new state-of-the-art on the MOTChallenge. We analyzed the performance of multiple dedicated tracking methods on challenging tracking scenarios and none yielded substantially better performance compared to our regression based Tracktor. We hope this work establishes a new tracking paradigm, utilizing the object detector’s full capabilities.
