Paper Reading - Modular Interactive Video Object Segmentation Interaction-to-Mask, Propagation


Abstract

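We present the Modular interactive VOS (MiVOS) framework, which decouples interaction-to-mask and mask propagation, allowing for higher generalizability and better performance. Trained separately, the interaction module converts user interactions into an object mask, which is then temporally propagated by our propagation module using a novel top-k filtering strategy when reading the space-time memory. To effectively take the user's intent into account, a novel difference-aware module is proposed to learn how to properly fuse the masks before and after each interaction; these masks are aligned with the target frames using the space-time memory. We evaluate our method qualitatively and quantitatively on DAVIS with different forms of user interaction (e.g., scribbles, clicks) and show that it outperforms current state-of-the-art algorithms while requiring fewer frame interactions, with the additional advantage of generalizing to different types of user interaction. We contribute a large-scale synthetic VOS dataset with pixel-accurate segmentation of 4.8M frames to accompany our source code and facilitate future research.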

Introduction

Interactive VOS (iVOS)

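Characteristics: interactive VOS methods take user interactions (e.g., scribbles or clicks) as input, and the user can iteratively refine the results until satisfied.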

It comprises two tasks:

interaction understanding
temporal propagation

Existing Problems

(1) The strong coupling of these two tasks limits the form of user interaction (e.g., scribbles only) and makes training difficult. Attempts to decouple the two tasks fail to reach state-of-the-art accuracy, as the user's intent cannot be adequately taken into account in the propagation process.

(2) Naive decoupling may lead to loss of the user's intent, as the original interaction is no longer available in the propagation stage.

Solution

We present a decoupled modular framework to address the iVOS problem.

Contributions

We innovate on the decoupled interaction-propagation framework and show that this approach is simple, effective, and generalizable.
We propose a novel lightweight top-k filtering scheme for the attention-based memory read operation used in mask generation during propagation.
We propose a novel difference-aware fusion module to faithfully capture the user's intent, which improves iVOS accuracy and reduces the amount of user interaction.
We contribute a large-scale synthetic VOS dataset with 4.8M frames to accompany our source code and facilitate future research.

Related Work

Figure: progress in iVOS.

Semi-Supervised Video Object Segmentation

Definition: segment a specific object throughout a video given only a fully annotated mask in the first frame.

Interactive Video Object Segmentation (iVOS)

Focus:

(1) scribble interaction

(2) click interaction

Interactive Image Segmentation

Method

Initial Work

Initially, the user selects and interactively annotates one frame (e.g., using scribbles or clicks) to produce a mask.

MiVOS Overview

Notation

(1) $r$ denotes the current interaction round.

(2) $t_r$ is the index of the frame the user interacts with in the $r$-th round.

(3) $M^r$ denotes the mask results of the $r$-th round.

(4) $M^r_j$ denotes the mask of the $j$-th frame in the $r$-th round.

Core Components

Interaction-to-mask: lets the user obtain real-time feedback and reach a satisfactory result on a single frame.

Mask propagation: the corrected mask is propagated bidirectionally through the video.

Difference-aware fusion: combines the two mask sequences (the previous round's result and the newly propagated one) while avoiding possible decay or loss of the user's intent.

How the user's intent is captured: through the difference between the interacted frame's mask before and after the user interaction.
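To make the data flow between the three components concrete, here is a minimal Python sketch of one interaction round. It is not the authors' code: `run_round`, `interaction_to_mask`, `propagate` and `fuse` are hypothetical names, masks are plain lists of floats, and the three modules are passed in as callables so that any implementation (scribble- or click-based interaction, any propagation network) can be plugged in, mirroring the decoupling the framework is built on. In MiVOS the fusion step additionally aligns the before/after difference to each target frame; here `fuse` simply receives the interacted frame's masks and is left to do that.

```python
from typing import Callable, Dict, List

Mask = List[float]  # hypothetical representation: one value in [0, 1] per pixel


def run_round(
    masks_prev: Dict[int, Mask],
    t_r: int,
    interaction_to_mask: Callable[[Mask], Mask],
    propagate: Callable[[int, Mask], Dict[int, Mask]],
    fuse: Callable[[Mask, Mask, Mask, Mask], Mask],
) -> Dict[int, Mask]:
    """One iVOS round: interact on frame t_r, propagate, then fuse with round r-1."""
    before = masks_prev[t_r]             # M^{r-1}_{t_r}: mask shown to the user
    after = interaction_to_mask(before)  # M^r_{t_r}: mask after the user's correction
    propagated = propagate(t_r, after)   # frame index -> newly propagated mask
    masks_new = dict(masks_prev)         # frames not reached by propagation keep M^{r-1}
    masks_new[t_r] = after
    for j, m_new in propagated.items():
        # fuse the old and new masks of frame j, given the change the user made at t_r
        masks_new[j] = fuse(masks_prev[j], m_new, before, after)
    return masks_new


# Toy usage with trivial stand-ins for the three modules.
prev = {0: [0.0, 0.0], 1: [0.0, 0.0], 2: [0.0, 0.0]}
out = run_round(
    prev,
    t_r=1,
    interaction_to_mask=lambda m: [1.0, 0.0],
    propagate=lambda t, m: {j: list(m) for j in range(3) if j != t},
    fuse=lambda old, new, before, after: [(o + n) / 2 for o, n in zip(old, new)],
)
print(out)  # {0: [0.5, 0.0], 1: [1.0, 0.0], 2: [0.5, 0.0]}
```

Because the modules only meet through masks, swapping scribble-based interaction for click-based interaction changes only the `interaction_to_mask` argument.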

Figure

Interaction-to-Mask

Scribble-to-Mask (S2M)

Goal: produce a single-image segmentation in real time given input scribbles

Backbone: the DeepLabV3+ semantic segmentation network

Local Control

Previous state-of-the-art approach (f-BRS): it may harm the global result when only a local fine adjustment is needed toward the end of the segmentation process.

Source of the previous state-of-the-art approach:

Konstantin Sofiiuk, Ilia Petrov, Olga Barinova, and Anton Konushin. f-BRS: Rethinking backpropagating refinement for interactive segmentation. In CVPR, 2020.

Our approach: local control is asserted straightforwardly by restricting the interactive algorithm to a user-specified region.

Figure: comparison of the two approaches above.
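A minimal NumPy sketch of the local-control idea, assuming the user-specified region is an axis-aligned box `(y0, y1, x0, x1)` and that `segment_fn` is a hypothetical interactive segmenter (for example an S2M-like network) that is simply run on the cropped image and mask. Everything outside the box is copied from the previous mask, so a fine local adjustment cannot disturb the global result.

```python
import numpy as np


def local_refine(image, prev_mask, region, segment_fn):
    """Apply an interactive segmenter only inside a user-chosen box.

    image: (H, W, 3) array; prev_mask: (H, W) float array in [0, 1]
    region: (y0, y1, x0, x1), half-open box in pixel coordinates
    segment_fn: hypothetical callable (crop_image, crop_mask) -> refined crop_mask
    """
    y0, y1, x0, x1 = region
    refined = segment_fn(image[y0:y1, x0:x1], prev_mask[y0:y1, x0:x1])
    out = prev_mask.copy()               # pixels outside the box stay unchanged
    out[y0:y1, x0:x1] = refined
    return out


image = np.zeros((4, 4, 3))
prev_mask = np.ones((4, 4))
out = local_refine(image, prev_mask, (1, 3, 1, 3), lambda img, m: np.zeros_like(m))
print(out.sum())  # 12.0: only the 2x2 box was cleared
```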

Temporal Propagation

Goal: track the object and produce the corresponding masks in subsequent frames.

Memory Read with Top-k Filtering

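(1) Compute the affinity

$\mathcal{F} \in \mathbb{R}^{THW \times HW}$ represents the affinity between each memory position and each query position, where $T$ is the number of memory frames and $H \times W$ is the feature resolution.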

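(2) Filter the affinities so that, for each query position, only the top-k entries are kept, and normalize the kept entries with a softmax to obtain the weights $\mathcal{W}$

Effect: it effectively removes noise regardless of the sequence length.

Advantage: it increases robustness and overcomes the overhead of the top-k operation.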
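Steps (1) and (2) can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' implementation: it assumes the affinity is the plain dot product between memory keys and query keys (the form used by STM-style memory networks), stores keys as (channels, positions) arrays, and applies the softmax only over the k kept entries of each query column so that every other weight is exactly zero. `topk_affinity` and `top_k` are hypothetical names.

```python
import numpy as np


def topk_affinity(k_m, k_q, top_k):
    """Top-k filtered, softmax-normalized affinity.

    k_m: memory keys, shape (C, N_m) with N_m = T*H*W
    k_q: query keys, shape (C, N_q) with N_q = H*W
    returns W of shape (N_m, N_q); each column sums to 1 with at most top_k nonzeros
    """
    F = k_m.T @ k_q                                    # (N_m, N_q) raw affinity
    k = min(top_k, F.shape[0])
    # row indices of the k largest affinities in every query column
    idx = np.argpartition(-F, k - 1, axis=0)[:k]       # (k, N_q)
    top = np.take_along_axis(F, idx, axis=0)           # (k, N_q)
    top = np.exp(top - top.max(axis=0, keepdims=True))  # numerically stable softmax
    top /= top.sum(axis=0, keepdims=True)
    W = np.zeros_like(F)
    np.put_along_axis(W, idx, top, axis=0)             # all filtered entries stay 0
    return W


rng = np.random.default_rng(0)
k_m = rng.normal(size=(8, 2 * 3 * 3))   # C=8 key channels, T=2 memory frames of 3x3
k_q = rng.normal(size=(8, 3 * 3))       # one query frame of 3x3
W = topk_affinity(k_m, k_q, top_k=4)
print(W.shape)  # (18, 9)
```

The design choice is that weakly matching memory positions are discarded before normalization instead of being averaged in, which is the intuition behind removing noise regardless of how many memory frames accumulate.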

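(3) For query position $j$, the feature $m_j$ is read from memory as the weighted sum of the memory values: $m_j = \sum_i \mathcal{W}_{ij} v^M_i$.

(4) Concatenate the read features with the query value features $v^Q$.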
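Steps (3) and (4) reduce to one matrix product and one concatenation. The sketch below assumes the filtered weights are laid out as (memory positions, query positions), for example as produced by a top-k softmax, and that value features are stored as (channels, positions); `memory_readout` is a hypothetical name.

```python
import numpy as np


def memory_readout(W, v_m, v_q):
    """Read memory values for every query position and append the query's own values.

    W:   (N_m, N_q) filtered affinity, each column summing to 1
    v_m: (C_m, N_m) memory value features
    v_q: (C_q, N_q) query value features
    returns (C_m + C_q, N_q)
    """
    m = v_m @ W                           # m[:, j] = sum_i W[i, j] * v_m[:, i]
    return np.concatenate([m, v_q], axis=0)


W = np.array([[0.5, 0.0],
              [0.5, 0.0],
              [0.0, 1.0]])                # 3 memory positions, 2 query positions
v_m = np.array([[1.0, 3.0, 10.0]])        # one memory value channel
v_q = np.array([[7.0, 7.0]])              # one query value channel
print(memory_readout(W, v_m, v_q))
# [[ 2. 10.]
#  [ 7.  7.]]
```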

Figure: the memory read process.

Propagation strategy

Figure: our propagation scheme.
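The following is only a sketch under an explicit assumption, not a description taken from these notes: the corrected mask at $t_r$ is propagated forward and backward, and each direction stops before reaching a frame the user already interacted with in an earlier round (or at the video boundary), a common choice in iVOS pipelines because those frames carry more reliable masks. `propagation_targets` is a hypothetical helper that only computes which frames each direction visits, in order.

```python
from typing import List, Set, Tuple


def propagation_targets(t_r: int, num_frames: int, interacted: Set[int]) -> Tuple[List[int], List[int]]:
    """Frames visited when propagating from t_r, as (forward, backward) in visiting order.

    Assumption: each direction stops before any frame in `interacted`
    (frames annotated in earlier rounds) or at the end of the video.
    """
    forward: List[int] = []
    for j in range(t_r + 1, num_frames):
        if j in interacted:
            break
        forward.append(j)
    backward: List[int] = []
    for j in range(t_r - 1, -1, -1):
        if j in interacted:
            break
        backward.append(j)
    return forward, backward


print(propagation_targets(5, 10, {2, 8}))  # ([6, 7], [4, 3])
```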

Difference-Aware Fusion

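(1) Compute the positive and negative changes at the interacted frame separately as two masks, $D^+ = (M^r_{t_r} - M^{r-1}_{t_r})_+$ and $D^- = (M^{r-1}_{t_r} - M^r_{t_r})_+$

Note: $(\cdot)_+$ denotes $\max(\cdot, 0)$.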
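The two difference masks in step (1) are element-wise operations. A minimal NumPy sketch, assuming soft masks in [0, 1] stored as arrays of the same shape; `difference_masks` is a hypothetical name, `before` stands for $M^{r-1}_{t_r}$ and `after` for $M^r_{t_r}$.

```python
import numpy as np


def difference_masks(before, after):
    """Return (D_plus, D_minus) for the interacted frame."""
    d_plus = np.maximum(after - before, 0.0)   # where the user added foreground
    d_minus = np.maximum(before - after, 0.0)  # where the user removed foreground
    return d_plus, d_minus


before = np.array([0.0, 1.0, 0.5])
after = np.array([1.0, 0.0, 0.5])
d_plus, d_minus = difference_masks(before, after)
print(d_plus, d_minus)  # [1. 0. 0.] [0. 1. 0.]
```

Splitting the change into two non-negative maps keeps additions and removals in separate channels, and $D^+ - D^-$ still recovers the full signed change.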

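(2) Compute the aligned masks by transferring $D^+$ and $D^-$ from the interacted frame to the target frame with the memory-read weights.

Note: $\mathcal{W}$ comes from step (2) of Memory Read with Top-k Filtering.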
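One way to read step (2), sketched under stated assumptions: the interacted frame $t_r$ plays the role of memory and the target frame the role of query, $\mathcal{W}$ is the (memory positions, query positions) weight matrix from the top-k read, and each difference mask is transferred exactly like a one-channel value feature, i.e. $A_j = \sum_i \mathcal{W}_{ij} D_i$. `align_to_target` is a hypothetical name, and both frames are assumed to share the same resolution.

```python
import numpy as np


def align_to_target(d, W):
    """Warp a difference mask from the interacted frame onto a target frame.

    d: (H, W_img) difference mask (D+ or D-) at the interacted frame
    W: (H*W_img, H*W_img) weights, rows = interacted-frame positions, columns = target positions
    """
    aligned = d.reshape(-1) @ W           # aligned[j] = sum_i d[i] * W[i, j]
    return aligned.reshape(d.shape)


# Toy weights that map position i of the interacted frame to position 3 - i of the target.
W = np.eye(4)[[3, 2, 1, 0]]
d_plus = np.array([[1.0, 0.0],
                   [0.0, 0.0]])
print(align_to_target(d_plus, W))
# [[0. 0.]
#  [0. 1.]]
```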

(3) Feed these features into a simple five-layer residual network, terminated by a sigmoid, to output the final fused mask.
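A toy, forward-only NumPy version of a five-layer residual network ending in a sigmoid, to show the shape of the computation rather than the trained model. Several things are assumptions, not details from these notes: the layers are 1x1 convolutions (per-pixel matrix products) instead of spatial convolutions, the weights are random, and the input stack (image RGB, the previous and new propagated masks of the target frame, and the aligned $D^+$ and $D^-$, 7 channels in total) is only a plausible reading of "these features". `toy_fusion_net` and `init_params` are hypothetical names.

```python
import numpy as np


def toy_fusion_net(features, params):
    """features: (C_in, H, W) stacked per-pixel inputs; params: 5 (weight, bias) pairs."""
    c, h, w = features.shape
    x = features.reshape(c, -1)                        # (C_in, H*W), one column per pixel
    w0, b0 = params[0]
    x = np.maximum(w0 @ x + b0[:, None], 0.0)          # layer 1: project to hidden width
    for wi, bi in params[1:4]:                         # layers 2-4: residual layers
        x = x + np.maximum(wi @ x + bi[:, None], 0.0)
    w4, b4 = params[4]
    logits = w4 @ x + b4[:, None]                      # layer 5: one output channel
    return (1.0 / (1.0 + np.exp(-logits))).reshape(h, w)  # sigmoid -> fused mask


def init_params(c_in, hidden, seed=0):
    """Random weights for illustration only (a trained model would learn these)."""
    rng = np.random.default_rng(seed)
    shapes = [(hidden, c_in)] + [(hidden, hidden)] * 3 + [(1, hidden)]
    return [(rng.normal(0.0, 0.1, s), np.zeros(s[0])) for s in shapes]


rng = np.random.default_rng(1)
features = rng.random((7, 4, 5))  # assumed 7-channel input for a 4x5 target frame
fused = toy_fusion_net(features, init_params(c_in=7, hidden=16))
print(fused.shape)  # (4, 5)
```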

Figure: mechanism of the difference-aware fusion module.

Experiment

Performance on the DAVIS interactive validation set:

Conclusion

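We propose MiVOS, a novel decoupled approach consisting of three modules: Interaction-to-Mask, Propagation and Difference-Aware Fusion. By decoupling interaction from propagation, MiVOS is versatile and not limited by the type of interaction. On the other hand, the proposed fusion module reconciles interaction and propagation by faithfully capturing the user's intent and mitigates the information lost in the decoupling process, making MiVOS both accurate and efficient. We hope MiVOS can inspire and stimulate future research in iVOS.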
