[Caption study] Survey: A Comprehensive Survey of Deep Learning for Image Captioning

A Comprehensive Survey of Deep Learning for Image Captioning

INTRODUCTION

Looking at the field from several angles, the authors group current mainstream captioning methods into the following categories (detailed in Section 2):
- Template-based captioning
- Retrieval-based captioning
- Novel image caption generation (most deep learning-based methods fall into this category)
The authors then further classify the deep learning-based methods into the following categories (detailed in Section 3):
- (1) Visual space-based,
- (2) Multimodal space-based,
- (3) Supervised learning,
- (4) Other deep learning,
- (5) Dense captioning,
- (6) Whole scene-based,
- (7) Encoder-Decoder Architecture-based,
- (8) Compositional Architecture-based,
- (9) LSTM (Long Short-Term Memory) [54] language model-based,
- (10) Other language model-based,
- (11) Attention-based,
- (12) Semantic concept-based,
- (13) Stylized captions
The authors collect and summarize the current mainstream datasets (see Section 4).
The authors compare and analyze the results of the main existing methods (see Section 5).
Section 6 gives a brief discussion and an outlook on future research.
Section 7 concludes.

IMAGE CAPTIONING METHODS

In this section the authors briefly review and describe the existing captioning methods: template-based image captioning, retrieval-based image captioning, and novel caption generation.

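Template-based image captioning:
- Template-based methods describe an image using templates with a number of blank slots. The main idea is to detect objects first and then fill the slots of a sentence template with them.
- Such methods clearly cannot generate variable-length descriptions. Some methods [2, 32, 76, 77, 101] introduce parsing-based language models into image captioning, and these are more powerful than methods based on fixed templates. However, this family of methods is not the focus of the survey (and it does not sound very reliable anyway).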
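The detect-then-fill idea can be illustrated with a minimal sketch. The template string, the slot names, and the `detections` dictionary are all assumptions made for this example; a real system would obtain the slot values from object, attribute, and scene detectors, which are not shown here.

```python
# Hypothetical sketch of template-based captioning: detect first, then fill slots.
# The template and slot names are illustrative and do not come from the survey.
TEMPLATE = "A {attribute} {obj} is {action} {scene}."

def fill_template(detections, template=TEMPLATE):
    """Fill the blank slots of a fixed sentence template.

    `detections` stands in for detector output (assumed, not implemented here).
    A missing slot raises KeyError, since the template cannot adapt to it.
    """
    return template.format(**detections)

detections = {
    "attribute": "brown",
    "obj": "dog",
    "action": "running",
    "scene": "on the grass",
}
caption = fill_template(detections)
print(caption)
```

Every caption produced this way has the same sentence structure, which is exactly the fixed-length limitation noted above.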
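Retrieval-based image captioning:
- Retrieval-based methods rely on a library of images that already have descriptions. To describe a new image, they compute image similarity to retrieve similar images from the library, take the descriptions of those images as candidates, and then use some method to select a suitable description from the candidate pool. These methods generally produce generic and grammatically correct descriptions, but they struggle to produce an appropriate description for highly specific images (and they depend heavily on the retrieval library).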
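The retrieve-then-select pipeline can be sketched as follows. This is only an illustration under stated assumptions: the feature vectors are hand-written stand-ins for real image features, cosine similarity is one possible similarity measure, and the consensus rule in `select_caption` (word-overlap agreement among candidates) is a placeholder for the unspecified selection method.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve_candidates(query_feat, library, k=2):
    """Collect the captions of the k library images most similar to the query."""
    ranked = sorted(library, key=lambda item: cosine(query_feat, item["feat"]), reverse=True)
    return [cap for item in ranked[:k] for cap in item["captions"]]

def jaccard(a, b):
    """Word-level Jaccard overlap between two captions."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def select_caption(candidates):
    """Assumed selection rule: pick the candidate that agrees most with the others."""
    def consensus(i):
        others = [c for j, c in enumerate(candidates) if j != i]
        if not others:
            return 0.0
        return sum(jaccard(candidates[i], c) for c in others) / len(others)
    return candidates[max(range(len(candidates)), key=consensus)]

# Toy library: features and captions are made up for illustration.
library = [
    {"feat": [0.9, 0.1, 0.0], "captions": ["a dog runs on the grass"]},
    {"feat": [0.8, 0.2, 0.1], "captions": ["a dog plays on the grass"]},
    {"feat": [0.0, 0.1, 0.9], "captions": ["a plane flies in the sky"]},
]
query = [1.0, 0.0, 0.0]
candidates = retrieve_candidates(query, library, k=2)
print(select_caption(candidates))
```

Because the output can only ever be a caption that already exists in `library`, the sketch also shows why such methods depend so heavily on the retrieval library.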
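Novel caption generation:
- Novel caption generation generally produces a description through two levels working together: one analyzes the image content in visual space, and the other generates the description from that content in a multimodal space built on a language model. The vast majority of these methods are based on deep learning. This category is also the focus of the survey.
- Figure 1: An overall taxonomy of deep learning-based image captioning.
- The figure above lists the different categories of methods:
  - Supervised learning vs. other deep learning (reinforcement learning and unsupervised learning)
  - Whole-scene captioning vs. region-level captioning (dense captioning)
  - Encoder-decoder architecture vs. compositional architecture
  - Visual space-based vs. multimodal space-based
  - LSTM-based (including RNN-style language models) vs. others
  - Others:
    - Attention-based
    - Semantic concept-based
    - Novel object-based
    - Stylized captions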
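Since the taxonomy is a set of independent axes rather than a single tree, a method can carry one label per axis plus any of the "other" tags. A small sketch of that structure follows; the identifier names and the example method are hypothetical and chosen only for illustration.

```python
# Hypothetical encoding of the taxonomy axes listed above.
TAXONOMY = {
    "learning_type": {"supervised", "other_deep_learning"},
    "caption_scope": {"whole_scene", "dense_captioning"},
    "architecture": {"encoder_decoder", "compositional"},
    "feature_mapping": {"visual_space", "multimodal_space"},
    "language_model": {"lstm", "others"},
}
OTHER_TAGS = {"attention", "semantic_concept", "novel_object", "stylized"}

def validate(labels, extra_tags=()):
    """Check that a method's labels use known axes, values, and tags."""
    for axis, value in labels.items():
        if axis not in TAXONOMY:
            raise KeyError(f"unknown axis: {axis}")
        if value not in TAXONOMY[axis]:
            raise ValueError(f"unknown value {value!r} for axis {axis}")
    unknown = set(extra_tags) - OTHER_TAGS
    if unknown:
        raise ValueError(f"unknown tags: {sorted(unknown)}")
    return True

# A made-up method, not one taken from the survey's table.
example_method = {
    "learning_type": "supervised",
    "caption_scope": "whole_scene",
    "architecture": "encoder_decoder",
    "feature_mapping": "visual_space",
    "language_model": "lstm",
}
print(validate(example_method, extra_tags=["attention"]))
```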

DEEP LEARNING BASED IMAGE CAPTIONING METHODS

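Figure 1 above gives a brief comparison and classification of captioning methods; each category is expanded on below. In addition, the authors provide a table that briefly summarizes the current mainstream methods, listing each method's name, image encoder, language model, and category within the survey. See the table below: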

3.1 Visual Space vs. Multimodal Space

TODO
