
Multimodal Paper Reading: BLIP

Updated: 2024-10-25 02:19:00


BLIP: A Quick Read

Title
Motivation
Contribution
Model

Title

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Motivation

Model perspective: existing methods adopt either an encoder-based model (e.g., CLIP, ALBEF) or an encoder-decoder model. However, encoder-based models are less straightforward to transfer directly to text generation tasks (e.g., image captioning), whereas encoder-decoder models have not been successfully adopted for image-text retrieval tasks. Is there a single framework that handles both?
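Data perspective: state-of-the-art methods (e.g., CLIP, ALBEF) are pre-trained on image-text pairs collected from the web. Scaling up the dataset does improve performance, but this paper shows that noisy web text is suboptimal for vision-language learning.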
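To make the model-perspective point concrete, note that a text transformer behaves as an encoder or a decoder largely because of its self-attention mask. Bidirectional attention, where every token sees every other token, suits understanding objectives such as retrieval and image-text matching. Causal attention, where each token sees only earlier tokens, is what left-to-right captioning needs. BLIP's multimodal mixture of encoder-decoder (MED) builds on this: the text encoder and decoder share all parameters except the self-attention layers. The sketch below only builds the two mask types. The function `attention_mask` and the `MODES` table are illustrative assumptions, not BLIP's actual code, and cross-attention to image features is left out.

```python
from typing import Dict, List

Mask = List[List[int]]

def attention_mask(seq_len: int, causal: bool) -> Mask:
    """Build a seq_len x seq_len self-attention mask.

    mask[i][j] == 1 means token i may attend to token j.
    causal=False -> bidirectional (encoder-style, all ones).
    causal=True  -> lower-triangular (decoder-style, no peeking at later tokens).
    """
    return [
        [1 if (not causal or j <= i) else 0 for j in range(seq_len)]
        for i in range(seq_len)
    ]

# Hypothetical mapping from the three MED functionalities to whether their
# text self-attention is causal (image cross-attention is not modelled here).
MODES: Dict[str, bool] = {
    "unimodal_text_encoder": False,        # ITC: image-text contrastive, used for retrieval
    "image_grounded_text_encoder": False,  # ITM: image-text matching
    "image_grounded_text_decoder": True,   # LM: captioning / text generation
}

if __name__ == "__main__":
    for mode, causal in MODES.items():
        print(mode)
        for row in attention_mask(4, causal):
            print("  ", row)
```

With the causal mask, position i never sees tokens after it. The model can therefore be trained with next-token prediction and sampled left to right. The all-ones mask gives richer representations for matching but offers no direct way to generate text.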

Contribution

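Bootstrapping: train a model on a noisy dataset scraped from the web, use that model to produce a cleaner dataset, and then ask whether a better model can be trained on the cleaned data.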
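Unified: a single pre-trained model handles both understanding and generation. For bootstrapping, it is fine-tuned into a captioner, which generates synthetic captions for web images, and a filter, which removes noisy captions.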
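The bootstrapping step corresponds to what the paper calls CapFilt (Captioning and Filtering). The sketch below assumes the captioner and filter are plain callables. The captioner writes a synthetic caption for each web image. The filter scores how well a text matches its image (in BLIP this role is played by the image-text matching head), and only pairs at or above a threshold survive. Human-annotated pairs are kept without filtering. The names `capfilt`, `captioner`, `filter_score` and the default `threshold=0.5` are hypothetical, and the toy functions in the demo stand in for real models.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (image_id, text)

def capfilt(
    web_pairs: List[Pair],
    human_pairs: List[Pair],
    captioner: Callable[[str], str],
    filter_score: Callable[[str, str], float],
    threshold: float = 0.5,
) -> List[Pair]:
    """Return a cleaned dataset: trusted human pairs plus filtered web/synthetic pairs."""
    clean: List[Pair] = list(human_pairs)  # human-annotated pairs are kept as-is
    for image, web_text in web_pairs:
        synthetic = captioner(image)  # new caption generated for the web image
        for text in (web_text, synthetic):
            # Keep a text only if the filter judges that it matches the image.
            if filter_score(image, text) >= threshold:
                clean.append((image, text))
    return clean

def toy_captioner(image: str) -> str:
    # Stand-in for a real captioning model (hypothetical).
    return f"a photo of a {image}"

def toy_filter(image: str, text: str) -> float:
    # Stand-in for a real image-text matching score (hypothetical).
    return 1.0 if image in text else 0.0

if __name__ == "__main__":
    web = [("cat", "buy cheap shoes"), ("dog", "my dog at the park")]
    human = [("bird", "a bird on a branch")]
    for pair in capfilt(web, human, toy_captioner, toy_filter):
        print(pair)
```

In this toy run, the noisy web text "buy cheap shoes" is dropped, while the synthetic caption for the same image is kept. In the paper, the cleaned dataset is then used to pre-train a new model, which closes the bootstrapping loop.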

Model


Published: 2023-11-17 01:28:22
