[Paper Reading] AUM Identify Mislabeled Data using the Area Under the Margin Ranking|电子爱好者


Identify Mislabeled Data using the Area Under the Margin Ranking

Paper Reading

Identify Mislabeled Data using the Area Under the Margin Ranking
- Background
- Contribution
- Methodology
- Discussion

Background

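Existing work on learning with noisy labels generally falls into two broad categories:
- Loss-based methods: modify the loss so that different samples receive different weights, which improves the model and keeps it from overfitting to noisy labels.
- Re-labeling methods: use some procedure to find samples that are likely to be noisy and then assign them new labels.
Broadly, this paper belongs to the second paradigm, re-labeling. The difference is that it only sets out to find mislabeled data; it does not try to correct their labels.
The authors argue that finding the mislabeled samples and then deleting them yields a relatively clean dataset.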

Contribution

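The authors propose a metric for telling noisy samples apart from clean ones, called AUM (Area Under the Margin). It assigns one AUM value to each sample.
These AUM values can be split into noisy and clean with a threshold, but that threshold would normally have to be tuned by hand, so the authors also propose a way to determine it automatically.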

Methodology

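The margin is defined below, where t is the epoch index, x is the input, y is the annotated label, and z^t(x) are the logits of the final prediction at epoch t. By definition the margin can be negative. A negative margin means some other class receives a larger logit than the annotated class, i.e., the model's prediction disagrees with the given label, so the sample may be noisy.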
$$M^{t}(x, y) = z^{t}_{y}(x) - \max_{i \neq y} z^{t}_{i}(x)$$
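
As a minimal sketch of the margin formula, assuming one sample's logits for a single epoch are available as a plain Python list (the function name `margin` is our own, not taken from the paper's code):

```python
from typing import Sequence


def margin(logits: Sequence[float], label: int) -> float:
    """M^t(x, y) = z_y - max_{i != y} z_i for one sample at one epoch."""
    if len(logits) < 2 or not 0 <= label < len(logits):
        raise ValueError("need at least two classes and a valid label index")
    # Largest logit among all classes other than the annotated label.
    largest_other = max(z for i, z in enumerate(logits) if i != label)
    return logits[label] - largest_other


logits = [2.0, 0.5, 1.0]
print(margin(logits, 0))  # 1.0: the model agrees with label 0
print(margin(logits, 2))  # -1.0: class 0 beats label 2, a possible noisy label
```
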
Because the margin can vary from epoch to epoch, the authors define the AUM below, which averages the margin over the first T epochs.
$$\mathrm{AUM}(x, y) = \frac{1}{T}\sum_{t=1}^{T} M^{t}(x, y)$$
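
A sketch of how the per-epoch margins might be accumulated into an AUM value. The `AUMTracker` class is hypothetical; it assumes the training loop can report each sample's id, logits, and assigned label once per epoch. The margin helper is repeated so the block runs on its own.

```python
from typing import Dict, List, Sequence


def margin(logits: Sequence[float], label: int) -> float:
    """M^t(x, y): assigned-label logit minus the largest other logit."""
    return logits[label] - max(z for i, z in enumerate(logits) if i != label)


class AUMTracker:
    """Hypothetical helper that stores every recorded margin per sample."""

    def __init__(self) -> None:
        self._margins: Dict[int, List[float]] = {}

    def update(self, sample_id: int, logits: Sequence[float], label: int) -> None:
        # Call once per sample per epoch with that epoch's logits.
        self._margins.setdefault(sample_id, []).append(margin(logits, label))

    def aum(self, sample_id: int) -> float:
        # AUM(x, y) = (1/T) * sum of M^t(x, y) over the T recorded epochs.
        values = self._margins[sample_id]  # KeyError if the sample was never seen
        return sum(values) / len(values)


tracker = AUMTracker()
# Three epochs of logits for sample 7, whose assigned label is class 1.
for epoch_logits in ([1.0, 0.0], [0.0, 2.0], [0.0, 1.0]):
    tracker.update(sample_id=7, logits=epoch_logits, label=1)
print(tracker.aum(7))  # mean of the margins -1, 2 and 1, i.e. 2/3
```

Averaging over epochs (rather than using only the final margin) is what lets a sample that the network fits late, after initially disagreeing with its label, still end up with a low score.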
The smaller a sample's AUM, the more likely it is mislabeled. A ranking alone, however, gives no absolute cutoff, so a threshold is still needed.
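
Ranking by AUM can be sketched as an ascending sort, most suspicious sample first. The AUM values below are made up purely for illustration; the result is only an ordering and says nothing about where to cut it.

```python
from typing import Dict, List


def rank_by_aum(aums: Dict[int, float]) -> List[int]:
    """Return sample ids sorted from lowest AUM (most suspicious) to highest."""
    return sorted(aums, key=aums.__getitem__)


aums = {0: 2.3, 1: -0.8, 2: 0.1, 3: 1.5}
print(rank_by_aum(aums))  # [1, 2, 3, 0]
```
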
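To obtain that threshold, the authors propose threshold samples: a subset of the training set is sampled to serve as threshold samples, deliberately given noisy labels, and included in training. The 90th percentile of these samples' AUM values is then used as the AUM threshold for separating noisy from clean data; samples whose AUM falls below it are treated as mislabeled.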
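
A sketch of the threshold-sample procedure under several assumptions: each threshold sample simply receives a uniformly chosen wrong label (following the description above; the paper's own construction may differ), the percentile is taken in ascending order with linear interpolation, and `q=90` follows the figure quoted above. All function names are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Set, Tuple


def make_threshold_samples(
    labels: Sequence[int], num_classes: int, fraction: float, seed: int = 0
) -> Tuple[List[int], Set[int]]:
    """Pick a random subset of samples and give each one a wrong label."""
    rng = random.Random(seed)
    picked = set(rng.sample(range(len(labels)), int(len(labels) * fraction)))
    new_labels = list(labels)
    for i in picked:
        # Choose uniformly among the classes that differ from the given label.
        wrong = [c for c in range(num_classes) if c != labels[i]]
        new_labels[i] = rng.choice(wrong)
    return new_labels, picked


def percentile(values: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile over ascending values (numpy's default rule)."""
    xs = sorted(values)
    if not xs:
        raise ValueError("no values to take a percentile of")
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def flag_mislabeled(
    aums: Dict[int, float], threshold_ids: Set[int], q: float = 90.0
) -> Set[int]:
    """Flag ordinary samples whose AUM is below the q-th percentile of threshold-sample AUMs."""
    cutoff = percentile([aums[i] for i in threshold_ids], q)
    return {i for i, a in aums.items() if i not in threshold_ids and a < cutoff}


labels = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0]
noisy_labels, threshold_ids = make_threshold_samples(labels, num_classes=3, fraction=0.5)
print(sorted(threshold_ids), noisy_labels)

# Made-up AUMs after training; samples 3, 4 and 5 play the role of threshold samples.
aums = {0: 2.0, 1: -0.5, 2: 1.2, 3: -1.0, 4: -0.2, 5: -0.9}
print(flag_mislabeled(aums, threshold_ids={3, 4, 5}))  # {1}
```

Threshold samples are excluded from the flagged set because they are known to be mislabeled by construction; their only role is to calibrate the cutoff.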

Discussion

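On when to stop. Late in training the network fits essentially the whole training set, noisy labels included, so choosing the epoch at which to stop accumulating AUM is critical. The authors suggest stopping at the first learning-rate decay.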
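
The stopping rule can be expressed as a small helper over a per-epoch learning-rate list. This assumes a step-style schedule recorded as one rate per epoch, and it treats the epoch in which the rate first drops as excluded from T; the name `aum_stop_epoch` and the schedule values are illustrative.

```python
from typing import Sequence


def aum_stop_epoch(lr_schedule: Sequence[float]) -> int:
    """Number of epochs to include in the AUM: all epochs before the first LR drop.

    lr_schedule[e] is the learning rate used in epoch e (0-based). Increases such as
    warm-up are ignored; if the rate never drops, every epoch is included.
    """
    for epoch in range(1, len(lr_schedule)):
        if lr_schedule[epoch] < lr_schedule[epoch - 1]:
            return epoch
    return len(lr_schedule)


# Hypothetical step schedule: 0.1 for three epochs, then decayed.
schedule = [0.1, 0.1, 0.1, 0.01, 0.01, 0.001]
print(aum_stop_epoch(schedule))  # 3 -> average the margins of epochs 0, 1 and 2
```
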
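On the noise distribution. Most of the paper assumes uniform (symmetric) noise, i.e., a mislabeled sample is equally likely to carry any of the other classes. The authors also discuss asymmetric noise, which the method tolerates less well than uniform noise. In their asymmetric-noise experiment, 40% of the data are mislabeled and the wrong labels are biased toward one particular class; in this setting the recall of noise detection drops sharply (i.e., many mislabeled samples are not found). The reason is that asymmetric noise lowers the AUM of correctly labeled samples and raises the AUM of mislabeled ones, as the margin formula shows. For a correctly labeled sample, the first term $z^t_y(x)$ shrinks (where the probability for the correct class could previously reach about 80%, it now only reaches about 60%). For a mislabeled sample, many training samples share the same wrong label, so the network partly learns to predict it and the margin increases.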
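
A toy calculation of why asymmetric noise hurts, under simplifying assumptions that are ours, not the paper's: 10 classes, 40% noise, and an idealized network whose softmax output for an input of true class 0 exactly equals that class's noisy-label distribution. Under softmax, the difference between two logits equals the log of the ratio of their probabilities, so the margin can be computed from probabilities.

```python
import math
from typing import List


def label_distribution(num_classes: int, noise: float, asymmetric: bool) -> List[float]:
    """Noisy-label distribution for inputs whose true class is 0.

    Symmetric: the noisy mass is spread evenly over the other classes.
    Asymmetric: all of it goes to class 1 (a hypothetical 'confusable' class).
    """
    probs = [0.0] * num_classes
    probs[0] = 1.0 - noise
    if asymmetric:
        probs[1] = noise
    else:
        for c in range(1, num_classes):
            probs[c] = noise / (num_classes - 1)
    return probs


def margin_from_probs(probs: List[float], label: int) -> float:
    # Softmax logit differences equal log-probability ratios.
    largest_other = max(p for i, p in enumerate(probs) if i != label)
    return math.log(probs[label]) - math.log(largest_other)


for asym in (False, True):
    p = label_distribution(num_classes=10, noise=0.4, asymmetric=asym)
    clean = margin_from_probs(p, 0)  # sample that kept its true label 0
    noisy = margin_from_probs(p, 1)  # sample mislabeled as class 1
    print(f"asymmetric={asym}: clean margin {clean:.2f}, mislabeled margin {noisy:.2f}")
```

Under these assumptions the clean margin falls from about 2.60 to about 0.41 and the mislabeled margin rises from about -2.60 to about -0.41, so the two groups end up much closer together and a single threshold separates them less reliably.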


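Copyright notice: the article "[Paper Reading] AUM Identify Mislabeled Data using the Area Under the Margin Ranking" was contributed by a site user, and its views are the author's own. To repost it, contact the author and credit the source: https://www.elefans.com/dianzi/1729649418a1208860.html. This site only provides storage space, does not own the copyright, and assumes no related legal liability. Any content found to be plagiarized, infringing, or unlawful will be removed once verified.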
