Full paper title: Automatic brain MRI motion artifact detection based on end-to-end deep learning is similarly effective as traditional machine learning trained on image quality metrics

Original paper: Automatic brain MRI motion artifact detection based on end-to-end deep learning is similarly effective as traditional machine learning trained on image quality metrics - ScienceDirect

The English here is typed entirely by hand! It summarizes and paraphrases the original paper. Some spelling and grammar mistakes are hard to avoid; if you spot any, corrections in the comments are welcome. This post leans toward personal notes, so read with caution!


1. TL;DR

1.1. Paper structure

1.2. Innovations

2. Close reading of the paper, section by section

2.1. Abstract

2.2. Introduction

2.3. Materials and methods

2.3.1. Data

2.3.2. Labeling

2.3.3. Data splitting

2.3.4. Classification based on image quality metrics

2.3.5. Classification based on minimally pre-processed images using deep learning

2.3.6. Evaluation of classification performance

2.4. Results

2.4.1. Classification based on image quality metrics

2.4.2. Classification based on minimally pre-processed images using deep learning

2.4.3. Comparison of the two classification approaches

2.5. Discussion

2.5.1. Advantages

2.5.2. Limits

2.6. Conclusions

3. Supplementary concepts

3.1. Spatially separable convolution

3.2. Depthwise separable convolution

3.3. Confusion matrices

4. Reference List

2.1. Abstract

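         This paper aims to classify T1-weighted images affected by motion artifacts as usable or unusable.

confound v. to defeat (an enemy); to perplex or astonish; to prove (something) wrong

resonance n. resonance; sonority; the power to evoke associations or feelings

head motion artifacts: image artifacts caused by head motion

data-hungry adj. requiring large amounts of data

annotate vt. to add notes or labels to    efficacy n. effectiveness (especially of a drug or treatment)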

2.2. Introduction

(1)Magnetic resonance imaging (MRI) is an important tool for clinical diagnosis.

(2)Artifacts such as ghosting or blurring

        ①May be caused by an imprecise instrument

        ②Mostly due to motion of subjects

(3)For a single hospital, artifacts may cause a loss of $250,000 per year, and 15.4% of images are not accurate

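underpinning v. forming the basis of; strengthening; supporting  n. the basis (of a theory or argument); a foundation structure

modality n. mode; form; manner; (linguistics) mood; sensory modality

suboptimal adj. less than optimal; second-best

morphometric adj. relating to morphometry (the measurement of form or shape)

cortical atrophy: atrophy (shrinkage) of the cerebral cortex

spurious adj. false; fake; erroneous; based on mistaken ideas or reasoning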

(4)Although many approaches have been devised, they do not work well in practice. Hence, further research is needed.

(5)Previous work in handling artifacts

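        ①Woodard and Carley-Spencer (2006) designed a system that distinguishes original images from distorted ones.

        ②Mortamet et al. (2009) analyzed the air background around the head in the scans. (This appears to be applied to existing datasets rather than during scanning: as I read it, they used the method to label existing data as high or low quality and then compared the result with manual ratings. Their first classification approach exceeded 85% accuracy, and the second, using a support vector machine, exceeded 80%.)

        ③Kim et al. (2019) used more than two image quality metrics (IQMs) for classification and reached 89% accuracy.

        ④Esteban et al. (2017) verified the generalization ability of IQMs by cross-validation.

        ⑤Another line of work uses deep learning (DL). On small datasets, however, traditional machine learning (ML) may be better than DL.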

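metric adj. of the metric system; made or measured using the metric system  n. a standard of measurement; verse meter; [mathematics] a metric

consortium n. a group of companies or banks cooperating on a project; an alliance

attenuate vt. to weaken; to reduce the effect of  adj. weakened; slender; tapering; thin

neonate n. a newborn, especially one less than four weeks old

idiosyncratic adj. peculiar; eccentric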

(6)In this paper, the authors use ⭐ a lightweight DL model with minimal image pre-processing to reach a rapid decision. They also compare it with support vector machine (SVM), eXtreme Gradient Boosting (XGB) ensemble and random forest (RF) classifiers.

2.3. Materials and methods

2.3.1. Data

(1)In-house data

        ①Dataset: data collected in the authors' own lab, called the MR-ART dataset; subjects had no history of mental illness. 70184 scan images were collected from 148 subjects.

        ②Apparatus: "Siemens Magnetom Prisma 3T MRI scanner (Siemens Healthcare GmbH, Erlangen, Germany) with the standard Siemens 20-channel head-neck receiver coil at the Brain Imaging Centre, Research Centre for Natural Sciences"

        ③Instrument setup: too involved to repeat here; see the original paper.

        ④Experimental design: subjects fixate on a point. For the standard scan, subjects are not allowed to move. For the HM1 and HM2 scans, they nod whenever they see the word "MOVE".

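anatomical adj. relating to anatomy; relating to the bodily structure (of humans or animals)

calibrate vt. to calibrate; to mark a scale on (an instrument) so that it measures accurately

isotropic adj. having the same properties in all directions

sagittal adj. in the plane dividing the body into left and right; arrow-shaped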

(2)Public data

        ①Dataset: first, open data from the UK Biobank database, divided into "usable" and "unusable"; second, imaging data from OASIS-3, including healthy subjects and subjects with dementia.

dementia n. a decline in cognitive function; mental derangement


2.3.2. Labeling

        ①Images are scored manually by experienced radiologists. Images that junior radiologists judge to be below the best quality are re-evaluated by senior radiologists

        ②Score 1: good quality

        ③Score 2: medium quality, blurred but lesions still visible

        ④Score 3: bad quality, cannot be identified

        ⑤A) usable (scores 1 and 2), B) unusable (score 3)

2.3.3. Data splitting

        ①Training set: N = 1661, 80.16% (subdivided into 5 folds)

        ②Test set: N = 411, 19.84%

        ③Labels are then recoded: usable images to 1, unusable to 0
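The labeling and splitting steps above can be sketched roughly as follows. This is only an illustration, not the authors' code: `score_to_label` and `stratified_split` are hypothetical helper names, the toy scores are made up, and I am assuming the 80/20 split is stratified by label.

```python
import random
from collections import defaultdict


def score_to_label(score):
    """Map a radiologist score to a binary label: 1 = usable, 0 = unusable."""
    if score not in (1, 2, 3):
        raise ValueError(f"unexpected score: {score}")
    return 1 if score in (1, 2) else 0


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices into train/test while keeping the label ratio (assumed)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Toy data: 60 scans with score 1, 20 with score 2, 20 with score 3.
scores = [1] * 60 + [2] * 20 + [3] * 20
labels = [score_to_label(s) for s in scores]
train_idx, test_idx = stratified_split(labels)
```

Both subsets keep the same proportion of usable and unusable scans, which matters when one class is rare.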

stratify v. to arrange into layers (strata)

2.3.4. Classification based on image quality metrics

(1)Image quality metrics

        ①Many IQMs (too numerous and specialized to list one by one)

        ②Correction of IQMs (the paper gives no mathematical formula; this one is my own)

where D is the original data, μ(·) is the mean, and ρ(·) is the interquartile range of the given dataset (estimated on the training set)

        ③Discard features whose minimum balanced accuracy for predicting the source dataset exceeds 33.3%, the chance level for three datasets (the accuracy is computed by an SVM with 5-fold stratified cross-validation)

        ④This removed 7 features and kept 55
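The correction in step ② can be sketched in code. The formula itself is missing from these notes, so I am assuming it is D' = (D - μ(D)) / ρ(D), which is what the definitions of D, μ and ρ suggest. `fit_scaler` and `apply_scaler` are hypothetical names and the feature values are random toy data. In the paper this would presumably be done separately for each dataset.

```python
import numpy as np


def fit_scaler(train_features):
    """Estimate per-feature mean and interquartile range on the training set."""
    mean = train_features.mean(axis=0)
    q75, q25 = np.percentile(train_features, [75, 25], axis=0)
    return mean, q75 - q25


def apply_scaler(features, mean, iqr):
    """Assumed correction: D' = (D - mean) / IQR."""
    return (features - mean) / iqr


rng = np.random.default_rng(0)
train = rng.normal(loc=5.0, scale=2.0, size=(200, 3))  # toy IQM values
test = rng.normal(loc=5.0, scale=2.0, size=(50, 3))

mean, iqr = fit_scaler(train)
train_scaled = apply_scaler(train, mean, iqr)
test_scaled = apply_scaler(test, mean, iqr)  # reuses training statistics
```

The test set is scaled with the training statistics, so no information leaks from the test data.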

intracranial adj. within the skull    cerebrospinal adj. relating to the brain and spinal cord

(2)Classification paradigms

        ①The SVM, XGB and RF models are each trained with and without hyperparameter optimization and with and without feature selection, which gives four classification paradigms. Feature selection uses Elastic Net (ENet). When both are applied, hyperparameters are optimized after feature selection.
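To make the four paradigms concrete, here is a small enumeration sketch. The dictionary keys, the step descriptions and the helper `pipeline_steps` are hypothetical names of mine; no actual model is trained.

```python
from itertools import product

MODELS = ("SVM", "XGB", "RF")

# Four paradigms: every on/off combination of the two optional steps.
paradigms = [
    {"feature_selection": fs, "hyperparameter_optimization": hpo}
    for fs, hpo in product((False, True), repeat=2)
]

# Each model is trained under each paradigm.
configurations = [(model, paradigm) for model in MODELS for paradigm in paradigms]


def pipeline_steps(paradigm):
    """Order the steps: selection first, then optimization, then fitting."""
    steps = []
    if paradigm["feature_selection"]:
        steps.append("select features (Elastic Net)")
    if paradigm["hyperparameter_optimization"]:
        steps.append("optimize hyperparameters")
    steps.append("fit classifier")
    return steps
```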

2.3.5. Classification based on minimally pre-processed images using deep learning

(1)Image pre-processing and neural network architecture

        ①Convolutional kernel: K × 1 × 1, 1 × K × 1 and 1 × 1 × K 

        ②Image format: DICOM→NIfTI

        ③Pre-processing of images: the images are zoomed (resampled) with the multi-dimensional image processing package of the SciPy ecosystem (scipy.ndimage). The voxel intensities are then standardized.

        ④⭐The network structure is shown in the paper's schematic figure; the innovation is using strip-shaped kernels. The figure presents the 3D-CNN-v1 and 3D-CNN-v2 architectures together.
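The voxel standardization in step ③ might look like the sketch below. The exact formula is not reproduced in these notes, so I am assuming the usual zero-mean, unit-variance standardization per volume. `standardize_volume` is a hypothetical name and the volume is random toy data. The zoom step is left out here.

```python
import numpy as np


def standardize_volume(volume, eps=1e-8):
    """Assumed standardization: subtract the mean, divide by the std."""
    volume = volume.astype(np.float64)
    return (volume - volume.mean()) / (volume.std() + eps)


rng = np.random.default_rng(0)
volume = rng.integers(0, 4096, size=(8, 8, 8))  # toy 3D scan intensities
standardized = standardize_volume(volume)
```

The small `eps` only guards against division by zero for a constant volume.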

interpolation n. estimation of values between known data points; text inserted into a document

Schematic depiction: a schematic diagram

(2)Neural network training and evaluation

        ①Data augmentation: to balance usable and unusable images at 50:50 in the training set, they rotated images with SciPy.

        ②Weight initialization: He normal initialization in convolutional layers and Glorot uniform initialization in the output layer (see: 权重/参数初始化方法 (Weight/parameter initialization methods), Zhihu)

        ③Bias in fully connected layers: initialized with 0

        ④Loss function: binary cross-entropy loss

        ⑤Optimization: Adam optimization

        ⑥Learning rate: 0.0005

        ⑦Batch size: 4

        ⑧Image padding: using the padded_batch function of the tf.data.Dataset API

        ⑨Dropout regularization: rate 0.5 in fully connected hidden layers

        ⑩Epochs: 40-500

        ⑪Comparison: statistical hypothesis tests

        ⑫Two additional experiments test whether the distribution of labels affects classification performance
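For reference, the binary cross-entropy loss from step ④ can be written out by hand. This is a plain-Python sketch of the standard formula, not the TensorFlow implementation the authors would have used. The probabilities are made up, and the batch of 4 simply mirrors the batch size above.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


batch_labels = [1, 0, 1, 1]          # 1 = usable, 0 = unusable
batch_probs = [0.9, 0.2, 0.6, 0.5]   # predicted probability of "usable"
loss = binary_cross_entropy(batch_labels, batch_probs)

# More confident correct predictions give a smaller loss.
better_loss = binary_cross_entropy(batch_labels, [0.99, 0.01, 0.99, 0.99])
```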

pad out v. to lengthen (a text or speech) with extra material

2.3.6. Evaluation of classification performance

(1)Classification performance metrics

        ①Accuracy: percentage of correctly classified records

        ②Sensitivity: percentage of correctly classified records with motion artifacts

        ③Specificity: percentage of correctly classified records without motion artifacts

        ④Balanced accuracy score (BAS): mean of sensitivity and specificity

        ⑤Area under the receiver operating characteristic curve (AUROC): summarizes the classifier's discriminative ability across all decision thresholds
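The five metrics above can be computed from confusion counts and scores as in this sketch. Following the definitions in this section, the "positive" class here is a scan with motion artifacts. The function names and all numbers are hypothetical, and AUROC is computed with the simple pairwise (rank) definition rather than by integrating a curve.

```python
def classification_metrics(tp, fn, tn, fp):
    """tp/fn: artifact scans caught/missed; tn/fp: clean scans passed/flagged."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


def auroc(artifact_scores, clean_scores):
    """Probability that a random artifact scan scores above a random clean one."""
    pairs = [(a, c) for a in artifact_scores for c in clean_scores]
    wins = sum(1.0 if a > c else 0.5 if a == c else 0.0 for a, c in pairs)
    return wins / len(pairs)


metrics = classification_metrics(tp=16, fn=4, tn=76, fp=4)
auc = auroc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.1])
```

With imbalanced classes, accuracy can look high while sensitivity is poor, which is why the balanced accuracy score is reported as well.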

(2)Comparison of classifiers

        ①Consistency: Matthews correlation coefficient (MCC), also called the phi coefficient, computed from the confusion matrices

        ②Difference between confusion matrices: Kappa analysis

        ③Difference in misclassifications: McNemar's test

        ④Difference in area under the receiver operating characteristic curve (AUROC): DeLong test

        ⑤Difference between receiver operating characteristic (ROC) curves: Venkatraman test

        ⑥Multiple comparisons: Bonferroni correction

        ⑦Libraries: mlxtend, scikit-learn and custom Python scripts, together with the pROC package
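Three of the tools in this list are simple enough to write out by hand. The sketch below uses the textbook formulas. The function names and counts are hypothetical, and the McNemar statistic is the continuity-corrected chi-square form, which is an assumption: these notes do not say which variant the paper used.

```python
import math


def matthews_corrcoef(tp, fn, tn, fp):
    """MCC (phi coefficient) from the four cells of a 2x2 confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def mcnemar_statistic(only_a_correct, only_b_correct):
    """Continuity-corrected chi-square statistic from the discordant pairs."""
    disagreements = only_a_correct + only_b_correct
    if disagreements == 0:
        return 0.0
    return (abs(only_a_correct - only_b_correct) - 1) ** 2 / disagreements


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


mcc = matthews_corrcoef(tp=40, fn=10, tn=45, fp=5)
chi2 = mcnemar_statistic(only_a_correct=10, only_b_correct=5)
alpha_per_test = bonferroni_alpha(0.05, n_tests=5)
```

McNemar's test only looks at the cases where the two classifiers disagree, so it compares their error patterns rather than their overall accuracy.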

2.4. Results

2.4.1. Classification based on image quality metrics

(1)Comparing SVM, XGB, RF with and without feature selection as well as with and without hyperparameter optimization

(2)The best-performing SVMs (SVM-v1 and SVM-v2) were chosen for confusion-matrix analysis. v1 is without feature selection and v2 with it. Both use hyperparameter optimization.

Conclusion: the confusion matrices show no significant effect of feature selection

(3)Importance of features

2.4.2. Classification based on minimally pre-processed images using deep learning

(1)Confusion matrices and contingency table for 3D-CNN-v1 and 3D-CNN-v2

Conclusion: no significant difference between the confusion matrices, but a significant difference in misclassification rates. 3D-CNN-v1 is more likely to classify 1 (usable) as 0 (unusable). (I have my doubts: in the paper's figure, aren't both 3.57%? I think the real issue is that the probability of misclassifying 0 as 1 is too high.) Thus, 3D-CNN-v2 is the better model.

(2)The authors suspect that the proportion of labels in a dataset affects classification performance: labels in MR-ART are fairly balanced, whereas in UKBB and OAS3 label 0 (unusable) is far rarer than 1 (usable). Using 5-fold cross-validation, they conclude that the data distribution may have no influence. (I should look more closely at the methods and at the limitations the authors mention.)

(3)Image quality in UKBB is generally good, while in OAS3 it is rather poor

2.4.3. Comparison of the two classification approaches

(1)They chose 3D-CNN-v2 and SVM-v1 (without feature selection).

2.5. Discussion

2.5.1. Advantages

(1)There is no significant difference between SVM and 3D-CNN.

(2)3D-CNN performs well in classification.

(3)⭐Traditional machine learning may require a separate feature extraction step beforehand rather than a simple end-to-end approach. The advantage of deep learning is that no such step is needed.

(4)Poor-quality images do not affect classification

2.5.2. Limits

(1)Lack of usable data

(2)Processing 3D scans is computationally expensive. Hence some researchers use 2D CNNs trained on slices

(3)Changes in software, hardware and diagnostic requirements, as well as in the patient population, may make model retraining necessary

(4)Many hospitals prohibit the use of cloud facilities for data export

(5)Usable and unusable training data are not balanced

(6)The authors use T1-weighted images, but T2-weighted and diffusion-weighted images are also common in clinical practice

2.6. Conclusions

        A lightweight DL model can also be used to classify MRI scans as usable or unusable.

3. Supplementary concepts

3.1. Spatially separable convolution


        Approach: factor the kernel matrix into an outer product of two (or more) vectors.

In this way, applying a 3×3 kernel, which takes 9 multiplications per output position, is replaced by 3 multiplications with a 3×1 vector followed by 3 multiplications with a 1×3 vector.


        ①Reduction in multiplication times

        ②Reduced computational complexity

        ③Faster network speed
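A quick numerical check of the idea, as a sketch. The Sobel-like kernel is my own choice of example, `correlate2d_valid` is a hypothetical helper written with plain NumPy loops, and the image is random toy data. The last lines count weights for the 3D strip kernels (K × 1 × 1, 1 × K × 1, 1 × 1 × K) used in the paper, per input/output channel pair.

```python
import numpy as np


def correlate2d_valid(image, kernel):
    """Naive 2D cross-correlation without padding ("valid" mode)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


col = np.array([[1.0], [2.0], [1.0]])   # 3x1 vector
row = np.array([[-1.0, 0.0, 1.0]])      # 1x3 vector
full = col @ row                        # 3x3 kernel as an outer product

rng = np.random.default_rng(0)
image = rng.random((6, 6))

direct = correlate2d_valid(image, full)
separable = correlate2d_valid(correlate2d_valid(image, col), row)

# Weight counts in 3D for kernel size K.
K = 3
full_3d_weights = K ** 3   # one K x K x K kernel
strip_3d_weights = 3 * K   # three strip kernels applied in sequence
```

Note that this only works exactly when the kernel really is an outer product. In a network the strip kernels are learned directly, so the factorization is a design constraint rather than a decomposition of an existing kernel.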

3.2. Depthwise separable convolution


        ①Depthwise Convolution

        ②Pointwise Convolution

For details, see: 可分离卷积(Separable convolution)详解 (A detailed explanation of separable convolution), @左左@右右's blog, CSDN
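The saving from splitting a convolution into a depthwise and a pointwise step shows up directly in the weight count. This sketch uses the standard 2D formulas with biases ignored. The function names and layer sizes are arbitrary examples of mine, not values from the paper.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights of a standard 2D convolution: every output sees every input."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Depthwise step (one kernel per input channel) plus 1x1 pointwise step."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


standard = standard_conv_params(kernel=3, c_in=16, c_out=32)
separable = depthwise_separable_params(kernel=3, c_in=16, c_out=32)
ratio = separable / standard
```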

3.3. Confusion matrices


(1)Definition: a chart or table that summarizes the performance of a classification model or algorithm for machine learning processes (What Is a Confusion Matrix? (Plus How To Calculate One) | Indeed)

(2)Example (see: 混淆矩阵 confusion matrices, CSDN blog)
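A confusion matrix for the binary case in this paper can be built as in the sketch below. The labels are made up and `confusion_matrix` is my own small helper, not the scikit-learn function of the same name. Rows are true labels and columns are predicted labels, in the order 0 (unusable), 1 (usable).

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels=(0, 1)):
    """Rows: true label, columns: predicted label."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
matrix = confusion_matrix(y_true, y_pred)
```

The diagonal holds the correct predictions and the off-diagonal cells hold the two kinds of error, which is what tests such as McNemar's compare.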

4. Reference List

Vakli, P. et al. (2023) 'Automatic brain MRI motion artifact detection based on end-to-end deep learning is similarly effective as traditional machine learning trained on image quality metrics', Medical Image Analysis, vol. 88, 102850.



