Multi-modal Paper Reading Series

Part 1: "Elevating Fake News Detection Through Deep Neural Networks, Encoding Fused Multi-Modal Features"


Contents

  • Multi-modal Paper Reading Series
  • 1. Meaning of the paper title
  • 2. ABSTRACT
  • 3. INDEX TERMS
  • 4. INTRODUCTION
  • 5. LITERATURE REVIEW
  • 6. MATERIALS AND METHODS
    • A. FEATURE EXTRACTION
      • Textual-Feature-Extractors
        • 1) PREPROCESSING
        • 2) EMBEDDING
        • 3) CNN
        • 4) GRU
      • Visual-Feature-Extractors
    • B. FUSION FEATURES
    • C. DIMENSIONALITY REDUCTION
    • D. CLASSIFICATION STAGE
  • 7. EVALUATION CRITERIA
  • 8. EXPERIMENTS
    • A. DATASET
      • 1) WEIBO
      • 2) FAKEDDIT
    • B. EXPERIMENTAL SETTING
    • C. ABLATION EXPERIMENT
    • D. BASELINES
  • 9. RESULTS AND DISCUSSION
  • 10. CONCLUSION AND FUTURE WORK


1. Meaning of the paper title

Improving fake news detection with deep neural networks that encode fused multi-modal features.

2. ABSTRACT

Textual content was initially the main focus of traditional methods for detecting fake news, and these methods have yielded good results. However, with the exponential growth of social media platforms, there has been a significant shift towards visual content. Consequently, traditional detection methods have become inadequate for detecting fake news reliably. This paper proposes a model for detecting fake news using multi-modal features. The model involves feature extraction, feature fusion, dimensionality reduction, and classification as its main processes. To extract various textual features, a pre-trained BERT, a gated recurrent unit (GRU), and a convolutional neural network (CNN) are utilized. For extracting image features, ResNet-CBAM is used, followed by the fusion of the multi-type features. The dimensionality of the fused features is reduced using an auto-encoder, and the FLN classifier is then applied to the encoded features to detect instances of fake news. Experimental findings on two multi-modal datasets, Weibo and Fakeddit, demonstrate that the proposed model effectively detects fake news from multi-modal data, achieving 88% accuracy on Weibo and 98% accuracy on Fakeddit. This shows that the proposed model outperforms previous works and is more effective on the larger dataset.

3. INDEX TERMS

Social media platforms, fake news detection, multi-modal features, deep learning, fusion, auto-encoder, dimensionality reduction.

4. INTRODUCTION

  1. In today’s digital age, the rapid spread of information through social media and online platforms has revolutionized how we consume news. However, this unprecedented proliferation of information has also facilitated the spread of disinformation, giving rise to the widespread problem of fake news. Fake news consists of intentionally fabricated or misleading information that is presented as authentic news in order to deceive, to manipulate public sentiment, or to make a profit, often by exploiting the followers of institutions that have a large fan base. One example is Kylian Mbappe’s fake contract, shown in Figure 1. The proliferation of fake news poses significant challenges to society: it affects political discourse and public perception, and even influences critical decision-making processes. For example, numerous fabrications circulated in the wake of the 2020 U.S. presidential election. Many voters who followed election news closely erroneously believed that election fraud had occurred, with 40 percent of them maintaining that Biden’s election was illegal [1]. As the impact of fake news continues to grow, there is an urgent need to develop robust and reliable methods for its detection.

     (Front-matter note carried over from the paper: the associate editor coordinating the review of this manuscript and approving it for publication was Jiankang Zhang (张建康).)

  2. Fake news detection (FND) is an interdisciplinary domain that combines expertise in natural language processing (NLP), machine learning (ML), data analysis, and media literacy. The primary goal is to identify and distinguish between legitimate news articles and fabricated, misleading, or biased content.

  3. ML algorithms play a pivotal role in fake news detection, leveraging vast amounts of labeled data to learn patterns and features that distinguish between trustworthy and deceptive information [2]. These algorithms can analyze textual content, metadata, user behavior, and the source’s credibility to make informed judgments about the veracity of news articles. Moreover, with the advancement of deep learning techniques, researchers and practitioners are continually developing sophisticated models capable of handling the dynamic and evolving nature of fake news. Deep neural networks can automatically extract complex patterns, enabling the detection of subtle linguistic cues and context-specific signals that might indicate the presence of fake news. Despite these ongoing efforts, FND remains a challenging task because of the adaptive tactics employed by those who spread misinformation. The detection process needs to be agile, able to adjust and improve as new forms of misinformation emerge. In this pursuit, collaboration between academia, industry, and policymakers is vital to developing comprehensive strategies and systems that combat the fake news epidemic effectively. By implementing effective detection techniques and media literacy campaigns, we can enable individuals to evaluate information critically and foster a society that is better informed and more resilient to the threats posed by misinformation.

  4. In this research paper, we delve into the techniques, methodologies, and challenges associated with FND. We explore the latest advancements in ML and NLP, discussing how these technologies can be harnessed to protect the integrity of information in the digital era. By understanding the landscape of fake news and the tools for its identification, we take a significant step toward fostering a more trustworthy and reliable information ecosystem. The proposed model is composed of multi-type feature extraction modules and a feature fusion module, and it then adopts an attention mechanism to reduce the dimensionality of the fused features. The feature extraction modules include text feature extractors and a visual feature extractor.

  5. A pre-trained BERT module extracts contextual features, and a convolutional neural network (CNN) and a gated recurrent unit (GRU) then extract spatial and sequential features, respectively. ResNet-50 combined with the CBAM attention mechanism extracts the visual features. The concatenation module fuses the text and image features into fusion features. An auto-encoder reduces the dimensionality of the fusion features, and the reduced features are then fed into a fast learning network classifier to obtain the detection results. The model integrates the features of multiple modalities more fully, so it can learn the deeper correlations between them, which helps improve detection performance. The main contributions of this paper are as follows:
     • A novel stacked model of state-of-the-art multi-modal deep neural networks is employed to classify fake and real news efficiently.
     • A deep learning ensemble model is proposed that leverages the power of deep neural networks to extract multi-type features and then fuse them.
     • Structured multi-modal datasets containing pairs of texts and images are adopted to obtain more features and thereby improve fake news detection.
     • The dimensionality of the fused features is reduced with an auto-encoder to improve the classifier results.

  6. The rest of this work is structured as follows: Section II provides a literature review. Section III presents and discusses the materials and methodology of the proposed model. Section IV presents the experimental results and discussion. Section V discusses the conclusion and directions for future work.

5. LITERATURE REVIEW

  1. Researchers are increasing their efforts to find solutions in response to the rise of misleading content on social media. A multitude of scholarly investigations have been conducted in this domain; we highlight a carefully chosen subset of them here. Prior research in this domain, as in numerous other areas of NLP, mostly concentrated on probabilistic techniques. After the introduction of deep learning (DL), however, numerous researchers embraced DL and made progress in this field. Research efforts initially focused on contrasting probabilistic approaches with DL methods and then advanced towards proposing more complex DL models. For example, article [3] provides a comprehensive study of the methods employed to identify misinformation on social media, encompassing classifications of fake news according to social and psychological theories, current algorithms analyzed through the lens of data mining, evaluation metrics, and representative datasets.
  2. The authors of [4] examine state-of-the-art methods for detecting fake news and elaborate on the datasets and NLP techniques utilized in prior investigations. They present a thorough examination of DL-based methodologies in order to classify representative approaches into distinct categories. The authors of [5] propose a novel higher-order user-to-user mutual-attention progression (HiMaP) method that captures cues related to the authority or influence of users by modelling direct and indirect (multi-hop) influence relationships among each pair of users present in the propagation sequence. The authors of [6] suggest a model that reduces the dimensionality of the feature vectors before feeding them to the classifier. SAFE, a method described in [7], analyzes the multi-modal information of news articles to detect fake news. First, neural networks are used to extract textual and visual features automatically for news representation. The relationship between the extracted features across modalities is then investigated. The representations of the textual and visual information and the relationship between them are jointly learned and used to predict fake news.
  3. A multi-modal variational auto-encoder (MVAE) network is presented in [8] that combines a binary classifier with a bimodal variational auto-encoder to detect fake news. The model comprises three primary components: an encoder, a decoder, and a module for detecting fake news. The variational auto-encoder can learn probabilistic models for latent variables by optimizing a bound on the marginal likelihood of the observed data.
  4. The authors of [9] present an innovative hybrid system for detecting fake news. This system merges the strengths of linguistic and knowledge-based approaches by utilizing two distinct sets of features: linguistic features and a novel set of knowledge-based features. The system’s performance on a simulated news dataset demonstrates that it can achieve a commendable level of accuracy in identifying fake news.
  5. A novel multi-modal topic memory network (MTMN) is described in [10]. It builds a good representation by using multi-modal fusion to exploit both the intra-modal connections within each modality and the inter-modal connections between text terms and image regions. A multi-level multi-modal cross-attention network (MMCN) is presented in [11]; it employs a network to facilitate cross-attention between modalities. The MMCN was purposefully developed to integrate the feature embeddings of image regions and text words by jointly considering data relationships within the same modality and across different modalities.
  6. However, most of the studies mentioned do not take the variety of features in multi-modal datasets into account, which limits their results. In addition, most of the models fail to reach adequate detection performance. To overcome these limitations, we employ multi-modal feature extraction and dimensionality reduction of the fused features, followed by a classification stage.

6. MATERIALS AND METHODS

The overall proposed model is shown in Figure 2.

This research paper employs a combined analysis of textual and visual data to assess the reliability of news. Based on this, we propose a multi-modal deep neural network with a fusion module to capture deep connections between textual and visual features. This section covers the proposed model in detail.

A. FEATURE EXTRACTION

Textual-Feature-Extractors

1) PREPROCESSING

The text data must be preprocessed, using techniques such as stop-word elimination, tokenization, and removal of punctuation. These steps help select the most significant statements and improve the performance of the model [12].
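The three steps named above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the stop-word list `STOP_WORDS` is a tiny hypothetical one, whitespace splitting stands in for a real tokenizer, and the lowercasing step is an added assumption that the paper does not mention.

```python
import re
import string

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "that"}


def preprocess(text: str) -> list[str]:
    """Remove punctuation, tokenize on whitespace, and drop stop words."""
    lowered = text.lower()  # assumption: case folding is not stated in the paper
    # Replace every ASCII punctuation character with a space.
    no_punct = re.sub(f"[{re.escape(string.punctuation)}]", " ", lowered)
    tokens = no_punct.split()  # whitespace tokenization
    return [tok for tok in tokens if tok not in STOP_WORDS]


print(preprocess("Breaking: the contract is FAKE, sources say!"))
# ['breaking', 'contract', 'fake', 'sources', 'say']
```

For Chinese text such as the Weibo posts, whitespace splitting would not work and a word segmenter would be needed instead.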

2) EMBEDDING

Using the pre-trained BERT language model, this module transforms every word in the preprocessed sentence into a dense vector of the same dimension [13]. BERT accepts an entire sentence as input, in contrast to static word embedding methods, which convert a word to a vector by looking it up directly in a word representation table. It uses the hidden state of the final or the second-to-last hidden layer as the dynamic vector representation of each word in the sentence. A given word therefore receives distinct vector representations in different contexts, which expresses its semantics more precisely [14]. The procedures for the overall pre-training and subsequent fine-tuning of BERT are described in [15]. Apart from the output layers, the same architecture is used in both pre-training and fine-tuning. The parameters of the pre-trained model are used to initialize the models for the different downstream tasks, and all parameters are fine-tuned during that procedure. The special symbol [CLS] is placed before each input example, and [SEP] serves as a separator token. For example, [SEP] is used to separate questions and answers, as demonstrated in Figure 3.
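The input layout described above can be illustrated without any model weights. The sketch below only assembles the token sequence and the segment ids; the helper name `build_bert_input` is hypothetical, and plain word tokens stand in for BERT's actual WordPiece sub-word tokens.

```python
from typing import Optional


def build_bert_input(first: list[str], second: Optional[list[str]] = None):
    """Assemble a BERT-style input: [CLS] first [SEP] (second [SEP])."""
    tokens = ["[CLS]"] + first + ["[SEP]"]
    segment_ids = [0] * len(tokens)  # segment 0 covers [CLS], the first text, and its [SEP]
    if second is not None:
        tokens += second + ["[SEP]"]
        segment_ids += [1] * (len(second) + 1)  # segment 1 covers the second text and its [SEP]
    return tokens, segment_ids


tokens, segment_ids = build_bert_input(["is", "it", "fake"], ["yes"])
print(tokens)       # ['[CLS]', 'is', 'it', 'fake', '[SEP]', 'yes', '[SEP]']
print(segment_ids)  # [0, 0, 0, 0, 0, 1, 1]
```

A single news post would use only the first segment; the two-segment form corresponds to the question and answer example.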

3) CNN

Convolutional neural networks (CNNs), known for their remarkable performance in computer vision tasks, have also proven highly effective for extracting features in NLP tasks. A key benefit of CNNs in NLP is their capacity to capture and analyze local patterns and relationships in the text [16]. A CNN can use one-dimensional convolutions (Conv1D) to obtain local features by treating the sequence of words or characters as one-dimensional data. This is especially helpful for tasks that need to capture n-grams or short phrases, such as sentiment analysis or finding the important parts of a text for categorization. CNNs also handle inputs of different lengths well, which makes them a good choice for texts of varying length: the models are flexible and do not require a fixed input size. The number of filters and the kernel size must be set before training the CNN. Figure 4 illustrates the Conv1D process. By stacking several convolutional layers with various filter sizes and hyperparameters, a CNN can obtain hierarchical representations of the text and local features at different levels, which lets it extract useful information from the input data.
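The Conv1D operation can be sketched as follows. This is a minimal pure-Python illustration of a single filter sliding over a sequence of embedding vectors, not the paper's implementation; the ReLU activation, the max-over-time pooling, and the toy numbers are all assumptions chosen for the example.

```python
def conv1d(sequence, kernel, bias=0.0):
    """Slide one filter over a sequence of embedding vectors (no padding).

    sequence: list of T vectors, each of dimension D
    kernel:   list of K weight vectors, each of dimension D (K = kernel size)
    Returns T - K + 1 activations after a ReLU.
    """
    k = len(kernel)
    outputs = []
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        total = bias
        for vec, weights in zip(window, kernel):
            total += sum(x * w for x, w in zip(vec, weights))
        outputs.append(max(0.0, total))  # ReLU
    return outputs


def max_over_time(feature_map):
    """Collapse a variable-length feature map into one value per filter."""
    return max(feature_map)


# Four tokens with 2-dimensional embeddings and one filter of kernel size 2 (a bigram detector).
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
bigram_filter = [[1.0, -1.0], [0.5, 0.5]]
feature_map = conv1d(embeddings, bigram_filter)
print(feature_map)                 # [1.5, 0.0, 0.0]
print(max_over_time(feature_map))  # 1.5
```

Because the pooled output has one value per filter regardless of the sequence length, texts of different lengths map to feature vectors of the same size.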

4) GRU

The gated recurrent unit (GRU), a variation of the LSTM, has fewer model parameters and trains more efficiently than some other LSTM models. It has been used successfully to extract the main characteristics of text. A GRU can model sequential dependencies, capture short-term memory, and address the vanishing gradient problem. Its gating mechanism allows it to selectively update and use information from previous time steps, making it suitable for tasks involving sequential data, including NLP tasks such as machine translation, named entity recognition, and classification. Figure 5 depicts the overall architecture of the GRU, as described in [17]. A GRU has two gates: an update gate (z) and a reset gate (r). These gates control the amount of information that is updated and forgotten, respectively. Furthermore, h and h̃ convey the current and updated information, respectively. The computation of z, r, h, and h̃ at time step s is given as follows:
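In the standard GRU formulation, written with the symbols of this section, the update takes the form below. The tanh in the candidate state and the convention that z weights the new candidate are assumptions taken from the common textbook form; the paper's exact equations may differ in these details.

$$
\begin{aligned}
z_s &= \sigma\left(W^{(z)} x_s + U^{(z)} h_{s-1}\right) \\
r_s &= \sigma\left(W^{(r)} x_s + U^{(r)} h_{s-1}\right) \\
\tilde{h}_s &= \tanh\left(W^{(h)} x_s + U^{(h)} \left(r_s \odot h_{s-1}\right)\right) \\
h_s &= \left(1 - z_s\right) \odot h_{s-1} + z_s \odot \tilde{h}_s
\end{aligned}
$$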

where σ is a nonlinear function, e.g., the sigmoid function, x_s denotes the input vector at time step s, and ⊙ represents element-wise multiplication. W(z), W(r), W(h), U(z), U(r), and U(h) are all weights to be learned.
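A single GRU time step can be sketched in plain Python as a way to see how the two gates interact. The sketch assumes the common formulation in which the candidate state uses tanh and the new state is (1 - z) ⊙ h_prev + z ⊙ h̃, with no bias terms; the function names and the dictionary layout of the weights are hypothetical.

```python
import math


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def gru_step(x, h_prev, W, U):
    """One GRU update. W and U are dicts of weight matrices keyed by 'z', 'r', 'h'."""
    # Update gate z and reset gate r, both in (0, 1).
    z = [sigmoid(a + b) for a, b in zip(matvec(W["z"], x), matvec(U["z"], h_prev))]
    r = [sigmoid(a + b) for a, b in zip(matvec(W["r"], x), matvec(U["r"], h_prev))]
    # Candidate state: the reset gate decides how much of the past is visible.
    gated = [ri * hi for ri, hi in zip(r, h_prev)]
    h_tilde = [math.tanh(a + b) for a, b in zip(matvec(W["h"], x), matvec(U["h"], gated))]
    # New state: the update gate blends the previous state with the candidate.
    return [(1.0 - zi) * hp + zi * ht for zi, hp, ht in zip(z, h_prev, h_tilde)]


# With all-zero weights both gates equal 0.5 and the candidate is 0,
# so the new state is exactly half of the previous state.
zero_W = {key: [[0.0], [0.0]] for key in "zrh"}            # hidden size 2, input size 1
zero_U = {key: [[0.0, 0.0], [0.0, 0.0]] for key in "zrh"}  # hidden size 2
print(gru_step([1.0], [2.0, -4.0], zero_W, zero_U))  # [1.0, -2.0]
```

Running `gru_step` over the tokens of a post in order, feeding each output back in as `h_prev`, yields the sequential feature for the text.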

Visual-Feature-Extractors

Previous methods commonly employed a VGG-based model to extract visual features, such as image features, when analyzing multi-modal data [18]. However, the features extracted by ResNet are more representative and discriminative than those extracted by VGG, so we use the ResNet model to obtain the visual features of the image data. Some existing approaches rely solely on the output of the last layer of ResNet, which in some cases discards a significant amount of fine-grained visual information. To improve the representation of visual semantic information, we derive detailed region features from the image using the second-to-last layer of ResNet. The visual feature extractor uses the ResNet model [19], which has been pre-trained on large-scale image datasets.

A residual network enhances the representational capability of a neural network by introducing residual connections, also known as skip connections. These connections allow the network to bypass one or more layers and add the input of those layers directly to their output. This helps mitigate vanishing gradients, which are common in very deep networks and can cause model accuracy to degrade as the depth increases. By maintaining gradient flow and letting layers learn residual mappings more easily, residual connections make deep networks easier to train effectively.

The visual feature extractor in this paper uses the ResNet-50 model. The convolutional block attention module (CBAM) [20] is combined with the model to help it focus on the important parts of the image. For a given intermediate feature map, CBAM infers attention weights along two separate dimensions (channel and spatial) in sequence. It then multiplies the attention weights with the input feature map to achieve adaptive feature refinement, as shown in Figure 6. A fully connected layer is added to the output of CBAM-ResNet-50, which keeps the original ResNet-50 structure intact so that its pre-trained parameters can be reused to limit overfitting. This layer also ensures that the image features have the same size as the hidden state of the text features.
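The sequential channel-then-spatial attention can be written compactly. The formulation below follows the CBAM description in [20]; the symbols F, M_c, and M_s are notation introduced here for illustration, and the 7×7 convolution is the kernel size used in that work rather than a setting confirmed by this paper.

$$
\begin{aligned}
F' &= M_c(F) \otimes F, \qquad & M_c(F) &= \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big) \\
F'' &= M_s(F') \otimes F', \qquad & M_s(F') &= \sigma\big(f^{7 \times 7}([\mathrm{AvgPool}(F');\ \mathrm{MaxPool}(F')])\big)
\end{aligned}
$$

Here F is the intermediate feature map, ⊗ is element-wise multiplication with broadcasting, M_c produces one weight per channel, and M_s produces one weight per spatial location.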

B. FUSION FEATURES

Feature fusion is performed by concatenation (the series method). The extracted text feature R_T and visual feature R_V are merged to obtain the fusion feature. The fusion process is as follows:
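Assuming the fusion is a plain concatenation of the two feature vectors, with d_T and d_V as hypothetical names for their dimensions, it can be written as:

$$
R = R_T \oplus R_V = [\,R_T\,;\,R_V\,] \in \mathbb{R}^{d_T + d_V}
$$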

where R is the input of the auto-encoder phase.

C. DIMENSIONALITY REDUCTION

We apply an auto-encoder to reduce the dimensionality of the fusion features [21], [22]. Auto-encoders, a class of neural networks, excel at learning compact representations of input data by employing an encoder-decoder architecture. The encoding process captures the essential information from the fusion features, resulting in a lower-dimensional representation, while the subsequent decoding step reconstructs the original input, as shown in Figure 7. The dimensionality reduction is achieved through mathematical operations at each layer of the auto-encoder. Specifically, the encoding function h(x) transforms the input x into a compressed representation h using weights (W) and biases (b).
Encoder mapping:
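In the usual single-layer form, consistent with the symbols defined in this section, the mapping reads:

$$
h = h(x) = f(Wx + b)
$$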

Here, f is typically a nonlinear activation function.
The decoding process aims to reconstruct the input, where (H) is the encoded vector from the hidden layer and (W̄) denotes the weights of the hidden-layer neurons, with biases (b̄).
Decoder mapping:
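A matching single-layer form is shown below; the reconstruction symbol x̂ and the decoder activation g are assumptions, since the section names only H, W̄, and b̄:

$$
\hat{x} = g\left(\bar{W} H + \bar{b}\right)
$$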

The ultimate goal is to minimize the reconstruction error, which encourages the auto-encoder to capture the most salient features in the reduced space. In this way, the auto-encoder reduces the number of dimensions of the fusion features, which is useful for feature learning and representation optimization.
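The encoder, the decoder, and the reconstruction error can be sketched as a forward pass in plain Python. This is an illustration only: the tanh encoder activation, the linear decoder, the mean squared error, and the toy dimensions (4 inputs compressed to 2) are assumptions, and the training loop that would adjust the weights to minimize the error is omitted.

```python
import math


def affine(weights, bias, vector):
    """Compute weights @ vector + bias for a matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vector)) + b for row, b in zip(weights, bias)]


def encode(x, W, b):
    """Encoder mapping h = f(Wx + b), with f = tanh assumed."""
    return [math.tanh(a) for a in affine(W, b, x)]


def decode(h, W_bar, b_bar):
    """Decoder mapping x_hat = W_bar h + b_bar (linear output assumed)."""
    return affine(W_bar, b_bar, h)


def reconstruction_error(x, x_hat):
    """Mean squared error between the input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


# Toy example: a 4-dimensional fused feature compressed to 2 dimensions.
x = [1.0, 2.0, 3.0, 5.0]
W = [[0.0] * 4, [0.0] * 4]       # encoder weights, shape 2 x 4
b = [0.0, 0.0]
W_bar = [[0.0, 0.0]] * 4         # decoder weights, shape 4 x 2
b_bar = [1.0, 2.0, 3.0, 4.0]

h = encode(x, W, b)              # [0.0, 0.0], the compressed code
x_hat = decode(h, W_bar, b_bar)  # [1.0, 2.0, 3.0, 4.0]
print(reconstruction_error(x, x_hat))  # 0.25
```

After training, only `encode` is needed at inference time: its output is the reduced feature passed on to the classifier.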

D. CLASSIFICATION STAGE

In our proposed model, we use the Fast Learning Network (FLN) classifier [23] to classify the encoded features. The FLN is distinctive in how it initializes weights and constructs weighted connections. The weights and biases of the hidden layer are set randomly. Starting from random weights lets the network explore different parts of the weight space and find more suitable solutions. The FLN aims to optimize the learning efficiency and precision of the network by building connections between the input nodes, the hidden layer, and the output nodes, as shown in Figure 8. Least-squares methods, which find the best fit for a set of data points, are used to determine the remaining weights. This connectivity design gives information a more direct path from the input to the output, which can help the network capture significant aspects of the data and generate precise predictions. Hence, the FLN addresses most of the limitations associated with conventional learning methods while also learning exceptionally fast [12].
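One way to write the forward pass implied by this description is given below. The notation is assumed for illustration and does not come from this paper: W^{in} and b are the randomly set hidden-layer weights and biases, g is the hidden activation, W^{oh} and W^{oi} connect the hidden layer and the input directly to the output, c is the output bias, and a linear output is assumed for simplicity.

$$
y = W^{oi} x + W^{oh}\, g\left(W^{in} x + b\right) + c
$$

With W^{in} and b fixed, the remaining parameters enter linearly, so they can be obtained in one step as the least-squares solution

$$
\left[\,W^{oi}\ \ W^{oh}\ \ c\,\right] = Y A^{+}, \qquad A = \begin{bmatrix} X \\ g\left(W^{in} X + b\right) \\ \mathbf{1}^{\top} \end{bmatrix}
$$

where the columns of X and Y are the training inputs and targets and A^{+} is the Moore-Penrose pseudo-inverse of A. Solving for the weights in closed form, rather than by iterative gradient descent, is what makes the learning fast.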

7. EVALUATION CRITERIA

To assess the efficacy of the proposed model, we evaluate it on four key metrics: accuracy, precision, recall, and F1 score [24], [25]. Table 1 gives the confusion matrix of the classification results, where (TP) and (FP) are the numbers of correctly and incorrectly classified instances of the positive class, and (TN) and (FN) are the numbers of correctly and incorrectly classified instances of the negative class, respectively. Based on Table 1, we compute the following metrics, which are expressed mathematically in Equations 8, 9, 10, and 11.

Accuracy (Acc): the ratio of correctly predicted outcomes to the total number of outcomes.

Recall (Re): the ratio of correctly predicted fake news instances to the total number of actual fake news instances.

Precision (Pr): the ratio of correctly recognized instances of fake news to the total number of instances that have been detected as fake news.

F1 score: the harmonic mean of recall and precision.
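The four definitions above translate directly into code. The sketch below treats fake news as the positive class and uses the standard formulas; the function name and the example counts are hypothetical and are not results from the paper.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    # Accuracy = (TP + TN) / (TP + FP + TN + FN)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # Precision = TP / (TP + FP): of everything flagged as fake, how much really was.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall = TP / (TP + FN): of all real fake news, how much was caught.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 = harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for 100 test posts.
scores = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

The guards against empty denominators return 0.0 by convention, which matters when a model never predicts the positive class.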

8. EXPERIMENTS

A. DATASET

We use two multi-modal datasets:

1) WEIBO

The Weibo dataset was provided in [26]. Collected from May 2012 to January 2016, it draws its fake posts from Weibo’s official rumor-dispelling system and its real posts from verified tweets. Xinhua News Agency, China’s reputable news agency, verifies the posts in the dataset to determine whether they are fake or legitimate news. The statistics of the balanced dataset are shown in Table 2.

2) FAKEDDIT

Fakeddit is a novel multi-modal dataset [27]. It comprises more than 1 million samples across several categories of fake news, gathered through multiple modes of data collection. After a series of reviews, the samples are labeled for 2-way, 3-way, and 6-way classification using distant supervision, as shown in Figure 9. The authors of the dataset build hybrid models that combine text and image data and conduct thorough experiments to explore the different classification variants. Our research relies on the multi-modality and fine-grained classification that are specific to Fakeddit to evaluate our architecture for fake news detection.

Imbalanced class distribution in real-world datasets severely impairs the performance of classification algorithms, and the learning task becomes even more challenging when the imbalanced data also exhibits class overlap. The method suggested in [27] tries to find the best subset of the majority samples to deal with the imbalance and class-overlap issues at the same time, while avoiding the removal of too many majority samples, especially in areas where classes overlap. Accordingly, we randomly selected 54,000 submissions from 878,000 entries, with 30,000 as the training set (15,000 real news and 15,000 fake news), while the test set had 10,800 submissions (5,400 real news and 5,400 fake news). We adopted random selection to ensure fairness, minimize bias, maintain data integrity by avoiding the loss of valuable data points and excluding redundant outliers, and foster transparency and credibility in the methodology of the study.

B. EXPERIMENTAL SETTING

The framework was built with Python 3.9. The train-to-test split ratio is 8:2. For the textual content of multi-modal posts, the textual branch uses the pre-trained BERT module together with the CNN and the GRU. For simplicity, we freeze the weights of BERT during training. The ResNet-50 model, combined with the CBAM attention mechanism, is used for the visual feature extraction module. The visual features are reduced from 2048 dimensions to 64 dimensions. For training, we set the learning rate to 0.001, the number of epochs to 200, and the batch size to 64, which are settings commonly used in related studies.
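These settings can be gathered in one place so that a run is easy to reproduce. The sketch below is a hypothetical configuration object, not the authors' code: the class and field names are invented for illustration, and only the values come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters quoted in the experimental setting (field names are hypothetical)."""
    learning_rate: float = 0.001
    epochs: int = 200
    batch_size: int = 64
    visual_dim_in: int = 2048    # ResNet-50 feature size before the added fully connected layer
    visual_dim_out: int = 64     # reduced visual feature size
    train_fraction: float = 0.8  # 8:2 train-to-test split
    freeze_bert: bool = True     # BERT weights stay fixed during training


def split_sizes(n_samples: int, train_fraction: float) -> tuple[int, int]:
    """Return (train, test) sample counts for a given split fraction."""
    n_train = round(n_samples * train_fraction)
    return n_train, n_samples - n_train


config = TrainingConfig()
print(split_sizes(54_000, config.train_fraction))  # (43200, 10800)
```

Freezing the dataclass prevents a setting from being changed by accident halfway through an experiment.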

C. ABLATION EXPERIMENT

To determine how much each component contributes, we evaluated simplified versions of the model with one part removed, namely the visual feature extractor or the auto-encoder module. Model w/o image: the visual features are removed and only textual features are used as input to the model, so the classification of the encoded content is based on textual features alone. Ours w/o auto-encoder: the auto-encoder is removed, and the text and visual features are simply combined as the basis for classification. Table 3 and Table 4 list the results of the ablation experiment, from which two conclusions can be drawn. First, each component of the model plays an important role in the fake news detection task, because classification accuracy decreases to some extent if any part of the model is removed. Second, the model is most accurate when the auto-encoder is included. The experimental results also show that the proposed model is superior to the single-modal model in fake news detection.
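The two ablation variants differ only in what reaches the classifier, which can be made explicit with a pair of switches. In the sketch below, `toy_encoder` is a hypothetical stand-in (simple average pooling) for the trained auto-encoder, and the feature sizes are made-up toy values; none of these names or numbers come from the paper.

```python
import math

def toy_encoder(vec, out_dim=4):
    # Hypothetical stand-in for the encoder half of a trained auto-encoder:
    # average-pools the input into roughly `out_dim` values.
    chunk = math.ceil(len(vec) / out_dim)
    return [sum(vec[i:i + chunk]) / len(vec[i:i + chunk])
            for i in range(0, len(vec), chunk)]

def classifier_input(text_feat, image_feat, use_image=True, encoder=None):
    """Fuse features by concatenation, then optionally encode them."""
    fused = list(text_feat) + (list(image_feat) if use_image else [])
    return encoder(fused) if encoder is not None else fused

VARIANTS = {
    "full model": dict(use_image=True, encoder=toy_encoder),
    "w/o image": dict(use_image=False, encoder=toy_encoder),
    "w/o auto-encoder": dict(use_image=True, encoder=None),
}

text_feat = [0.1 * i for i in range(8)]   # toy textual features
image_feat = [1.0, 2.0, 3.0, 4.0]         # toy visual features
input_dims = {name: len(classifier_input(text_feat, image_feat, **opts))
              for name, opts in VARIANTS.items()}
```

With these toy sizes, the variant without the auto-encoder hands the classifier the full 12-value fused vector, while the other two hand it a compressed 4-value code.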

C. BASELINES

To verify the validity of the proposed model, it was compared with other multi-modal fake news detection models on the same datasets. The main comparison models are as follows:

MMCN [11]: A multi-level multi-modal cross-attention network that applies cross-attention between different modalities.

att-RNN [26]: Uses attention mechanisms to combine textual, visual, and social context features. An LSTM network jointly represents the text and social context information, and the attention mechanism then integrates the visual features.

MAVE [8]: Trains a multi-modal variational autoencoder and reconstructs the two modalities from the learned shared representation to find the correlation between them, which yields a better multi-modal shared representation for fake news detection.

EANN [31]: Uses a CNN and VGG19 to extract multi-modal features. The concatenated features are then fed into a fake news detector and an event discriminator to identify the label of each post.

Bi-GCN [32]: The bi-directional graph convolutional network is one of the state-of-the-art fake news detection methods and uses both the content of the post and its propagation path.

SeRN [33]: The Stance Extraction and Reasoning Network implicitly extracts the stances implied in post-reply pairs and integrates the stance representations for fake news detection.

9.RESULTS AND DISCUSSION

Table 5 and Figure 10 show that, on Weibo, the accuracy of our approach is close to that of the baseline approaches. Table 6 and Figure 11 show that our approach outperforms all the baselines on Fakeddit in terms of accuracy and F1 score. This suggests that deep neural networks learn better as the dataset grows, which improves the accuracy of the outcomes. In addition, reducing the dimensionality of the fused features with the auto-encoder keeps the number of features manageable, which in turn contributes to higher detection accuracy. Furthermore, the success of our approach underscores the importance of continually refining and optimizing feature extraction and dimensionality reduction methods. By efficiently managing the quantity and quality of features, we can further enhance the accuracy and robustness of deception detection models, ensuring their effectiveness across diverse datasets and real-world applications.
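The two metrics used in this comparison follow from the confusion counts in the usual way. The sketch below applies the standard definitions of accuracy, precision, recall, and F1; the helper name `accuracy_and_f1` and the toy labels (1 = fake, 0 = real) are assumptions for illustration and are not results from the paper.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Compute accuracy and the F1 score of the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f1

# Toy predictions: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
acc, f1 = accuracy_and_f1(y_true, y_pred)
```

On balanced test sets such as the ones described earlier, accuracy and F1 tend to move together; they diverge more sharply when the classes are imbalanced.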


10.CONCLUSION AND FUTURE WORK

This paper proposes a fake news detection model based on multi-modal deep learning to address fake news detection in complex scenarios where text and images coexist. The overall model consists of four parts: a feature extraction module, a feature fusion module, a dimensionality reduction stage, and a classification stage. The feature extraction module includes a text feature extractor and a visual feature extractor. The text and image features are fused and then encoded by an auto-encoder, and the encoded features are classified by the FLN. Extensive experiments on data collected from Weibo and Fakeddit demonstrate the effectiveness of the proposed model.

Moving forward, there are several areas to explore and enhance in our fake news detection model. First, we need to broaden our dataset to include a wider range of sources and social media platforms, which will make the model more adaptable to different scenarios. We also need to make the model more dynamic so that it can quickly adapt to new fake news tactics as they emerge. This means incorporating continuous learning mechanisms that keep the model updated in real time. Optimizing for real-time processing is essential for integrating the model into live social media feeds and news streams. Considering the global nature of online content, it is crucial to extend the model's capabilities to detect fake news in multiple languages, which will make it more useful and applicable in diverse linguistic settings. Additionally, we should look into fine-tuning options that make the model easily customizable for specific domains or user preferences. This could involve creating user-friendly interfaces or providing settings that let users tailor the model to their needs. Lastly, user feedback is invaluable. We should create mechanisms for users to report potential fake news and integrate feedback loops to refine the model continuously. This collaborative effort between the model and its users ensures ongoing improvement.
