Anomalous Behavior Detection Algorithms
Anomaly detection is a critical problem that has been studied across diverse research areas and application domains. This article aims to give data scientists, data analysts, and machine learning specialists a structured and comprehensive overview of selected anomaly detection algorithms.
Concept of Anomaly Detection
An unexpected change that diverges sharply from the other observations in a given time period can be regarded as abnormal behavior. In other words, anomaly detection can be defined as the task of identifying the outliers in a dataset: the data points that behave considerably differently from the rest and do not conform to the normal profile.
Anomalous points might be produced by errors in the data; however, according to Hawkins, they may also point to a previously unidentified or hidden process or behavior, whether historical or ongoing.
As the volume of publicly available data grows, outlier detection algorithms are being adapted to run on these datasets and detect unusual patterns. For instance, a “suspiciously high” number of login attempts might indicate a possible cyber intrusion, and a considerable increase in incoming network traffic can point to malicious activity in a network. What these events share is that they are both “interesting” and “unusual” to data scientists and data analysts. This “interestingness”, or real-life relevance, of anomalies is an essential element of anomaly detection.
Types of Anomalies
The literature distinguishes three kinds of anomalies, described below:
1. Point Anomalies: A single item in a dataset whose attributes differ markedly from those of the other items.
2. Contextual Anomalies: An item that is anomalous only with respect to a particular context. This kind of anomaly may go unrecognized when the contextual information is absent.
3. Collective Anomalies: A group of related instances that is anomalous as a whole, even though the individual instances may not be anomalous on their own. The collective behavior of the events, rather than each event in isolation, is what is analyzed.
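The difference between the first two types can be made concrete with a small sketch. The readings, the `robust_scores` helper, and the threshold of 3.0 are all invented for illustration; the score used here (distance from the median in units of scaled median absolute deviation) is just one simple choice, not a method prescribed by this article. Collective anomalies are not covered by this sketch.

```python
from statistics import median

def robust_scores(values):
    """Distance of each value from the median, in units of scaled MAD."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    scale = 1.4826 * mad  # makes the MAD comparable to a standard deviation for Gaussian data
    if scale == 0:
        # Degenerate case: at least half of the values equal the median.
        return [0.0 if v == center else float("inf") for v in values]
    return [abs(v - center) / scale for v in values]

# Hypothetical (season, temperature in degrees Celsius) readings.
readings = [
    ("winter", 2.0), ("winter", 3.5), ("winter", 1.0), ("winter", 2.5),
    ("winter", 24.0),  # ordinary for summer, unusual for winter
    ("summer", 25.0), ("summer", 27.0), ("summer", 24.5), ("summer", 26.0),
    ("summer", 95.0),  # unusual in any season, e.g. a sensor glitch
]
THRESHOLD = 3.0  # assumed cut-off, chosen for this toy data

# Point anomalies: scored against all readings, ignoring the context.
all_scores = robust_scores([t for _, t in readings])
point_anomalies = [r for r, s in zip(readings, all_scores) if s > THRESHOLD]

# Contextual anomalies: scored only against readings from the same season.
by_season = {}
for reading in readings:
    by_season.setdefault(reading[0], []).append(reading)

contextual_anomalies = []
for group in by_season.values():
    scores = robust_scores([t for _, t in group])
    for reading, score in zip(group, scores):
        if score > THRESHOLD and reading not in point_anomalies:
            contextual_anomalies.append(reading)

print("point:", point_anomalies)
print("contextual:", contextual_anomalies)
```

The winter reading of 24.0 is invisible to the context-free pass because it sits in the middle of the overall range; it only stands out once the season is taken into account.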
Table of Contents
1. Statistical Approach
1.1. Minimum Covariance Determinant (MCD)
1.2. Principal Component Analysis (PCA)
2. Distance-based Approach
2.1. Local Outlier Factor (LOF)
2.2. Novelty Detection Local Outlier Factor (ND LOF)
2.3. Mahalanobis Distance (MDist)
3. Density-based Approach
3.1. Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
3.2. Ordering Points To Identify the Clustering Structure (OPTICS)
4. Isolation-based Approach
4.1. Isolation Forest (iForest)
5. Classification-based Approach
5.1. One-Class SVM
1. STATISTICAL APPROACH
1.1. Minimum Covariance Determinant (MCD)
Minimum Covariance Determinant (MCD) is a robust covariance estimator intended for Gaussian-distributed data. It searches for the subset of a specified number of data points whose covariance matrix has the lowest determinant.
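The subset search described above can be sketched by brute force on a tiny 2-D dataset. This is only an illustration of the definition, assuming invented data and hypothetical helper names (`covariance_2d`, `mcd_subset`); trying every subset is feasible only for a handful of points, and practical implementations (for example `MinCovDet` in scikit-learn) rely on a faster approximate search instead.

```python
from itertools import combinations

def covariance_2d(points):
    """Sample covariance entries (sxx, sxy, syy) of a list of (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    return sxx, sxy, syy

def mcd_subset(points, h):
    """Exhaustively find the h-point subset with the smallest covariance determinant."""
    best_det, best_idx = None, None
    for idx in combinations(range(len(points)), h):
        sxx, sxy, syy = covariance_2d([points[i] for i in idx])
        det = sxx * syy - sxy ** 2  # determinant of the 2x2 covariance matrix
        if best_det is None or det < best_det:
            best_det, best_idx = det, idx
    return best_det, best_idx

# Invented data: six tightly clustered points followed by two far-away points.
points = [
    (1.0, 1.1), (1.2, 0.9), (0.8, 1.0), (1.1, 1.3), (0.9, 0.8), (1.0, 1.0),
    (5.0, 5.0), (-4.0, 6.0),
]

best_det, best_idx = mcd_subset(points, h=6)
excluded = [points[i] for i in range(len(points)) if i not in best_idx]
print("subset indices:", best_idx)
print("left out of the subset:", excluded)
```

The points left out of the minimum-determinant subset are the natural outlier candidates: including either far-away point would inflate the covariance ellipse and therefore its determinant.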
Because of the geometrical interpretation of the covariance matrix, the MCD algorithm tends to learn a symmetric, ellipse-like shape and works best with elliptically symmetric, unimodal distributions. It is therefore better suited to detecting outliers in data drawn from a unimodal distribution, and it is not advised for multi-modal data. The smaller the dataset and the weaker its unimodality, the worse the algorithm performs.
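A toy 1-D example may help show why multi-modal data is a problem. In one dimension the covariance determinant is simply the variance, so the MCD subset is the group of h values with the smallest variance. The data, the `mcd_1d` helper, the choice of h, and the threshold of 3.0 below are all assumptions made for illustration, and the sketch uses the raw subset estimate without the correction and reweighting steps that real implementations typically add.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance

def mcd_1d(values, h):
    """Exhaustive univariate MCD: mean and std of the h-subset with the smallest variance."""
    best = min(combinations(values, h), key=variance)
    return mean(best), sqrt(variance(best))

# Invented bimodal data: two legitimate clusters plus one value stranded between them.
cluster_a = [-0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3]
cluster_b = [9.8, 9.9, 10.0, 10.1, 10.2]
between = [5.0]
data = cluster_a + cluster_b + between

h = len(data) // 2 + 1  # just over half of the points, here 7
center, scale = mcd_1d(data, h)

# Flag every value lying more than 3 robust standard deviations from the center.
flagged = [x for x in data if abs(x - center) / scale > 3.0]
print("center:", center, "scale:", scale)
print("flagged:", flagged)
```

The estimate locks onto the larger cluster, so the whole second cluster is flagged together with the stranded value. A single ellipse cannot describe two modes, which is consistent with the advice above to avoid MCD on multi-modal data.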
For the formulation and detailed parameter explanations, please see this article.