matplotlib: ignore outliers when plotting

Updated: 2024-10-12 12:27:53

Problem description

I'm plotting some data from various tests. Sometimes in a test I happen to have one outlier (say 0.1), while all other values are three orders of magnitude smaller.

With matplotlib, I plot against the range [0, max_data_value].

How can I zoom into my data and not display the outliers, which would otherwise mess up the x-axis in my plot?

Should I simply take the 95th percentile and use the range [0, 95th_percentile] on the x-axis?

Recommended answer

There's no single "best" test for an outlier. Ideally, you should incorporate a priori information (e.g. "This parameter shouldn't be over x because of blah...").

Most tests for outliers use the median absolute deviation, rather than the 95th percentile or some other variance-based measurement. Otherwise, the variance/stddev that is calculated will be heavily skewed by the outliers.
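To make the point about skew concrete, here is a small sketch with made-up numbers (the sample values are illustrative and not taken from the question). It compares a classical z-score, built on the mean and standard deviation, with a modified z-score built on the median and the median absolute deviation (MAD).

```python
import numpy as np

# Illustrative values only: six similar readings plus one obvious outlier.
data = np.array([1.0, 1.1, 0.9, 1.2, 0.8, 1.0, 100.0])

# Classical z-score: the outlier inflates both the mean and the stddev.
z_score = np.abs(data - data.mean()) / data.std()

# Modified z-score: based on the median and the median absolute deviation.
median = np.median(data)
abs_dev = np.abs(data - median)
mad = np.median(abs_dev)
modified_z_score = 0.6745 * abs_dev / mad

print("stddev:", data.std(), "MAD:", mad)
print("flagged by z-score > 3:", data[z_score > 3])
print("flagged by modified z-score > 3.5:", data[modified_z_score > 3.5])
```

With these seven values the outlier's classical z-score stays below 3, because the outlier itself inflates the standard deviation it is measured against. The MAD barely moves, so the modified z-score flags only the value 100.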

Here's a function that implements one of the more common outlier tests.

```python
import numpy as np


def is_outlier(points, thresh=3.5):
    """
    Returns a boolean array with True if points are outliers and False
    otherwise.

    Parameters:
    -----------
        points : A numobservations by numdimensions array of observations
        thresh : The modified z-score to use as a threshold. Observations with
            a modified z-score (based on the median absolute deviation) greater
            than this value will be classified as outliers.

    Returns:
    --------
        mask : A numobservations-length boolean array.

    References:
    ----------
        Boris Iglewicz and David Hoaglin (1993), "Volume 16: How to Detect and
        Handle Outliers", The ASQC Basic References in Quality Control:
        Statistical Techniques, Edward F. Mykytka, Ph.D., Editor.
    """
    # Treat a 1-D input as a column of single-dimension observations
    if len(points.shape) == 1:
        points = points[:, None]

    # Euclidean distance of every observation from the per-dimension median
    median = np.median(points, axis=0)
    diff = np.sum((points - median)**2, axis=-1)
    diff = np.sqrt(diff)

    # Note: this is zero (and the division below breaks down) when more than
    # half of the observations coincide with the median
    med_abs_deviation = np.median(diff)

    modified_z_score = 0.6745 * diff / med_abs_deviation

    return modified_z_score > thresh
```

As an example of using it, you'd do something like the following:

```python
import numpy as np
import matplotlib.pyplot as plt

# The function above... In my case it's in a local utilities module
from sci_utilities import is_outlier

# Generate some data
x = np.random.random(100)

# Append a few "bad" points
x = np.r_[x, -3, -10, 100]

# Keep only the "good" points
# "~" operates as a logical not operator on boolean numpy arrays
filtered = x[~is_outlier(x)]

# Plot the results
fig, (ax1, ax2) = plt.subplots(nrows=2)

ax1.hist(x)
ax1.set_title('Original')

ax2.hist(filtered)
ax2.set_title('Without Outliers')

plt.show()
```
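Coming back to the original question about zooming, the same kind of mask can also be used to choose axis limits instead of discarding points. The following is only a sketch under assumptions that are not part of the answer: the data are 1-D, the helper name `robust_mask`, the sample values, and the output filename are made up for illustration, and the non-interactive Agg backend is selected only so the script runs without a display.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt


def robust_mask(values, thresh=3.5):
    """Hypothetical 1-D helper: True where the modified z-score is within thresh."""
    values = np.asarray(values, dtype=float)
    abs_dev = np.abs(values - np.median(values))
    mad = np.median(abs_dev)
    if mad == 0:
        # More than half the points equal the median, so the score is
        # undefined; this sketch simply keeps everything in that case.
        return np.ones(values.shape, dtype=bool)
    return 0.6745 * abs_dev / mad <= thresh


# Made-up data resembling the question: values around 1e-4 plus one outlier at 0.1
rng = np.random.default_rng(0)
data = np.r_[rng.uniform(0.0, 2e-4, size=100), 0.1]

keep = robust_mask(data)
x_max = data[keep].max()

fig, ax = plt.subplots()
# Bin only the zoomed range; the outlier stays in `data` but falls outside it
ax.hist(data, bins=20, range=(0.0, x_max))
ax.set_xlim(0.0, x_max)
ax.set_xlabel("value")
fig.savefig("zoomed_hist.png")
```

Passing `range` to `hist` matters here: without it the bins would still span the full data range, and the zoomed view would show part of a single wide bar.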

Published: 2023-07-18 15:43:53

Article link: https://www.elefans.com/category/jswz/34/1146346.html