Different output values in R and Python?

Updated: 2024-10-26 06:33:01

Problem description

Perhaps I am doing something wrong while z-normalizing my array. Can someone take a look at this and suggest what's going on?

In R:

> data <- c(2.02, 2.33, 2.99, 6.85, 9.20, 8.80, 7.50, 6.00, 5.85, 3.85, 4.85, 3.85, 2.22, 1.45, 1.34)
> data.mean <- mean(data)
> data.sd <- sqrt(var(data))
> data.norm <- (data - data.mean) / data.sd
> print(data.norm)
 [1] -0.9796808 -0.8622706 -0.6123005  0.8496459  1.7396910  1.5881940  1.0958286  0.5277147  0.4709033 -0.2865819
[11]  0.0921607 -0.2865819 -0.9039323 -1.1955641 -1.2372258
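For readers without R at hand, the R session above can be mirrored in plain Python. This is a minimal sketch that assumes only the standard library; `zscore_sample` and `zscore_population` are hypothetical helper names introduced for illustration. `statistics.stdev` divides by n-1, the same convention as R's `var()`, while `statistics.pstdev` divides by n.

```python
import statistics

data = [2.02, 2.33, 2.99, 6.85, 9.20, 8.80, 7.50, 6.00,
        5.85, 3.85, 4.85, 3.85, 2.22, 1.45, 1.34]

# Hypothetical helpers for illustration, not part of any library.
def zscore_sample(values):
    """Z-normalize using the sample standard deviation (divisor n - 1), like R's sqrt(var(x))."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # n - 1 in the denominator
    return [(v - mean) / sd for v in values]

def zscore_population(values):
    """Z-normalize using the population standard deviation (divisor n)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # n in the denominator
    return [(v - mean) / sd for v in values]

r_style = zscore_sample(data)
numpy_style = zscore_population(data)
print([round(z, 7) for z in r_style])
print([round(z, 8) for z in numpy_style])
```

The first list should agree with the R output to the printed precision, and the second with the NumPy output from the question.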

In Python using numpy:

>>> import string
>>> import numpy as np
>>> from scipy.stats import norm
>>> data = np.array([np.array([2.02, 2.33, 2.99, 6.85, 9.20, 8.80, 7.50, 6.00, 5.85, 3.85, 4.85, 3.85, 2.22, 1.45, 1.34])])
>>> data -= np.split(np.mean(data, axis=1), data.shape[0])
>>> data *= np.split(1.0/data.std(axis=1), data.shape[0])
>>> print(data)
[[-1.01406602 -0.89253491 -0.63379126  0.87946705  1.80075126  1.64393692
   1.13429034  0.54623659  0.48743122 -0.29664045  0.09539539 -0.29664045
  -0.93565885 -1.23752644 -1.28065039]]

Am I using numpy incorrectly?

Solution

I believe that your NumPy result is correct. I would do the normalization in a simpler way, though:

>>> data = np.array([2.02, 2.33, 2.99, 6.85, 9.20, 8.80, 7.50, 6.00, 5.85, 3.85, 4.85, 3.85, 2.22, 1.45, 1.34])
>>> data -= data.mean()
>>> data /= data.std()
>>> data
array([-1.01406602, -0.89253491, -0.63379126,  0.87946705,  1.80075126,
        1.64393692,  1.13429034,  0.54623659,  0.48743122, -0.29664045,
        0.09539539, -0.29664045, -0.93565885, -1.23752644, -1.28065039])
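The question's version operates on a 2-D array with one series per row, which is why it reaches for np.split to shape the per-row statistics. A minimal sketch of the same row-wise normalization, assuming that one-series-per-row layout: passing keepdims=True to mean() and std() keeps the reduced axis with length 1, so the statistics broadcast against the input without any splitting. `zscore_rows` is a hypothetical helper name.

```python
import numpy as np

# Hypothetical helper for illustration, not a NumPy function.
def zscore_rows(a, ddof=0):
    """Z-normalize each row of a 2-D array independently.

    keepdims=True leaves the per-row statistics with shape (n_rows, 1),
    so they broadcast against the (n_rows, n_cols) input.
    ddof=0 divides by n (NumPy's default); ddof=1 divides by n - 1.
    """
    a = np.asarray(a, dtype=float)
    mean = a.mean(axis=1, keepdims=True)
    std = a.std(axis=1, ddof=ddof, keepdims=True)
    return (a - mean) / std

data = np.array([[2.02, 2.33, 2.99, 6.85, 9.20, 8.80, 7.50, 6.00,
                  5.85, 3.85, 4.85, 3.85, 2.22, 1.45, 1.34]])
print(zscore_rows(data))
```

Returning a new array rather than updating in place also avoids the integer-dtype pitfall that `data -= ...` has when the input array is not already floating point.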

The difference between your two results lies in the normalization. Taking r to be the R result:

>>> r / data
array([ 0.96609173,  0.96609173,  0.96609173,  0.96609179,  0.96609179,
        0.96609181,  0.9660918 ,  0.96609181,  0.96609179,  0.96609179,
        0.9660918 ,  0.96609179,  0.96609175,  0.96609176,  0.96609177])

Thus, your two results are simply proportional to each other; the small variations in the ratio come from R printing only seven significant digits. You may therefore want to compare the standard deviations obtained with R and with Python.
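That comparison can be made without leaving Python. A short sketch, assuming NumPy: the ddof argument of ndarray.std() sets the divisor to N - ddof, so the default ddof=0 divides by N and ddof=1 divides by N - 1, which is the convention of R's var() and sd().

```python
import numpy as np

data = np.array([2.02, 2.33, 2.99, 6.85, 9.20, 8.80, 7.50, 6.00,
                 5.85, 3.85, 4.85, 3.85, 2.22, 1.45, 1.34])

sd_numpy = data.std()       # NumPy default, ddof=0: divides by N
sd_r = data.std(ddof=1)     # divides by N - 1, like R's sqrt(var(data))

print(sd_numpy, sd_r)
# Z-scores scale inversely with the standard deviation, so this ratio
# equals the R result divided by the NumPy result.
print(sd_numpy / sd_r)
```

For this data the two standard deviations come out at roughly 2.551 and 2.640, and their ratio is the 0.96609… factor seen in r / data.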

PS: Now that I think of it, it may be that the variance is not defined in the same way in NumPy and in R: for N elements, some tools normalize by N-1 instead of N when calculating the variance. You may want to check this.
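Written out, the two conventions are as follows, where σ_N (divide by N, NumPy's default) and s_{N-1} (divide by N-1, R's var()) are symbols introduced here for illustration. Since both z-scores share the same numerator, their ratio depends only on N:

```latex
\sigma_N^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i-\bar{x})^2,
\qquad
s_{N-1}^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i-\bar{x})^2
\quad\Longrightarrow\quad
s_{N-1} = \sigma_N\sqrt{\frac{N}{N-1}}

z_i^{\mathrm{R}} = \frac{x_i-\bar{x}}{s_{N-1}}
= \sqrt{\frac{N-1}{N}}\,\frac{x_i-\bar{x}}{\sigma_N}
= \sqrt{\frac{N-1}{N}}\,z_i^{\mathrm{NumPy}},
\qquad N = 15 \;\Rightarrow\; \sqrt{14/15} \approx 0.96609
```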

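PPS: Here is the reason for the discrepancy. The two tools follow different normalization conventions: R's var() (and therefore sqrt(var(data)), like sd()) divides by N-1, while NumPy's std() divides by N by default. The observed factor is therefore sqrt((N-1)/N) = sqrt(14/15) = 0.9660917… (because the data has 15 elements). Thus, to obtain in R the same result as in Python, divide the R result by this factor; conversely, passing ddof=1 to NumPy's std() reproduces the R result.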
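Both ways of reconciling the outputs can be checked in a few lines. This sketch assumes NumPy and reproduces R's convention with ddof=1 rather than calling R itself; `factor` is the sqrt((N-1)/N) value from the PPS.

```python
import math

import numpy as np

data = np.array([2.02, 2.33, 2.99, 6.85, 9.20, 8.80, 7.50, 6.00,
                 5.85, 3.85, 4.85, 3.85, 2.22, 1.45, 1.34])
n = data.size

# NumPy's default convention: divide by N.
z_numpy = (data - data.mean()) / data.std()
# R's convention: divide by N - 1.
z_r = (data - data.mean()) / data.std(ddof=1)

factor = math.sqrt((n - 1) / n)  # sqrt(14/15) for 15 elements

# Dividing the R-style result by the factor recovers NumPy's result...
assert np.allclose(z_r / factor, z_numpy)
# ...and multiplying NumPy's result by it recovers the R-style result.
assert np.allclose(z_numpy * factor, z_r)
```

Which convention to use depends on the goal: ddof=1 matches R and the usual sample standard deviation, while ddof=0 is NumPy's default.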