This may be a naive question, but I couldn't find any posts about it, so I thought it would be useful to ask. I found a distribution that seems to fit my data well, but all of my data points are positive in real life (negative values are impossible).
Is there a way to force .rvs to output only positive values?
I thought of some ways, but they seem pretty CPU intensive: for example, generating far more values than I need, building a boolean mask for the positive ones, and then using np.random.choice to pick from those. Is there a better way?
I didn't see anything about this in the docs: http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html
My searches didn't yield any results either: https://stackoverflow.com/search?q=force+scipy+rvs+positive and https://stackoverflow.com/search?q=scipy+rvs+positive
    >>> import numpy as np
    >>> from scipy import stats
    >>> params = (0.00169906712999, 0.00191866845411)
    >>> np.random.seed(0)
    >>> stats.norm.rvs(*params, size=10)
    array([ 0.0050837 , 0.00246684, 0.00357694, 0.0059986 , 0.00528229,
           -0.00017601, 0.00352197, 0.00140866, 0.00150102, 0.00248687])
Best answer
You appear to be looking for truncnorm: a truncated normal continuous random variable.
For example, try:
    >>> from scipy import stats
    >>> import numpy as np
    >>> np.random.seed(0)
    >>> params = (0.00169906712999, 0.00191866845411)
    >>> params[0] + stats.truncnorm.rvs(-params[0]/params[1], np.inf, size=10, scale=params[1])
    array([ 0.00235414, 0.00310856, 0.00258259, 0.00233789, 0.00185298,
            0.00277454, 0.00190764, 0.00429671, 0.00532165, 0.00169576])
The first two arguments to stats.truncnorm.rvs are the truncation limits. They are expressed in units of the standard normal distribution (mean=0, std dev=1), so a lower limit of 0 on the real scale has to be standardized as (0 - mean) / std, which is -params[0]/params[1] here.
We use np.inf for the upper limit on the range because we don't want any truncation on the upper side.
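The standardization of the lower limit can be wrapped in a small helper. This is only a sketch: the function name positive_normal_rvs and its seed argument are made up for illustration, and it passes loc and scale to truncnorm instead of adding the mean by hand, which should be equivalent.

```python
import numpy as np
from scipy import stats


def positive_normal_rvs(mean, std, size, seed=None):
    """Draw samples from a normal(mean, std) truncated to [0, inf).

    Hypothetical helper, not part of SciPy.
    """
    rng = np.random.default_rng(seed)
    # truncnorm expects its limits in standard-normal units,
    # so the real-scale lower limit 0 becomes (0 - mean) / std.
    lower = (0.0 - mean) / std
    return stats.truncnorm.rvs(
        lower, np.inf, loc=mean, scale=std, size=size, random_state=rng
    )


samples = positive_normal_rvs(0.00169906712999, 0.00191866845411, size=1000, seed=0)
print(samples.min() >= 0)
```

Keep in mind that cutting off the negative tail shifts the distribution: the mean of the truncated samples ends up somewhat higher than the mean parameter that was fitted for the untruncated normal.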
Verifying that none of the output is negative
Let's look at the minimum of the output over 100,000 samples and the maximum over 10,000 samples:
    >>> np.random.seed(0)
    >>> np.min(params[0] + stats.truncnorm.rvs(-params[0]/params[1], np.inf, size=100000, scale=params[1]))
    1.9136656654716172e-08
    >>> np.max(params[0] + stats.truncnorm.rvs(-params[0]/params[1], np.inf, size=10000, scale=params[1]))
    0.0088294835649150548
As you can see, the minimum never goes negative. The maximum is a few std dev above the mean.
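For comparison, the oversample-and-mask idea from the question can be sketched as a simple rejection sampler. The function name rejection_sample_positive is hypothetical, and the batch size (one batch of `size` draws per round) is an arbitrary choice, not a tuned one.

```python
import numpy as np


def rejection_sample_positive(mean, std, size, seed=None):
    """Draw normal(mean, std) samples and keep only the positive ones.

    Hypothetical helper illustrating the approach from the question.
    """
    rng = np.random.default_rng(seed)
    kept = [np.empty(0)]  # start non-empty so concatenate works for size=0
    n_kept = 0
    while n_kept < size:
        draws = rng.normal(mean, std, size=size)
        positive = draws[draws > 0]  # boolean mask drops non-positive draws
        kept.append(positive)
        n_kept += positive.size
    # Trim the surplus from the last batch.
    return np.concatenate(kept)[:size]


samples = rejection_sample_positive(0.00169906712999, 0.00191866845411, size=1000, seed=0)
print(samples.size, samples.min() > 0)
```

This targets the same truncated distribution, but it throws away every non-positive draw, so it gets slower as more of the normal's mass lies below zero. truncnorm avoids that waste.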