本文介绍了我们
可以使用Base R来查找曲线下95%的面积吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
使用Base R,我想知道是否可以确定曲线下面的 posterior 下的曲线下面的区域?
更具体地说,我想从模式(绿色虚线)朝尾部移动,然后在覆盖95%的曲线区域时停止。所需的是这个95%区域限制的x轴值,如下图所示?
prior =函数(x)dbeta(x,15.566,7.051)似然函数(x)dbinom(55,100,x) posterior =函数(x)先验(x)*可能性(x) $ b curve(posterior,n = 1, 1e4)
PS 换句话说,如果这样时间间隔可能是最短的95%的时间间隔。
解决方案 对称分布
尽管OP的例子并不完全对称,但它足够接近 - 简单得多。
你可以使用一个集成和优化的组合。我将它作为自定义函数编写,但请注意,如果您在其他情况下使用该函数,则可能需要重新考虑搜索分位数的界限。
#对于具有单峰的分布,找到对称! #包含probs概率的间隔。搜索'范围'。 f_quan< - 函数(fun,probs,range = c(0,1)){ mode < - optimize(fun,interval = range,maximum = TRUE,tol = 1e-12)[[1]] total_area< - 整合(乐趣,范围[1],范围[2])[[1]] O< ; - 函数(d){ parea< - integration(fun,mode-d,mode + d)[[1]] / total_area (probs - parea)^ 2 } #搜索界限可能需要根据问题进行一些调整! $ b优化(0,c(0,范围[2] / 2 - 1E-02))[[1]] return(c(mode -o,mode + o))}
像这样使用它,
f < - f_quan(posterior,0.95) curve(posterior,n = 1e4) abline(v = f,col = blue,lwd = 2,lty = 3)
给出
非对称分配
在非对称分布的情况下,我们必须搜索符合P(a 解决方案中重要的是定义域,这是我们想要搜索的区域(我们无法使用 -Inf,Inf ,因此用户必须将其设置为合理的值)
#考虑x轴上的区间(a,b)#将我们的函数归一化为总区域,得到#得到区间内的总概率 prob_ab< ; - 函数(fun,a,b,domain){ totarea< - integrate(fun,domain [1],domain [2])[[1]] integrate(fun,a, b)[[1]] / totarea } #现在给定a和概率,反向查找b invert_prob_ab < - 函数(fun,a,prob, (b,fun,a,prob){(prob_ab(fun,a,b,domain = domain) - prob)^ 2 $ bb b优化(O,c(a,domain [2]),a = a,fun = fun,prob = prob)$ minimum 返回(b)} #现在通过变化找到最短间隔。简化:不要搜索过去的模式,否则在域的右边接近#会带来严重的麻烦! prob_int_shortest< - 函数(fun,prob,domain){ mode< - optimize(fun,interval = domain,maximum = TRUE,tol = 1e-12)[[1 ]] #要被最小化的目标函数:间隔的宽度 O< - 函数(a,fun,prob,domain){b< - invert_prob_ab a,prob,domain) b - a } 符合准则的最短间隔 abest < - 优化(O,c (0,mode),fun = fun,prob = prob,domain = domain)$ minimum #现在返回间隔b< - invert_prob_ab(fun,abest,prob,domain) return(c(abest,b))} 现在使用上面的代码。我使用非常不对称的函数(假设mydist实际上是一些复杂的pdf,而不是dgamma)。
mydist< - 函数(x)dgamma(x,shape = 2)曲线(mydist(x),from = 0,to = 10) abline(v = prob_int_shortest(mydist,0.9,c(0,10) ),lty = 3,col =blue,lwd = 2) 域(0,10),因为显然间隔必须在某处。请注意,使用像(0,1E05)这样的非常大的值不起作用,因为 integrate 在接近零的长序列中遇到问题。同样,对于你的情况,你将不得不调整域名(除非有人有更好的主意!)。
Using Base R, I was wondering if I could determine the 95% area under the curve denoted as posterior below?
More specifically, I want to move from the mode (the green dashed line) toward the tails and then stop when I have covered 95% of the curve area. Desired are the x-axis values that are the limits of this 95% area as shown in the picture below?
prior = function(x) dbeta(x, 15.566, 7.051)
likelihood = function(x) dbinom(55, 100, x)
posterior = function(x) prior(x)*likelihood(x)
mode = optimize(posterior, interval = c(0, 1), maximum = TRUE, tol = 1e-12)[[1]]
curve(posterior, n = 1e4)
P.S In other words, it is highly desirable if such an Interval be the shortest 95% interval possible.
解决方案 Symmetric distribution
Even though OP's example was not exactly symmetric, it is close enough - and useful to start there since the solution is much simpler.
You can use a combination of integrate and optimize. I wrote this as a custom function, but note that if you use this in other situations you may have to rethink the bounds for searching the quantile.
# For a distribution with a single peak, find the symmetric!
# interval that contains probs probability. Search over 'range'.
f_quan <- function(fun, probs, range=c(0,1)){
mode <- optimize(fun, interval = range, maximum = TRUE, tol = 1e-12)[[1]]
total_area <- integrate(fun, range[1], range[2])[[1]]
O <- function(d){
parea <- integrate(fun, mode-d, mode+d)[[1]] / total_area
(probs - parea)^2
}
# Bounds for searching may need some adjustment depending on the problem!
o <- optimize(O, c(0,range[2]/2 - 1E-02))[[1]]
return(c(mode-o, mode+o))
}
Use it like this,
f <- f_quan(posterior, 0.95)
curve(posterior, n = 1e4)
abline(v=f, col="blue", lwd=2, lty=3)
gives
Asymmetric distribution
In the case of an asymmetric distribution, we have to search two points that meet the criterium that P(a < x < b) = Prob, where Prob is some desired probability. Since there are infinitely many intervals (a,b) that meet this, OP suggested finding the shortest one.
Important in the solution is the definition of a domain, the region where we want to search (we cannot use -Inf, Inf, so the user has to set this to reasonable values).
# consider interval (a,b) on the x-axis
# integrate our function, normalize to total area, to
# get the total probability in the interval
prob_ab <- function(fun, a, b, domain){
totarea <- integrate(fun, domain[1], domain[2])[[1]]
integrate(fun, a, b)[[1]] / totarea
}
# now given a and the probability, invert to find b
invert_prob_ab <- function(fun, a, prob, domain){
O <- function(b, fun, a, prob){
(prob_ab(fun, a, b, domain=domain) - prob)^2
}
b <- optimize(O, c(a, domain[2]), a = a, fun=fun, prob=prob)$minimum
return(b)
}
# now find the shortest interval by varying a
# Simplification: don't search past the mode, otherwise getting close
# to the right-hand side of domain will give serious trouble!
prob_int_shortest <- function(fun, prob, domain){
mode <- optimize(fun, interval = domain, maximum = TRUE, tol = 1e-12)[[1]]
# objective function to be minimized: the width of the interval
O <- function(a, fun, prob, domain){
b <- invert_prob_ab(fun, a, prob, domain)
b - a
}
# shortest interval that meets criterium
abest <- optimize(O, c(0,mode), fun=fun, prob=prob, domain=domain)$minimum
# now return the interval
b <- invert_prob_ab(fun, abest, prob, domain)
return(c(abest,b))
}
Now use the above code like this. I use a very asymmetric function (just assume mydist is actually some complicated pdf, not the dgamma).
mydist <- function(x)dgamma(x, shape=2)
curve(mydist(x), from=0, to=10)
abline(v=prob_int_shortest(mydist, 0.9, c(0,10)), lty=3, col="blue", lwd=2)
In this example I set domain to (0,10), since clearly the interval must be in there somewhere. Note that using a very large value like (0, 1E05) does not work, because integrate has trouble with long sequences of near-zeroes. Again, for your situation, you will have to adjust the domain (unless someone has a better idea!).
发布评论