我们可以使用Base R来查找曲线下95%的面积吗?

编程入门 行业动态 更新时间:2024-10-09 23:12:41
本文介绍了我们可以使用Base R来查找曲线下95%的面积吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧! 问题描述

使用Base R,我想知道是否可以确定曲线下面的 posterior 下的曲线下面的区域?

更具体地说,我想从模式(绿色虚线)朝尾部移动,然后在覆盖95%的曲线区域时停止。所需的是这个95%区域限制的x轴值,如下图所示?

prior =函数(x)dbeta(x,15.566,7.051)似然函数(x​​)dbinom(55,100,x) posterior =函数(x)先验(x)*可能性(x) $ b curve(posterior,n = 1, 1e4)

PS 换句话说,如果这样时间间隔可能是最短的95%的时间间隔。

解决方案 对称分布

尽管OP的例子并不完全对称,但它足够接近 - 简单得多。

你可以使用一个集成和优化的组合。我将它作为自定义函数编写,但请注意,如果您在其他情况下使用该函数,则可能需要重新考虑搜索分位数的界限。

#对于具有单峰的分布,找到对称! #包含probs概率的间隔。搜索'范围'。 f_quan< - 函数(fun,probs,range = c(0,1)){ mode < - optimize(fun,interval = range,maximum = TRUE,tol = 1e-12)[[1]] total_area< - 整合(乐趣,范围[1],范围[2])[[1]] O< ; - 函数(d){ parea< - integration(fun,mode-d,mode + d)[[1]] / total_area (probs - parea)^ 2 } #搜索界限可能需要根据问题进行一些调整! $ b优化(0,c(0,范围[2] / 2 - 1E-02))[[1]] return(c(mode -o,mode + o))}

像这样使用它,

f < - f_quan(posterior,0.95) curve(posterior,n = 1e4) abline(v = f,col = blue,lwd = 2,lty = 3)

给出

非对称分配

在非对称分布的情况下,我们必须搜索符合P(a

解决方案中重要的是定义域,这是我们想要搜索的区域(我们无法使用 -Inf,Inf ,因此用户必须将其设置为合理的值)

#考虑x轴上的区间(a,b)#将我们的函数归一化为总区域,得到#得到区间内的总概率 prob_ab< ; - 函数(fun,a,b,domain){ totarea< - integrate(fun,domain [1],domain [2])[[1]] integrate(fun,a, b)[[1]] / totarea } #现在给定a和概率,反向查找b invert_prob_ab < - 函数(fun,a,prob, (b,fun,a,prob){(prob_ab(fun,a,b,domain = domain) - prob)^ 2 $ bb b优化(O,c(a,domain [2]),a = a,fun = fun,prob = prob)$ minimum 返回(b)} #现在通过变化找到最短间隔。简化:不要搜索过去的模式,否则在域的右边接近#会带来严重的麻烦! prob_int_shortest< - 函数(fun,prob,domain){ mode< - optimize(fun,interval = domain,maximum = TRUE,tol = 1e-12)[[1 ]] #要被最小化的目标函数:间隔的宽度 O< - 函数(a,fun,prob,domain){b< - invert_prob_ab a,prob,domain) b - a } 符合准则的最短间隔 abest < - 优化(O,c (0,mode),fun = fun,prob = prob,domain = domain)$ minimum #现在返回间隔b< - invert_prob_ab(fun,abest,prob,domain) return(c(abest,b))}

现在使用上面的代码。我使用非常不对称的函数(假设mydist实际上是一些复杂的pdf,而不是dgamma)。

mydist< - 函数(x)dgamma(x,shape = 2)曲线(mydist(x),from = 0,to = 10) abline(v = prob_int_shortest(mydist,0.9,c(0,10) ),lty = 3,col =blue,lwd = 2)

域(0,10),因为显然间隔必须在某处。请注意,使用像(0,1E05)这样的非常大的值不起作用,因为 integrate 在接近零的长序列中遇到问题。同样,对于你的情况,你将不得不调整域名(除非有人有更好的主意!)。

Using Base R, I was wondering if I could determine the 95% area under the curve denoted as posterior below?

More specifically, I want to move from the mode (the green dashed line) toward the tails and then stop when I have covered 95% of the curve area. Desired are the x-axis values that are the limits of this 95% area as shown in the picture below?

prior = function(x) dbeta(x, 15.566, 7.051) likelihood = function(x) dbinom(55, 100, x) posterior = function(x) prior(x)*likelihood(x) mode = optimize(posterior, interval = c(0, 1), maximum = TRUE, tol = 1e-12)[[1]] curve(posterior, n = 1e4)

P.S In other words, it is highly desirable if such an Interval be the shortest 95% interval possible.

解决方案

Symmetric distribution

Even though OP's example was not exactly symmetric, it is close enough - and useful to start there since the solution is much simpler.

You can use a combination of integrate and optimize. I wrote this as a custom function, but note that if you use this in other situations you may have to rethink the bounds for searching the quantile.

# For a distribution with a single peak, find the symmetric! # interval that contains probs probability. Search over 'range'. f_quan <- function(fun, probs, range=c(0,1)){ mode <- optimize(fun, interval = range, maximum = TRUE, tol = 1e-12)[[1]] total_area <- integrate(fun, range[1], range[2])[[1]] O <- function(d){ parea <- integrate(fun, mode-d, mode+d)[[1]] / total_area (probs - parea)^2 } # Bounds for searching may need some adjustment depending on the problem! o <- optimize(O, c(0,range[2]/2 - 1E-02))[[1]] return(c(mode-o, mode+o)) }

Use it like this,

f <- f_quan(posterior, 0.95) curve(posterior, n = 1e4) abline(v=f, col="blue", lwd=2, lty=3)

gives

Asymmetric distribution

In the case of an asymmetric distribution, we have to search two points that meet the criterium that P(a < x < b) = Prob, where Prob is some desired probability. Since there are infinitely many intervals (a,b) that meet this, OP suggested finding the shortest one.

Important in the solution is the definition of a domain, the region where we want to search (we cannot use -Inf, Inf, so the user has to set this to reasonable values).

# consider interval (a,b) on the x-axis # integrate our function, normalize to total area, to # get the total probability in the interval prob_ab <- function(fun, a, b, domain){ totarea <- integrate(fun, domain[1], domain[2])[[1]] integrate(fun, a, b)[[1]] / totarea } # now given a and the probability, invert to find b invert_prob_ab <- function(fun, a, prob, domain){ O <- function(b, fun, a, prob){ (prob_ab(fun, a, b, domain=domain) - prob)^2 } b <- optimize(O, c(a, domain[2]), a = a, fun=fun, prob=prob)$minimum return(b) } # now find the shortest interval by varying a # Simplification: don't search past the mode, otherwise getting close # to the right-hand side of domain will give serious trouble! prob_int_shortest <- function(fun, prob, domain){ mode <- optimize(fun, interval = domain, maximum = TRUE, tol = 1e-12)[[1]] # objective function to be minimized: the width of the interval O <- function(a, fun, prob, domain){ b <- invert_prob_ab(fun, a, prob, domain) b - a } # shortest interval that meets criterium abest <- optimize(O, c(0,mode), fun=fun, prob=prob, domain=domain)$minimum # now return the interval b <- invert_prob_ab(fun, abest, prob, domain) return(c(abest,b)) }

Now use the above code like this. I use a very asymmetric function (just assume mydist is actually some complicated pdf, not the dgamma).

mydist <- function(x)dgamma(x, shape=2) curve(mydist(x), from=0, to=10) abline(v=prob_int_shortest(mydist, 0.9, c(0,10)), lty=3, col="blue", lwd=2)

In this example I set domain to (0,10), since clearly the interval must be in there somewhere. Note that using a very large value like (0, 1E05) does not work, because integrate has trouble with long sequences of near-zeroes. Again, for your situation, you will have to adjust the domain (unless someone has a better idea!).

更多推荐

我们可以使用Base R来查找曲线下95%的面积吗?

本文发布于:2023-10-26 18:05:32,感谢您对本站的认可!
版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。
本文标签:可以使用   曲线   面积   Base

发布评论

评论列表 (有 0 条评论)
草根站长

>www.elefans.com

编程频道|电子爱好者 - 技术资讯及电子产品介绍!