我们可以使用Base R来查找曲线下95％的面积吗？

编程入门行业动态更新时间:2024-10-09 23:12:41

本文介绍了我们可以使用Base R来查找曲线下95％的面积吗？的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！问题描述

使用Base R，我想知道是否可以确定曲线下面的 posterior 下的曲线下面的区域？

更具体地说，我想从模式（绿色虚线）朝尾部移动，然后在覆盖95％的曲线区域时停止。所需的是这个95％区域限制的x轴值，如下图所示？

prior =函数（x）dbeta（x，15.566，7.051）似然函数（x）dbinom（55，100，x） posterior =函数（x）先验（x）*可能性（x） $ b curve（posterior，n = 1， 1e4）

PS 换句话说，如果这样时间间隔可能是最短的95％的时间间隔。

解决方案对称分布

尽管OP的例子并不完全对称，但它足够接近 - 简单得多。

你可以使用一个集成和优化的组合。我将它作为自定义函数编写，但请注意，如果您在其他情况下使用该函数，则可能需要重新考虑搜索分位数的界限。

＃对于具有单峰的分布，找到对称！＃包含probs概率的间隔。搜索'范围'。 f_quan< - 函数（fun，probs，range = c（0,1））{ mode < - optimize（fun，interval = range，maximum = TRUE，tol = 1e-12）[[1]] total_area< - 整合（乐趣，范围[1]，范围[2]）[[1]] O< ; - 函数（d）{ parea< - integration（fun，mode-d，mode + d）[[1]] / total_area （probs - parea）^ 2 } ＃搜索界限可能需要根据问题进行一些调整！ $ b优化（0，c（0，范围[2] / 2 - 1E-02））[[1]] return（c（mode -o，mode + o））}

像这样使用它，

f < - f_quan（posterior，0.95） curve（posterior，n = 1e4） abline（v = f，col = blue，lwd = 2，lty = 3）
给出
非对称分配
在非对称分布的情况下，我们必须搜索符合P（a
解决方案中重要的是定义域，这是我们想要搜索的区域（我们无法使用 -Inf，Inf ，因此用户必须将其设置为合理的值）
＃考虑x轴上的区间（a，b）＃将我们的函数归一化为总区域，得到＃得到区间内的总概率 prob_ab< ; - 函数（fun，a，b，domain）{ totarea< - integrate（fun，domain [1]，domain [2]）[[1]] integrate（fun，a， b）[[1]] / totarea } ＃现在给定a和概率，反向查找b invert_prob_ab < - 函数（fun，a，prob，（b，fun，a，prob）{（prob_ab（fun，a，b，domain = domain） - prob）^ 2 $ bb b优化（O，c（a，domain [2]），a = a，fun = fun，prob = prob）$ minimum 返回（b）} ＃现在通过变化找到最短间隔。简化：不要搜索过去的模式，否则在域的右边接近＃会带来严重的麻烦！ prob_int_shortest< - 函数（fun，prob，domain）{ mode< - optimize（fun，interval = domain，maximum = TRUE，tol = 1e-12）[[1 ]] ＃要被最小化的目标函数：间隔的宽度 O< - 函数（a，fun，prob，domain）{b< - invert_prob_ab a，prob，domain） b - a } 符合准则的最短间隔 abest < - 优化（O，c （0，mode），fun = fun，prob = prob，domain = domain）$ minimum ＃现在返回间隔b< - invert_prob_ab（fun，abest，prob，domain） return（c（abest，b））}
现在使用上面的代码。我使用非常不对称的函数（假设mydist实际上是一些复杂的pdf，而不是dgamma）。
mydist< - 函数（x）dgamma（x，shape = 2）曲线（mydist（x），from = 0，to = 10） abline（v = prob_int_shortest（mydist，0.9，c（0,10）），lty = 3，col =blue，lwd = 2）
域（0,10），因为显然间隔必须在某处。请注意，使用像（0，1E05）这样的非常大的值不起作用，因为 integrate 在接近零的长序列中遇到问题。同样，对于你的情况，你将不得不调整域名（除非有人有更好的主意！）。

Using Base R, I was wondering if I could determine the 95% area under the curve denoted as posterior below?

More specifically, I want to move from the mode (the green dashed line) toward the tails and then stop when I have covered 95% of the curve area. Desired are the x-axis values that are the limits of this 95% area as shown in the picture below?
prior = function(x) dbeta(x, 15.566, 7.051) likelihood = function(x) dbinom(55, 100, x) posterior = function(x) prior(x)*likelihood(x) mode = optimize(posterior, interval = c(0, 1), maximum = TRUE, tol = 1e-12)[[1]] curve(posterior, n = 1e4)
P.S In other words, it is highly desirable if such an Interval be the shortest 95% interval possible.

解决方案
Symmetric distribution
Even though OP's example was not exactly symmetric, it is close enough - and useful to start there since the solution is much simpler.

You can use a combination of integrate and optimize. I wrote this as a custom function, but note that if you use this in other situations you may have to rethink the bounds for searching the quantile.
# For a distribution with a single peak, find the symmetric! # interval that contains probs probability. Search over 'range'. f_quan <- function(fun, probs, range=c(0,1)){ mode <- optimize(fun, interval = range, maximum = TRUE, tol = 1e-12)[[1]] total_area <- integrate(fun, range[1], range[2])[[1]] O <- function(d){ parea <- integrate(fun, mode-d, mode+d)[[1]] / total_area (probs - parea)^2 } # Bounds for searching may need some adjustment depending on the problem! o <- optimize(O, c(0,range[2]/2 - 1E-02))[[1]] return(c(mode-o, mode+o)) }
Use it like this,
f <- f_quan(posterior, 0.95) curve(posterior, n = 1e4) abline(v=f, col="blue", lwd=2, lty=3)
gives

Asymmetric distribution
In the case of an asymmetric distribution, we have to search two points that meet the criterium that P(a < x < b) = Prob, where Prob is some desired probability. Since there are infinitely many intervals (a,b) that meet this, OP suggested finding the shortest one.

Important in the solution is the definition of a domain, the region where we want to search (we cannot use -Inf, Inf, so the user has to set this to reasonable values).
# consider interval (a,b) on the x-axis # integrate our function, normalize to total area, to # get the total probability in the interval prob_ab <- function(fun, a, b, domain){ totarea <- integrate(fun, domain[1], domain[2])[[1]] integrate(fun, a, b)[[1]] / totarea } # now given a and the probability, invert to find b invert_prob_ab <- function(fun, a, prob, domain){ O <- function(b, fun, a, prob){ (prob_ab(fun, a, b, domain=domain) - prob)^2 } b <- optimize(O, c(a, domain[2]), a = a, fun=fun, prob=prob)$minimum return(b) } # now find the shortest interval by varying a # Simplification: don't search past the mode, otherwise getting close # to the right-hand side of domain will give serious trouble! prob_int_shortest <- function(fun, prob, domain){ mode <- optimize(fun, interval = domain, maximum = TRUE, tol = 1e-12)[[1]] # objective function to be minimized: the width of the interval O <- function(a, fun, prob, domain){ b <- invert_prob_ab(fun, a, prob, domain) b - a } # shortest interval that meets criterium abest <- optimize(O, c(0,mode), fun=fun, prob=prob, domain=domain)$minimum # now return the interval b <- invert_prob_ab(fun, abest, prob, domain) return(c(abest,b)) }
Now use the above code like this. I use a very asymmetric function (just assume mydist is actually some complicated pdf, not the dgamma).
mydist <- function(x)dgamma(x, shape=2) curve(mydist(x), from=0, to=10) abline(v=prob_int_shortest(mydist, 0.9, c(0,10)), lty=3, col="blue", lwd=2)
In this example I set domain to (0,10), since clearly the interval must be in there somewhere. Note that using a very large value like (0, 1E05) does not work, because integrate has trouble with long sequences of near-zeroes. Again, for your situation, you will have to adjust the domain (unless someone has a better idea!).

更多推荐

我们可以使用Base R来查找曲线下95％的面积吗？

本文发布于:2023-10-26 18:05:32，感谢您对本站的认可！

本文链接:https://www.elefans.com/category/jswz/34/1530931.html

版权声明:本站内容均来自互联网，仅供演示用，请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系，我们将在24小时内删除。

可以使用曲线面积 Base

上一篇：在redshift postgresql中我可以使用复制功能跳过列吗

下一篇：在Windows窗体中可以使用WPF中设计的按钮吗?

发布评论取消回复

评论列表（有 0 条评论）

最近发表

荆门网站建设的重要性

win10蓝屏终止代码CRITICAL_PROCESS_DIED解决方法

您可以尝试添加 --skip-broken 选项来解决该问题您可以尝试执行：rpm -Va --nofiles --nodigest 解决方案

关于无线网络波动大的解决办法

Windows10 关于系统中断CPU占用过高导致电脑变卡的解决办法

VS 2019 点击页面自动定位到解决方案资源管理器目录位置

（亲测解决）VMware打开需要半天才进入、打开系统很慢、运行很慢解决办法

Typora官网下载的最新版本mac10.13以下版本用不了的解决办法

成功解决ModuleNotFoundError: No module named ‘torch._C‘

MySQL:由于找不到VCRUNTIME140_1.dll，无法继续执行代码。重新安装程序可能会解决此问题

>www.elefans.com

编程频道|电子爱好者 - 技术资讯及电子产品介绍！

热门文章

从源“http://localhost:5173”访问“...”处的 XMLHttpRequest 已被 CORS 策略阻止

币安API错误代码1102，未发送强制参数“时间戳”

如果我在bot telegram nodejs中使用editMessageMedia，我如何制作标题

在 Node.js 中从网络流创建 blob

使用 Node.js / ES6 如何设置 dotenv 文件的自定义路径？

使用 NODE.JS 和 html5 实现低延迟（50 毫秒）视频流

如何从nodejs连接laravel>laravel

使用nodejs观看目录

如果文件包含特定字符串，如何跳过 GitHub 工作流程步骤？

FirebaseError：无法从.env加载环境变量

标签列表

文件

如何在

Python

系统

java

方法

数据

错误

windows

函数

android

linux

教程

如何使用

代码

字符串

计算机

电脑

服务器

NET

应用程序

数组

PHP

MySQL

SQL

对象

项目

程序

数据库

word