I am befuddled by the format needed to perform a simple prediction using R's survival package.

    library(survival)
    lung.surv <- survfit(Surv(time, status) ~ 1, data = lung)

So fitting a simple exponential regression (for example purposes only) is:

    lung.reg <- survreg(Surv(time, status) ~ 1, data = lung, dist = "exponential")

How would I predict the percent survival at time = 400?

When I use the following:

    myPredict400 <- predict(lung.reg, newdata = data.frame(time = 400), type = "response")

I get the following:

    myPredict400
           1
    421.7758

I was expecting something like 37%, so I am missing something pretty obvious.
Best answer
The point of this survival function is to fit a distribution to the survival times (survreg fits a parametric one, not an empirical one). Essentially, you are associating each survival time with a probability. Once you have that distribution, you can read off the survival rate at a given time.
Try this:
    library(survival)

    # No dist argument, so survreg fits its default Weibull distribution
    lung.reg <- survreg(Surv(time, status) ~ 1, data = lung)

    # Grid of cumulative failure probabilities; survival is 1 - pct
    pct <- 1:99 / 100
    # Predicted survival time at each of those quantiles
    myPredict400 <- predict(lung.reg, newdata = data.frame(time = 400), type = "quantile", p = pct)

    # Find the predicted survival time closest to 400
    indx <- which.min(abs(myPredict400 - 400))
    print(1 - pct[indx])  # 0.39

Adapted from the help docs, here's a plot of it:

    # lung records time in days, so the axis is labelled in days
    matplot(myPredict400, 1 - pct, xlab = "Days", ylab = "Survival", type = "l", lty = 1, col = 1)

Edited
You're basically evaluating the fitted distribution on a grid of probabilities (hence 1 to 99 out of 100). If you let the grid go up to 100, the last predicted value is Inf, because the survival time at the 100th percentile is infinite. That is what type = "quantile" and the p argument do: for each probability in p, predict returns the time by which that fraction of subjects is expected to have had the event.
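To make the link between p and survival concrete, here is a small sketch, assuming the same default Weibull fit as above. It asks for three quantiles and then feeds the predicted times back through psurvreg (the survival package's CDF for survreg distributions) to confirm that the survival probability at each predicted time is 1 - p. The names fit, p, t_q and surv_at_t are just illustrative.

```r
library(survival)

# Default Weibull fit, intercept only
fit <- survreg(Surv(time, status) ~ 1, data = lung)

# Cumulative failure probabilities to look up
p <- c(0.25, 0.50, 0.75)

# Predicted survival times (days) at those quantiles.
# The model has no covariates, so newdata only supplies a single row.
t_q <- as.numeric(predict(fit, newdata = data.frame(time = 400),
                          type = "quantile", p = p))

# Round trip: survival probability at each predicted time
surv_at_t <- 1 - psurvreg(t_q,
                          mean = coef(fit)[["(Intercept)"]],
                          scale = fit$scale,
                          distribution = "weibull")

print(data.frame(p = p, time = t_q, survival = surv_at_t))
```

The survival column should come back as 1 - p, which is why the answer prints 1 - pct[indx] rather than pct[indx].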
For example, setting pct <- 1:999/1000 gives a finer grid, so the predicted time closest to 400 in myPredict400 lands nearer to 400 and the survival estimate is more precise. Also, if you set pct to a value that is not a proper probability (i.e. less than 0 or more than 1), you'll get an error. I suggest you play with these values and see how they affect your survival rates.
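As an alternative to searching a grid, the survival probability at 400 can also be evaluated directly from the fitted distribution. The sketch below assumes the exponential fit from the question: type = "response" returned the fitted mean survival time (421.7758 days), and for an exponential model S(t) = exp(-t / mean), which works out to roughly 0.39 at t = 400. The same number can be obtained from psurvreg, and the Kaplan-Meier fit from the question gives a nonparametric cross-check. Variable names other than lung.reg and lung.surv are illustrative.

```r
library(survival)

# Fits from the question
lung.surv <- survfit(Surv(time, status) ~ 1, data = lung)
lung.reg  <- survreg(Surv(time, status) ~ 1, data = lung, dist = "exponential")

# Intercept is on the log-time scale; exp() gives the mean survival time in days
log_mean  <- coef(lung.reg)[["(Intercept)"]]
mean_time <- exp(log_mean)

# Closed form for the exponential model: S(t) = exp(-t / mean)
s400_closed <- exp(-400 / mean_time)

# Same quantity from the fitted CDF (scale is fixed at 1 for the exponential)
s400_cdf <- 1 - psurvreg(400, mean = log_mean, scale = 1,
                         distribution = "exponential")

# Nonparametric Kaplan-Meier estimate at 400 days, for comparison
s400_km <- summary(lung.surv, times = 400)$surv

print(c(closed_form = s400_closed, psurvreg = s400_cdf, kaplan_meier = s400_km))
```

For the Weibull fit used in the answer, the psurvreg call would take distribution = "weibull" and the fitted scale from lung.reg$scale instead.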