anova.rq() in quantreg package in R

I'm interested in comparing estimates from different quantiles (same outcome, same covariates) using the anova.rqlist function, which anova() dispatches to in the quantreg package in R. However, the math in the function is beyond my rudimentary expertise. Let's say I fit 3 models at different quantiles:

library(quantreg)
data(Mammals) # data in quantreg to be used as a useful example
fit1 <- rq(weight ~ speed + hoppers + specials, tau = .25, data = Mammals)
fit2 <- rq(weight ~ speed + hoppers + specials, tau = .5, data = Mammals)
fit3 <- rq(weight ~ speed + hoppers + specials, tau = .75, data = Mammals)

Then I compare them using:

anova(fit1, fit2, fit3, test="Wald", joint=FALSE)

My question is: which of these models is being used as the basis of the comparison?

My understanding of the Wald test (wiki entry) is

W = (θ^ − θ0)^2 / var(θ^)

where θ^ is the estimate of the parameter(s) of interest θ that is compared with the proposed value θ0.

So my question is: what does the anova function in quantreg choose as θ0?

Based on the p-value returned from the anova, my best guess is that it chooses the lowest quantile specified (i.e. tau = 0.25). Is there a way to specify the median (tau = 0.5) or, better yet, the mean estimate obtained using lm(y ~ x1 + x2 + x3, data)?

anova(fit1, fit2, fit3, joint=FALSE)

actually produces

Quantile Regression Analysis of Deviance Table

Model: weight ~ speed + hoppers + specials
Tests of Equality of Distinct Slopes: tau in { 0.25 0.5 0.75 }

             Df Resid Df F value  Pr(>F)
speed         2      319  1.0379 0.35539
hoppersTRUE   2      319  4.4161 0.01283 *
specialsTRUE  2      319  1.7290 0.17911
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

while

anova(fit3, fit1, fit2, joint=FALSE)

produces the exact same result

Quantile Regression Analysis of Deviance Table

Model: weight ~ speed + hoppers + specials
Tests of Equality of Distinct Slopes: tau in { 0.5 0.25 0.75 }

             Df Resid Df F value  Pr(>F)
speed         2      319  1.0379 0.35539
hoppersTRUE   2      319  4.4161 0.01283 *
specialsTRUE  2      319  1.7290 0.17911
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The order of the models is clearly being changed in the anova, but how is it that the F value and Pr(>F) are identical in both tests?

Best answer

All the quantiles you input are used and there is not one model used as a reference.
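One way to see why no single fit has to act as the reference is the following minimal base-R sketch. It assumes the equality test is a Wald test of the hypothesis that all slopes are equal, written as R b = 0 for a matrix R of contrasts; whether quantreg builds its contrasts in exactly this way is an assumption here, not something taken from its source. The slope values and covariance matrix are made up for illustration, and wald_stat is a hypothetical helper, not a quantreg function.

```r
# Wald statistic for H0: R %*% b = 0, given estimates b and their covariance V.
# Hypothetical helper for illustration only.
wald_stat <- function(b, V, R) {
  d <- R %*% b
  as.numeric(t(d) %*% solve(R %*% V %*% t(R)) %*% d)
}

# Made-up slope estimates at tau = 0.25, 0.5, 0.75 and a made-up joint covariance
b <- c(1.2, 1.9, 2.1)
V <- matrix(c(0.30, 0.10, 0.05,
              0.10, 0.25, 0.08,
              0.05, 0.08, 0.40), nrow = 3, byrow = TRUE)

# Three ways of writing "all slopes are equal" as contrasts
R_first    <- rbind(c(-1, 1, 0), c(-1, 0, 1))  # differences from tau = 0.25
R_median   <- rbind(c(1, -1, 0), c(0, -1, 1))  # differences from tau = 0.5
R_adjacent <- rbind(c(1, -1, 0), c(0, 1, -1))  # adjacent differences

w_first    <- wald_stat(b, V, R_first)
w_median   <- wald_stat(b, V, R_median)
w_adjacent <- wald_stat(b, V, R_adjacent)

# The three statistics agree up to floating-point error
print(c(first = w_first, median = w_median, adjacent = w_adjacent))
```

All three contrast matrices describe the same hypothesis, so the statistic does not depend on which quantile is treated as the baseline. Under that reading, reordering the fitted models should not change the F value or the p-value.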

I suggest you read this post and the related answer to understand what your "theta.0" is.

I believe what you are trying to do is to test whether the regression lines are parallel. In other words whether the effects of the predictor variables (only income here) are uniform across quantiles.

You can use anova() from the quantreg package to answer this question. You should indeed use several fits, one for each quantile.

When you use joint=FALSE as you did, you get coefficient-wise comparisons. But you only have one coefficient, so there is only one line! And your results tell you that the effect of income is not uniform across quantiles in your example. Use several predictor variables and you will get several p-values.

You can do an overall test of equality of the entire sets of coefficients if you do not use joint=FALSE and that would give you a "Joint Test of Equality of Slopes" and therefore only one p-value.
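As a minimal sketch of the two variants, reusing the Mammals fits from the question (the object names joint_test and slope_tests are illustrative, and the `table` element is assumed to hold the printed rows of the result):

```r
library(quantreg)
data(Mammals)

fit1 <- rq(weight ~ speed + hoppers + specials, tau = .25, data = Mammals)
fit2 <- rq(weight ~ speed + hoppers + specials, tau = .5,  data = Mammals)
fit3 <- rq(weight ~ speed + hoppers + specials, tau = .75, data = Mammals)

# Default (joint = TRUE): a single test that all slopes are equal across taus
joint_test <- anova(fit1, fit2, fit3)

# joint = FALSE: one test per slope coefficient
slope_tests <- anova(fit1, fit2, fit3, joint = FALSE)

print(joint_test)
print(slope_tests)
```

The joint version answers "do any slopes differ across quantiles?", while the coefficient-wise version shows which predictor drives a difference.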

EDIT:

I think theta.0 is the average slope for all 'tau' values or the actual estimate from 'lm()', rather than a specific slope of any of the models. My reasoning is that 'anova.rq()' does not require any specific low value of 'tau' or even the median 'tau'.

There are several ways to test this. Either do the calculations by hand with theta.0 set to the average value, or compare many combinations of models, because then you could have a situation where some of your models are close to the model with a low 'tau' value but not to the 'lm()' value. So if theta.0 is the slope of the first model with the lowest 'tau', then your Pr(>F) will be high, whereas in the other case it will be low.

This question should maybe have been asked on Cross Validated.
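The "compare many combinations" check could be sketched as follows, again with the Mammals fits from the question. The object names are illustrative, and reading the numbers from the `table` element of the returned object is an assumption about its structure.

```r
library(quantreg)
data(Mammals)

fit1 <- rq(weight ~ speed + hoppers + specials, tau = .25, data = Mammals)
fit2 <- rq(weight ~ speed + hoppers + specials, tau = .5,  data = Mammals)
fit3 <- rq(weight ~ speed + hoppers + specials, tau = .75, data = Mammals)

# Same three fits in two different orders
a_123 <- anova(fit1, fit2, fit3, joint = FALSE)
a_312 <- anova(fit3, fit1, fit2, joint = FALSE)

# Compare the numeric contents of the two result tables
same_result <- isTRUE(all.equal(unname(unlist(a_123$table)),
                                unname(unlist(a_312$table))))
print(same_result)

# Pairwise comparisons, to see which pairs of quantiles differ
print(anova(fit1, fit2, joint = FALSE))
print(anova(fit2, fit3, joint = FALSE))
print(anova(fit1, fit3, joint = FALSE))
```

If the first listed model were the reference, the two orderings would be expected to disagree. The output quoted in the question shows identical F values and p-values for both.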
