Model interpretation using timeslice method in CARET

Suppose you want to evaluate a simple glm model to forecast an economic data series. Consider the following code:

    library(caret)
    library(ggplot2)
    data(economics)
    h <- 7
    myTimeControl <- trainControl(method = "timeslice",
                                  initialWindow = 24*h,
                                  horizon = 12,
                                  fixedWindow = TRUE)
    fit.glm <- train(unemploy ~ pce + pop + psavert,
                     data = economics,
                     method = "glm",
                     preProc = c("center", "scale", "BoxCox"),
                     trControl = myTimeControl)

Suppose that the covariates used in the train formula are predictions obtained from some other model. This simple model gives the following results:

    Generalized Linear Model

    574 samples
      3 predictor

    Pre-processing: centered (3), scaled (3), Box-Cox transformation (3)
    Resampling: Rolling Forecasting Origin Resampling (12 held-out with a fixed window)
    Summary of sample sizes: 168, 168, 168, 168, 168, 168, ...
    Resampling results:

      RMSE      Rsquared
      1446.335  0.2958317

Leaving aside the poor results (this is only an example), I have two questions:

- Is it correct to treat the results above as those obtained over the entire data set by a GLM trained on only 24*h = 24*7 samples and retrained after every horizon = 12 samples?
- How can I evaluate the RMSE as the horizon grows from 1 to 12 (as reported here: http://robjhyndman.com/hyndsight/tscvexample/)?

If I display the summary of fit.glm, I obtain:

    Call:
    NULL

    Deviance Residuals:
        Min       1Q   Median       3Q      Max
    -5090.0  -1025.5   -208.1    833.4   4948.4

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  7771.56      64.93 119.688  < 2e-16 ***
    pce          5750.27    1153.03   4.987 8.15e-07 ***
    pop         -1483.01    1117.06  -1.328    0.185
    psavert      2932.38     144.56  20.286  < 2e-16 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    (Dispersion parameter for gaussian family taken to be 2420081)

        Null deviance: 3999514594  on 573  degrees of freedom
    Residual deviance: 1379446256  on 570  degrees of freedom
    AIC: 10072

    Number of Fisher Scoring iterations: 2

Do the parameters shown refer to the last trained GLM, or are they "average" parameters? I hope I've been clear enough.

Best answer

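This resampling method is like any other: the RMSE is estimated using different subsets of the training data. Note that the output says "Summary of sample sizes: 168, 168, 168, 168, 168, 168, ...". The final model uses the entire training set, so the summary of fit.glm describes that single fit on all 574 samples (hence the 573 null degrees of freedom), not the last resample and not an average over resamples.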
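To see which subsets are involved, the slices can be generated directly with caret's createTimeSlices(), which takes the same initialWindow, horizon, fixedWindow and skip arguments as the "timeslice" method in trainControl(). This is only a sketch: the vector y is a stand-in of length 574 (an assumption matching the sample count printed above), and the second call with skip = 11 is shown as one way to move the origin 12 observations at a time, which is not what the code in the question does.

```r
library(caret)

# Stand-in for the response: only its length matters for the slice indices.
# 574 is assumed here because the model printout reports 574 samples.
y <- seq_len(574)

# Same settings as the question: with the default skip = 0 the window
# advances by one observation per resample.
every_step <- createTimeSlices(y, initialWindow = 24 * 7, horizon = 12,
                               fixedWindow = TRUE)

# With skip = 11 the origin advances by 12 observations per resample.
every_12 <- createTimeSlices(y, initialWindow = 24 * 7, horizon = 12,
                             fixedWindow = TRUE, skip = 11)

length(every_step$train)        # number of resamples with skip = 0
length(every_12$train)          # number of resamples with skip = 11
range(every_step$train[[1]])    # rows used to fit the first resample
range(every_step$test[[1]])     # rows held out for the first resample
```

Under these settings every training slice has 168 rows and every test slice has 12, but with the default skip the model is refit after each single new observation rather than after every 12.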

The difference between Rob's results and these is primarily due to the difference between Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE).
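For the per-horizon question, one possible approach (a sketch, not something the answer above prescribes) is to ask caret to keep the held-out predictions with savePredictions = "final" and then group them by their position within each test slice. The names ctrl, fit, pred and step are illustrative; the step column is derived here from rowIndex and is not something caret provides.

```r
library(caret)
library(ggplot2)  # only needed for the economics data set
data(economics)

# Same resampling scheme as the question, but keeping held-out predictions.
ctrl <- trainControl(method = "timeslice", initialWindow = 24 * 7,
                     horizon = 12, fixedWindow = TRUE,
                     savePredictions = "final")

fit <- train(unemploy ~ pce + pop + psavert, data = economics,
             method = "glm",
             preProc = c("center", "scale", "BoxCox"),
             trControl = ctrl)

pred <- fit$pred

# Position of each held-out row inside its own test slice (1 = one step
# ahead, ..., 12 = twelve steps ahead), derived from the row index.
pred$step <- ave(pred$rowIndex, pred$Resample, FUN = rank)

# Error summaries per forecast step, pooled over all resamples.
rmse_by_step <- sqrt(tapply((pred$obs - pred$pred)^2, pred$step, mean))
mae_by_step  <- tapply(abs(pred$obs - pred$pred), pred$step, mean)

print(cbind(RMSE = rmse_by_step, MAE = mae_by_step))

# The model stored in the train object is fit on all rows.
nobs(fit$finalModel)
```

Computing both RMSE and MAE per step makes it easier to compare against results reported as MAE. Note that the overall RMSE printed by train() averages the per-resample RMSE values, so it need not equal a pooled figure computed from fit$pred.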
