Model interpretation using timeslice method in CARET

Updated: 2024-10-28 00:19:35
Suppose you want to evaluate a simple GLM to forecast an economic data series. Consider the following code:

library(caret)
library(ggplot2)
data(economics)

h <- 7
myTimeControl <- trainControl(method = "timeslice",
                              initialWindow = 24*h,
                              horizon = 12,
                              fixedWindow = TRUE)

fit.glm <- train(unemploy ~ pce + pop + psavert,
                 data = economics,
                 method = "glm",
                 preProcess = c("center", "scale", "BoxCox"),
                 trControl = myTimeControl)

Suppose that the covariates used in the train formula are predicted values obtained from some other model. This simple model gives the following results:

Generalized Linear Model

574 samples
  3 predictor

Pre-processing: centered (3), scaled (3), Box-Cox transformation (3)
Resampling: Rolling Forecasting Origin Resampling (12 held-out with a fixed window)
Summary of sample sizes: 168, 168, 168, 168, 168, 168, ...
Resampling results:

  RMSE      Rsquared
  1446.335  0.2958317

Apart from the poor results (this is only an example), I have two questions:

1. Is it correct to regard the above results as those obtained, over the entire dataset, by a GLM trained on only 24*h = 24*7 = 168 samples and retrained after every horizon = 12 samples?
2. How can I evaluate the RMSE as the horizon grows from 1 to 12 (as reported here http://robjhyndman.com/hyndsight/tscvexample/)?

If I print summary(fit.glm), I obtain:

Call:
NULL

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-5090.0  -1025.5   -208.1    833.4   4948.4

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  7771.56      64.93 119.688  < 2e-16 ***
pce          5750.27    1153.03   4.987 8.15e-07 ***
pop         -1483.01    1117.06  -1.328    0.185
psavert      2932.38     144.56  20.286  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 2420081)

    Null deviance: 3999514594  on 573  degrees of freedom
Residual deviance: 1379446256  on 570  degrees of freedom
AIC: 10072

Number of Fisher Scoring iterations: 2

Do the parameters shown refer to the last trained GLM, or are they "average" parameters? I hope I've been clear enough.

Best answer

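This resampling method works like any other: the RMSE is estimated on different subsets of the training data. Note that the output says "Summary of sample sizes: 168, 168, 168, 168, 168, 168, ...". The final model, whose coefficients summary() prints, is fit to the entire training set. The summary itself is consistent with this: the null deviance is on 573 = 574 - 1 degrees of freedom, so all 574 rows were used, not a single 168-row window.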
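To make the resampling scheme concrete, here is a minimal base-R sketch of the rolling-origin splits that trainControl(method = "timeslice", initialWindow = 168, horizon = 12, fixedWindow = TRUE) describes. The helper rolling_origin_slices() is hypothetical, written only for illustration, and it assumes caret's default skip = 0, under which the origin moves forward one row per resample rather than one horizon (12 rows) per resample.

```r
# Hypothetical helper: rolling-origin ("timeslice") train/test indices.
# Assumes caret's default skip = 0, i.e. the origin advances one row per slice.
rolling_origin_slices <- function(n, initial_window, horizon, fixed_window = TRUE) {
  stopifnot(n - horizon >= initial_window)
  # Last training row of each slice; the final slice must leave `horizon` test rows
  stops <- seq(initial_window, n - horizon)
  # Fixed window: old rows drop out as the origin moves; growing window: always start at row 1
  starts <- if (fixed_window) stops - initial_window + 1 else rep(1, length(stops))
  list(
    train = mapply(seq, starts, stops, SIMPLIFY = FALSE),
    test  = mapply(seq, stops + 1, stops + horizon, SIMPLIFY = FALSE)
  )
}

# Settings from the question: 574 rows, initialWindow = 24 * 7, horizon = 12
slices <- rolling_origin_slices(n = 574, initial_window = 24 * 7, horizon = 12)
length(slices$train)           # 395 resamples
unique(lengths(slices$train))  # 168: every training set has the same size
slices$train[[2]][1]           # 2: the second window starts one row later, not 12
```

Each training set has 168 rows, matching the "Summary of sample sizes" line, and consecutive windows share all but one row. To check the sketch against caret itself, compare it with caret::createTimeSlices(1:574, initialWindow = 168, horizon = 12, fixedWindow = TRUE), which should return the same index sets under different list names; its skip argument controls how far apart consecutive origins are.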

The difference between Rob's results and these is primarily due to the difference between Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE).
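For the second question (RMSE as the horizon grows from 1 to 12), one option is to refit the model at every origin yourself and tabulate the errors by step ahead, pooling over origins in the spirit of the tscvexample post. This is a sketch under stated assumptions, not caret functionality: errors_by_horizon() is a hypothetical helper, it fits a plain gaussian glm() without the center/scale/BoxCox preprocessing, and it advances the origin one row at a time, so its numbers will not match caret's output exactly.

```r
# Hypothetical helper: rolling-origin forecast errors summarised by step ahead.
# Assumptions: gaussian GLM, no preprocessing, fixed-size training window,
# origin advancing one row at a time.
errors_by_horizon <- function(formula, data, initial_window, horizon) {
  n <- nrow(data)
  stopifnot(n - horizon >= initial_window)
  response <- all.vars(formula)[1]
  stops <- seq(initial_window, n - horizon)
  # One row per forecast origin, one column per step ahead (1..horizon)
  err <- matrix(NA_real_, nrow = length(stops), ncol = horizon)
  for (i in seq_along(stops)) {
    train_rows <- (stops[i] - initial_window + 1):stops[i]
    test_rows  <- stops[i] + seq_len(horizon)
    fit  <- glm(formula, family = gaussian(), data = data[train_rows, , drop = FALSE])
    pred <- predict(fit, newdata = data[test_rows, , drop = FALSE])
    err[i, ] <- data[[response]][test_rows] - pred
  }
  # Pool errors across origins separately for each step ahead
  data.frame(
    horizon = seq_len(horizon),
    RMSE    = sqrt(colMeans(err^2)),
    MAE     = colMeans(abs(err))
  )
}
```

For example, after library(ggplot2) and data(economics), errors_by_horizon(unemploy ~ pce + pop + psavert, economics, initial_window = 24 * 7, horizon = 12) returns one row per step ahead. RMSE is never smaller than MAE computed from the same errors, which is consistent with the point about MAE versus RMSE above. As far as I know, the single RMSE caret reports is the average of per-resample RMSEs (each over 12 held-out rows), so it is not equal to any one row of this table.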

Published: 2023-07-23 04:42:00
Source: https://www.elefans.com/category/jswz/34/1227578.html