Complex subsetting of dataframe

Updated: 2024-10-25 13:25:38

Consider the following dataframe:

    df <- data.frame(Asset = c("A", "B", "C"),
                     Historical = c(0.05, 0.04, 0.03),
                     Forecast = c(0.04, 0.02, NA))
    #   Asset Historical Forecast
    # 1     A       0.05     0.04
    # 2     B       0.04     0.02
    # 3     C       0.03       NA

There is also a variable x, set by the user at the beginning of the R script, which can take one of two values: x = "Forecast" or x = "Historical".

If x = "Forecast", I would like to return the following: for each asset, if a forecast is available, return the number from the column "Forecast"; otherwise, return the number from the column "Historical". In the output below, A and B both have a forecast value, so it is returned. C is missing a forecast value, so its historical value is returned instead.

      Asset Return
    1     A   0.04
    2     B   0.02
    3     C   0.03

If, however, x = "Historical", simply return the Historical column:

      Asset Historical
    1     A       0.05
    2     B       0.04
    3     C       0.03

I can't come up with an easy way of doing it, and brute force is very inefficient if you have a large number of rows. Any ideas?

Thanks!
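
One possible vectorized approach, offered as a sketch rather than a definitive answer: base R's ifelse() together with is.na() can pick the forecast where it exists and fall back to the historical value otherwise, without looping over rows. The helper name select_returns and the output column name Return are assumptions made for illustration; the data frame and the two allowed values of x come from the question.

```r
# Example data from the question.
df <- data.frame(Asset = c("A", "B", "C"),
                 Historical = c(0.05, 0.04, 0.03),
                 Forecast = c(0.04, 0.02, NA))

# select_returns() is a hypothetical helper name, not part of any package.
select_returns <- function(df, x = c("Forecast", "Historical")) {
  # Reject anything other than the two allowed settings.
  x <- match.arg(x)

  if (x == "Historical") {
    # Historical mode: return the Historical column as-is.
    return(df[, c("Asset", "Historical")])
  }

  # Forecast mode: vectorized fallback over whole columns.
  # Use Forecast where it is present, otherwise Historical.
  data.frame(Asset = df$Asset,
             Return = ifelse(is.na(df$Forecast), df$Historical, df$Forecast))
}

x <- "Forecast"
result <- select_returns(df, x)
print(result)
```

Because ifelse() and is.na() operate on entire columns at once, this should scale to a large number of rows far better than a row-by-row loop. If the dplyr package is available, dplyr::coalesce(df$Forecast, df$Historical) expresses the same fallback in a single call.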