R factor values changing

I'm trying to do some data manipulation in R. I have two data frames, one of training data and one of test data. All the data is categorical and stored as factor variables.

There are some NAs in the data and I'm trying to convert them to "-1". This works for the training data, but not for the test data.

Something changes the values during a loop I run, but I can't figure out what.

Here's the before:

> class(catTrain1[,"Cat_111"])
[1] "factor"
> class(catTest1[,"Cat_111"])
[1] "factor"
> table(catTrain1[,"Cat_111"])
  1   2
726  25
> table(catTest1[,"Cat_111"])
  0   1   2
  1 503  15

Here's the loop:

> for(i in 1:ncol(catTrain1)){
+   catTrain1[,i] <- as.factor(as.character(ifelse(is.na(catTrain1[,i]), "-1", catTrain1[,i])))
+ }
> for(i in 1:ncol(catTest1)){
+   catTest1[,i] <- as.factor(as.character(ifelse(is.na(catTest1[,i]), "-1", catTest1[,i])))
+ }

Here's the after:

> table(catTrain1[,"Cat_111"])
  1   2
726  25
> table(catTest1[,"Cat_111"])
  1   2   3
  1 503  15

I've seen values shift up by one in character -> numeric conversions, but I can't figure out why it is happening here, especially for just one of the data frames / loops.

Any suggestions?

Best answer

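The column names in your first set of calls to table are the levels of the factor. In the second set of calls to table, the column names are the level indexes (the underlying integer codes). ifelse is pulling the indexes, not the levels, because it does not keep the factor attributes of its arguments. The training data only looks unaffected because its levels "1" and "2" happen to coincide with the codes 1 and 2, whereas the test data's levels "0", "1", "2" become the codes 1, 2, 3. In your loops, move the as.character inside the ifelse so that it wraps the final catTest1[,i] and catTrain1[,i].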
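As a rough illustration of the point above, here is a minimal sketch that reproduces the shift on a small made-up factor and then applies the suggested fix. The vector `x`, the toy `catTest1` data frame, and the helper `recode_na` are assumptions invented for this example; they are not taken from the original data.

```r
# Hypothetical stand-in for catTest1[,"Cat_111"]: levels "0", "1", "2" plus one NA.
x <- factor(c("0", "1", "1", "2", NA))

# Original pattern: ifelse() drops the factor attributes, so the non-NA
# values come back as the integer codes (1, 2, 3) rather than the labels.
broken <- as.factor(as.character(ifelse(is.na(x), "-1", x)))
print(levels(broken))

# Suggested fix: convert to character before ifelse() sees the factor.
fixed <- as.factor(ifelse(is.na(x), "-1", as.character(x)))
print(levels(fixed))

# One possible way to apply the same idea to every column of a data frame.
# recode_na is a hypothetical helper, not part of base R.
recode_na <- function(f, na_label = "-1") {
  v <- as.character(f)      # work on the labels, not the integer codes
  v[is.na(v)] <- na_label   # replace missing values with the chosen label
  factor(v)
}

# Toy data frame standing in for the real catTest1.
catTest1 <- data.frame(Cat_111 = x)
catTest1[] <- lapply(catTest1, recode_na)
print(table(catTest1[, "Cat_111"]))
```

In the broken version the label "0" disappears and a "3" appears, which matches the symptom in the question. The fixed version keeps "0", "1", "2" and adds "-1" for the missing value.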
