Subsetting a data frame by a value of one of its columns

I have a rather large data frame. Here is a simplified example:

    Group Element Value Note
    1     AAA     11    Good
    1     ABA     12    Good
    1     AVA     13    Good
    2     CBA     14    Good
    2     FDA     14    Good
    3     JHA     16    Good
    3     AHF     16    Good
    3     AKF     17    Good

Here it is as a dput:

    dat <- structure(list(
      Group = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L),
      Element = structure(c(1L, 2L, 5L, 6L, 7L, 8L, 3L, 4L),
        .Label = c("AAA", "ABA", "AHF", "AKF", "AVA", "CBA", "FDA", "JHA"),
        class = "factor"),
      Value = c(11L, 12L, 13L, 14L, 14L, 16L, 16L, 17L),
      Note = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
        .Label = "Good", class = "factor")),
      .Names = c("Group", "Element", "Value", "Note"),
      class = "data.frame", row.names = c(NA, -8L))

I'm trying to split it into separate data frames based on the group. For example, group 1 would be its own data frame:

    Group Element Value Note
    1     AAA     11    Good
    1     ABA     12    Good
    1     AVA     13    Good

Group 2:

    2     CBA     14    Good
    2     FDA     14    Good

and so on.

Best answer

You can use split for this.

    > dat
    ##   Group Element Value Note
    ## 1     1     AAA    11 Good
    ## 2     1     ABA    12 Good
    ## 3     1     AVA    13 Good
    ## 4     2     CBA    14 Good
    ## 5     2     FDA    14 Good
    ## 6     3     JHA    16 Good
    ## 7     3     AHF    16 Good
    ## 8     3     AKF    17 Good
    > x <- split(dat, dat$Group)

Then you can access each individual data frame by group number with x[[1]], x[[2]], etc. For example, here is group 2:

    > x[[2]]  ## x[2] would instead return a one-element list
    ##   Group Element Value Note
    ## 4     2     CBA    14 Good
    ## 5     2     FDA    14 Good
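As a small illustration of the access step above, the sketch below rebuilds the example data with data.frame (an assumption made for brevity; the values match the dput in the question) and looks a group up by name rather than by position. The variable name grp2 is only illustrative.

```r
# Rebuild the example data (same values as the question's dput)
dat <- data.frame(
  Group   = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L),
  Element = c("AAA", "ABA", "AVA", "CBA", "FDA", "JHA", "AHF", "AKF"),
  Value   = c(11L, 12L, 13L, 14L, 14L, 16L, 16L, 17L),
  Note    = "Good"
)

x <- split(dat, dat$Group)

names(x)           # list names are the group values, stored as character strings
grp2 <- x[["2"]]   # look up by name: the data frame for Group == 2
class(grp2)        # "data.frame"
class(x[2])        # "list": single brackets keep the list wrapper
sapply(x, nrow)    # number of rows in each group
```

Position and name happen to coincide here because the groups are 1, 2, 3. If the group values were, say, 10 and 20, x[[2]] would be the second group (20), so looking up by name such as x[["20"]] is likely the safer habit.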

ADD: Since you asked about it in the comments, you can write each data frame to its own file with write.csv and lapply. The invisible wrapper simply suppresses the printed output of lapply.

    > invisible(lapply(seq_along(x), function(i) {
    +   write.csv(x[[i]], file = paste0(i, ".csv"), row.names = FALSE)
    + }))
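A possible variation on the step above, sketched under a few assumptions: the files are named after the group values (names(x)) instead of list positions, and they go into a scratch directory under tempdir(). The out_dir location and the "group_" file prefix are hypothetical choices, not part of the original answer.

```r
# Rebuild the example data (same values as the question's dput)
dat <- data.frame(
  Group   = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L),
  Element = c("AAA", "ABA", "AVA", "CBA", "FDA", "JHA", "AHF", "AKF"),
  Value   = c(11L, 12L, 13L, 14L, 14L, 16L, 16L, 17L),
  Note    = "Good"
)

x <- split(dat, dat$Group)

# Hypothetical output location; a temp directory keeps the working directory clean
out_dir <- file.path(tempdir(), "groups")
dir.create(out_dir, showWarnings = FALSE)

# Loop over group names so each file name reflects the group value
for (g in names(x)) {
  out_file <- file.path(out_dir, paste0("group_", g, ".csv"))
  write.csv(x[[g]], file = out_file, row.names = FALSE)
}

# List what was written
written <- sort(list.files(out_dir, pattern = "^group_.*\\.csv$"))
written
```

Naming by group value keeps the file names meaningful even when the groups are not consecutive integers starting at 1.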

We can confirm that the files were created by calling list.files:

    > list.files(pattern = "^[0-9]\\.csv$")
    ## [1] "1.csv" "2.csv" "3.csv"

And we can see the data frame of the third group with read.csv:

    > read.csv("3.csv")
    ##   Group Element Value Note
    ## 1     3     JHA    16 Good
    ## 2     3     AHF    16 Good
    ## 3     3     AKF    17 Good
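If the per-group files need to be checked all at once, one option is to read every file back and stack the pieces with rbind. This is only a sketch: it writes to a hypothetical scratch directory under tempdir() so that it can run on its own, and the names out_dir, files, pieces, and combined are illustrative.

```r
# Rebuild the example data (same values as the question's dput)
dat <- data.frame(
  Group   = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L),
  Element = c("AAA", "ABA", "AVA", "CBA", "FDA", "JHA", "AHF", "AKF"),
  Value   = c(11L, 12L, 13L, 14L, 14L, 16L, 16L, 17L),
  Note    = "Good"
)

x <- split(dat, dat$Group)

# Hypothetical scratch directory so the example is self-contained
out_dir <- file.path(tempdir(), "roundtrip")
dir.create(out_dir, showWarnings = FALSE)
for (g in names(x)) {
  write.csv(x[[g]], file.path(out_dir, paste0(g, ".csv")), row.names = FALSE)
}

# Read every numbered CSV back in and stack the pieces into one data frame
files    <- list.files(out_dir, pattern = "^[0-9]+\\.csv$", full.names = TRUE)
pieces   <- lapply(files, read.csv)
combined <- do.call(rbind, pieces)

nrow(combined)   # same number of rows as the original dat
```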
