Splitting by two categories in R

Updated: 2024-10-13 08:22:30


Question


n <- 3
strata <- rep(1:4, each=n)
y <- rnorm(n = 12)
x <- 1:12
category <- rep(c("A", "B", "C"), times = 4)
df <- cbind.data.frame(y, x, strata, category)

I want to first split my data into a list by "strata", and then split every data frame inside that new list again by "category". Finally, I want to regress y on x inside each of the resulting data frames. (In this example each data frame would have a single row, but in the actual data the strata have different lengths and contain different numbers of categories.)
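To make the request concrete, the literal two-step version (split by strata, then by category inside each stratum) might be sketched as below. The data frame `df2` and the names `by_strata`, `nested` and `fits` are hypothetical; `df2` uses two rows per group so that a slope can actually be estimated.

```r
# Hypothetical toy data: like the question's df, but two rows per group
set.seed(1)
df2 <- data.frame(
  y = rnorm(24),
  x = 1:24,
  strata = rep(1:4, each = 6),
  category = rep(c("A", "B", "C"), times = 8)
)

# Step 1: one data frame per stratum
by_strata <- split(df2, df2$strata)

# Step 2: split each stratum's data frame again by category
nested <- lapply(by_strata, function(d) split(d, d$category))

# Step 3: regress y on x inside each innermost data frame
fits <- lapply(nested, function(s) lapply(s, function(d) lm(y ~ x, data = d)))

# Coefficients for stratum 1, category A
coef(fits[["1"]][["A"]])
```

This produces a list of lists, indexed first by stratum and then by category.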

Recommended answer

The canonical way in R is to use split:

L <- split(df, df[,c("strata","category")])
L
# $`1.A`
#           y x strata category
# 1 -1.120867 1      1        A
# $`2.A`
#           y x strata category
# 4 -1.023001 4      2        A
# $`3.A`
#           y x strata category
# 7 0.5411806 7      3        A
# $`4.A`
#           y  x strata category
# 10 1.546789 10      4        A
# $`1.B`
#           y x strata category
# 2 0.6730641 2      1        B
# $`2.B`
#           y x strata category
# 5 -1.466816 5      2        B
# $`3.B`
#            y x strata category
# 8 -0.1955617 8      3        B
# $`4.B`
#            y  x strata category
# 11 -0.660904 11      4        B
# $`1.C`
#            y x strata category
# 3 -0.9880206 3      1        C
# $`2.C`
#           y x strata category
# 6 0.4111802 6      2        C
# $`3.C`
#             y x strata category
# 9 -0.03311637 9      3        C
# $`4.C`
#            y  x strata category
# 12 0.6799109 12      4        C

The names of the 12-element list (here) are the string concatenation of the two categorical variables, delimited by "."; this can easily be overridden manually.
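One way the names could be controlled, shown as a sketch rather than the answer's own method, is to build the grouping factor with `interaction()`, whose `sep` argument sets the delimiter. Its `drop = TRUE` argument also discards strata/category pairs that have no rows, which matters when strata contain different categories. The data frame `df2` and the names `g` and `L2` are hypothetical.

```r
# Hypothetical toy data with the same shape as the question's df
set.seed(1)
df2 <- data.frame(
  y = rnorm(12),
  x = 1:12,
  strata = rep(1:4, each = 3),
  category = rep(c("A", "B", "C"), times = 4)
)

# Remove one combination to mimic a stratum that lacks a category
df2 <- df2[!(df2$strata == 4 & df2$category == "C"), ]

# Combined grouping factor: "_" as delimiter, empty combinations dropped
g <- interaction(df2$strata, df2$category, sep = "_", drop = TRUE)
L2 <- split(df2, g)

names(L2)
```

Without `drop = TRUE`, the missing pair would appear as a zero-row data frame, and a later `lm()` call on it would fail.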

From here, to run a regression on every element, you'd likely do something like:

models <- lapply(L, function(x) lm(..., data=x))

(or whichever regression tool you are planning to use).
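As an illustration only, here is how the pattern might look with the formula `y ~ x` taken from the question, followed by collecting the coefficients into one matrix. The data frame `df2` (two rows per group, so the slope is estimable) and the names `L2`, `models` and `coefs` are assumptions for this sketch.

```r
# Hypothetical toy data: two rows per strata/category combination
set.seed(1)
df2 <- data.frame(
  y = rnorm(24),
  x = 1:24,
  strata = rep(1:4, each = 6),
  category = rep(c("A", "B", "C"), times = 8)
)

# Flat list with one data frame per strata/category pair
L2 <- split(df2, df2[, c("strata", "category")], drop = TRUE)

# Fit y ~ x separately within each data frame
models <- lapply(L2, function(d) lm(y ~ x, data = d))

# One row per group, columns for the intercept and the slope on x
coefs <- t(sapply(models, coef))
head(coefs)
```

With the question's original data, each group has a single row, so the slope on x would come back as NA.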

You can do this in one step if you'd like,

results <- by(df, df[,c("strata","category")], function(x) lm(..., data=x))

The benefit is that it does everything in one step. The value returned by by can look a bit odd when printed, but it is really just a list that is displayed with a special print.by method; you can still index it like a list as needed.
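A small sketch of indexing the result, assuming the formula `y ~ x` from the question and a hypothetical data frame `df2` with two rows per group:

```r
# Hypothetical toy data: two rows per strata/category combination
set.seed(1)
df2 <- data.frame(
  y = rnorm(24),
  x = 1:24,
  strata = rep(1:4, each = 6),
  category = rep(c("A", "B", "C"), times = 8)
)

results <- by(df2, df2[, c("strata", "category")],
              function(d) lm(y ~ x, data = d))

# Positional indexing, as with any list
class(results[[1]])

# The result also carries dimensions (strata by category),
# so a single model can be picked out by its two grouping values
dim(results)
coef(results[["2", "B"]])
```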

Another way to do this, using dplyr:

library(dplyr)
results <- df %>%
  group_by(strata, category) %>%
  summarize(model = list(lm(y ~ x)))
results
# # A tibble: 12 x 3
# # Groups:   strata [4]
#    strata category model 
#     <int> <chr>    <list>
#  1      1 A        <lm>  
#  2      1 B        <lm>  
#  3      1 C        <lm>  
#  4      2 A        <lm>  
#  5      2 B        <lm>  
#  6      2 C        <lm>  
#  7      3 A        <lm>  
#  8      3 B        <lm>  
#  9      3 C        <lm>  
# 10      4 A        <lm>  
# 11      4 B        <lm>  
# 12      4 C        <lm>  
results$model[[1]]
# Call:
# lm(formula = y ~ x)
# Coefficients:
# (Intercept)            x  
#      -1.121           NA

As pointed out by Onyambu (thank you!), this works without data= because the variables are listed explicitly in the formula, so they are found within each group. If your regression formula uses ., for example, you may want to formalize it a little with

results <- df %>%
  group_by(strata, category) %>%
  summarize(model = list(lm(y ~ ., data = cur_data())))

y ~ x works without it, but y ~ . does not, hence data = cur_data(). Note that cur_data() is deprecated as of dplyr 1.1.0 in favor of pick().


Published: 2023-04-30 13:25:14

Article link: https://www.elefans.com/category/jswz/34/1394440.html
