Problem description
I have a data frame similar to the example below (which is a small extract of my actual data frame).
frequencies <- data.frame(sex=c("female", "female", "male", "male", "female", "female", "male", "male", "female", "female", "male", "male", "female", "female", "male", "male"),
ecotype=c("Crab", "Wave", "Crab", "Wave", "Crab", "Wave", "Crab", "Wave", "Crab", "Wave", "Crab", "Wave", "Crab", "Wave", "Crab", "Wave"),
contig_ID=c("Contig100169_2367", "Contig100169_2367", "Contig100169_2367", "Contig100169_2367", "Contig100169_2367", "Contig100169_2367", "Contig100169_2367", "Contig100169_2367",
"Contig100169_2481", "Contig100169_2481", "Contig100169_2481", "Contig100169_2481", "Contig100169_2481", "Contig100169_2481", "Contig100169_2481", "Contig100169_2481"),
allele=c("p", "p", "p", "p", "q", "q", "q", "q", "p", "p", "p", "p", "q", "q", "q", "q"),
frequency=c(157, 98, 140, 65, 29, 8, 26, 9, 182, 108, 147, 80, 46, 4, 49, 4))
I would like to do separate chi-square contingency tests for each combination of ‘contig_ID’ and ‘ecotype’, testing the association between ‘sex’ and ‘allele’. I would then like to summarise the results of these in a table that includes the p value for each combination of ‘contig_ID’ and ‘ecotype’. For instance, from the example table given, I would expect a results table of 4 p values like the example below.
results <- data.frame(ecotype=c("Crab", "Wave", "Crab", "Wave"),
contig_ID=c("Contig100169_2367", "Contig100169_2367", "Contig100169_2481", "Contig100169_2481"),
pvalue=c("pval", "pval", "pval", "pval"))
Alternatively, just adding a p value column to the original table would also work, with the p value for each combination just repeated in all the relevant rows.
I have been attempting to use functions such as lapply() and summarise() in combination with chisq.test() to achieve this, but have had no luck so far. I have also attempted to use a method similar to the one in "R chi squared test (3x2 contingency table) for each row in a table", but couldn't make this work either.
Recommended answer
We can group by the contig_ID and ecotype columns and create a nested data frame, converting the data for each group to a matrix as follows.
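library(tidyverse)
# Nest the rows of each contig_ID/ecotype combination, then build one matrix per group
frequencies2 <- frequencies %>%
  group_by(contig_ID, ecotype) %>%
  nest() %>%
  mutate(M = map(data, function(dat){
    # Spread sex into columns: one row per allele, one column per sex
    dat2 <- dat %>% spread(sex, frequency)
    # Drop the allele column from the matrix and use it as row names instead
    M <- as.matrix(dat2[, -1])
    row.names(M) <- dat2$allele
    return(M)
  }))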
If we look at the first element of the M column, we can see that the data for each group have been converted to a matrix.
frequencies2$M[[1]]
#   female male
# p    157  140
# q     29   26
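To see what a single test returns before running it on every group, the matrix printed above can be rebuilt by hand and passed to chisq.test() directly. This is only a sketch: the matrix is typed in from the printed output rather than taken from frequencies2, and the dimension names are added for readability. Note that chisq.test() applies Yates' continuity correction to 2x2 tables by default (correct = TRUE).

```r
# Rebuild the first group's 2x2 table (Contig100169_2367, Crab) by hand.
# Values are copied from the printed matrix; matrix() fills column by column.
M <- matrix(c(157, 29, 140, 26), nrow = 2,
            dimnames = list(allele = c("p", "q"),
                            sex = c("female", "male")))

# Chi-square test of independence between allele and sex
res <- chisq.test(M)

res$statistic  # test statistic (with continuity correction)
res$parameter  # degrees of freedom
res$p.value    # the value collected for each group in the next step
```

The object returned by chisq.test() is a list, which is why the p value can be extracted with $p.value inside a map call.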
From here, we can apply chisq.test to each matrix and pull out the p value. frequencies3 is the final output.
frequencies3 <- frequencies2 %>%
  mutate(pvalue = map_dbl(M, ~chisq.test(.x)$p.value)) %>%
  select(-data, -M) %>%
  ungroup()
frequencies3
# # A tibble: 4 x 3
#   contig_ID         ecotype pvalue
#   <fct>             <fct>    <dbl>
# 1 Contig100169_2367 Crab     1.00
# 2 Contig100169_2367 Wave     0.434
# 3 Contig100169_2481 Crab     0.284
# 4 Contig100169_2481 Wave     0.958
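The question also mentions the alternative of adding the p value as a column on the original table. A minimal base R sketch of that variant is shown below. It is not the method used above: it swaps in split(), xtabs() and merge() for the tidyverse pipeline, and the names groups, pvals and with_p are made up for illustration. The example data are rebuilt compactly with rep() so the block runs on its own.

```r
# Rebuild the example data from the question (same values, written compactly)
frequencies <- data.frame(
  sex       = rep(c("female", "female", "male", "male"), times = 4),
  ecotype   = rep(c("Crab", "Wave"), times = 8),
  contig_ID = rep(c("Contig100169_2367", "Contig100169_2481"), each = 8),
  allele    = rep(rep(c("p", "q"), each = 4), times = 2),
  frequency = c(157, 98, 140, 65, 29, 8, 26, 9,
                182, 108, 147, 80, 46, 4, 49, 4)
)

# One data frame per contig_ID/ecotype combination
groups <- split(frequencies,
                list(frequencies$contig_ID, frequencies$ecotype),
                drop = TRUE)

# For each group, build the allele-by-sex table and keep the p value
pvals <- do.call(rbind, lapply(groups, function(d) {
  tab <- xtabs(frequency ~ allele + sex, data = d)
  data.frame(contig_ID = d$contig_ID[1],
             ecotype   = d$ecotype[1],
             pvalue    = chisq.test(tab)$p.value)
}))

# Attach each group's p value to every matching row of the original table
with_p <- merge(frequencies, pvals, by = c("contig_ID", "ecotype"))
with_p
```

merge() sorts the result by the key columns, so the row order differs from the original table. R may also warn that the chi-squared approximation may be incorrect for groups with small expected counts; in that situation fisher.test() is a commonly used alternative for 2x2 tables.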