Turning a frequency matrix into a binary matrix in R dependent on binomial tests

I have a matrix such as this example where a1, a2, a3, a4 and a5 refer to individuals competing against each other. Rows of the matrix represent 'wins' against the same individuals in the columns.

So in the example below, individual a2 beat a4 12 times, whereas a4 beat a2 13 times, meaning that they had a total of 25 contests.

In this example, the diagonals are all 0, but they could easily be NA because it is impossible for each individual to compete with themselves.

The code below creates the data frame/matrix:

a1 <- c(0,13,3,33,0)
a2 <- c(1,0,22,13,1)
a3 <- c(1,0,0,2,2)
a4 <- c(1,12,22,0,12)
a5 <- c(3,1,0,0,0)
df <- as.data.frame(cbind(a1,a2,a3,a4,a5))
rownames(df) <- c("a1","a2","a3","a4","a5")
df
m <- as.matrix(df)
m

The matrix looks like this:

   a1 a2 a3 a4 a5
a1  0  1  1  1  3
a2 13  0  0 12  1
a3  3 22  0 22  0
a4 33 13  2  0  0
a5  0  1  2 12  0

What I want to do is to turn this frequency matrix into a binary matrix. I want to enter a 1 in an individual's row if they have significantly more wins than expected by chance against the individual in a particular column, according to a binomial test against p = 0.5.

Therefore for pair a2 versus a4, you would run the binom.test like this

binom.test(c(12,25), 0.5))

which says that this is not significant. Therefore in the cell for row a2, column a4 we would enter a 0. We also enter a 0 in the row a4, column a2.

However, a4 beats a1 33 times out of 34, whereas a1 beats a4 1 time out of 34. Running the binomial test for this:

binom.test(c(33,34), 0.5))

This is obviously significant, and therefore row a4 column a1 should get a '1', but row a1 column a4 gets a '0'.

The resulting matrix should look like this:

   a1 a2 a3 a4 a5
a1  0  0  0  0  0
a2  1  0  0  0  0
a3  0  1  0  1  0
a4  1  0  0  0  0
a5  0  0  0  1  0

I've been trying a number of approaches to this, but all have failed thus far.

Any ideas appreciated and welcomed.

Best answer

I admit, I was about to lambaste you for doing it "all wrong," then I re-read the page and how you were doing it and re-learned binom.test. You have one problem in your question in that you're missing a comma, but I'm guessing that this is just a problem typing it into SO.

SIDE POINT: please copy/paste working code. It takes much more time to infer what you meant when the code as depicted won't even run, much less give the desired output.

However, you are still calling it wrong. From ?binom.test, if you define x as a vector of two values then it must be the "number of successes and failures", not (as it appears you have done) the "number of successes and trials". Either do:

binom.test(12, 12+13, 0.5)

or

binom.test(c(12, 13), 0.5)
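To see the difference concretely, here is a small sketch comparing the two correct call forms with the vector as it was written in the question. The variable names are only for illustration.

```r
# Correct: number of successes, then number of trials
p_trials <- binom.test(12, 12 + 13, p = 0.5)$p.value

# Correct: a vector of successes and failures
p_vector <- binom.test(c(12, 13), p = 0.5)$p.value

# As in the question: 25 is read as the number of failures,
# so the test is run on 12 + 25 = 37 trials instead of 25
wrong <- binom.test(c(12, 25), p = 0.5)

all.equal(p_trials, p_vector)  # TRUE
unname(wrong$parameter)        # 37
```

The `parameter` element of the returned object holds the number of trials the test actually used, which is a quick way to check that the counts were interpreted as intended.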

Second, there's nothing here to convince me how you've attempted to automate. You say that "row a4 column a1 should get a '1', but a1 column a4 gets a '0'", but I have no clue what code you used to get there. If you want help with the code you've tried, please include it, even if it isn't elegant. The best way to learn efficient and elegant coding practices is to take what you've generated and tweak it in places.

On to some code. Try this:

# define the function
func <- function(mtx, p=0.5, alpha=0.05) {
  # preallocate the matrix in memory
  m2 <- mtx
  for (rr in 2:nrow(mtx)) {
    for (cc in 1:(rr-1)) {
      # these two `for` loops work on the non-diag lower triangle
      x <- mtx[rr,cc]
      y <- mtx[cc,rr]
      sig <- (binom.test(x, x+y, p)$p.value <= alpha)
      # lower-triangle entry
      m2[rr,cc] <- 1*((x>y) & sig)
      # opposing element in the upper-triangle
      m2[cc,rr] <- 1*((y>x) & sig)
    }
  }
  m2
}

# requisite variables
a1 <- c(0,13,3,33,0)
a2 <- c(1,0,22,13,1)
a3 <- c(1,0,0,2,2)
a4 <- c(1,12,22,0,12)
a5 <- c(3,1,0,0,0)

# merge them sequentially into a matrix
m <- matrix(c(a1, a2, a3, a4, a5), byrow=FALSE, nrow=5,
            dimnames=list(paste0('a', 1:5), paste0('a', 1:5)))

func(m)
#    a1 a2 a3 a4 a5
# a1  0  0  0  0  0
# a2  1  0  0  0  0
# a3  0  1  0  1  0
# a4  1  0  0  0  0
# a5  0  0  0  1  0

Some notes:

Looping through the lower triangle is slightly more efficient, though it's not wrong to loop over 1:nrow(m) for both rr and cc. If you do loop over the full range, you could check for rr == cc in the code to skip the diagonal (if binom.test were computationally expensive, for instance), but in this example it won't cost you much at all. However, if/when you use tests that take longer to calculate, you will want to save a second or two here and there in your code.

The 1*(...) coerces a boolean into a 0 or 1. I could also have done as.integer(...) with the same effect.

The (x>y) ensures that a "significant" binom.test result is credited only to the winner, since binom.test(0, 100, 0.5) is still very significant (albeit for a loser).
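The last two notes can be checked on a single hypothetical pair, where one individual lost all 100 contests. This is only a sketch; x and y here are made-up counts, not values from the matrix above.

```r
x <- 0    # wins of the row individual
y <- 100  # wins of the column individual

# The test itself only says the split is lopsided, not who won
sig <- binom.test(x, x + y, 0.5)$p.value <= 0.05
sig                        # TRUE

# The comparison decides which cell receives the 1
1 * ((x > y) & sig)        # 0, the loser's cell
1 * ((y > x) & sig)        # 1, the winner's cell

# as.integer() gives the same value, stored as an integer rather than a double
as.integer((y > x) & sig)  # 1
```

The two coercions differ only in storage type: 1*(...) yields a double and as.integer(...) yields an integer, which makes no difference when filling a numeric matrix.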

Hope this helps.

Edit: removed the double call to binom.test because (as @rawr correctly pointed out) it was redundant, and fixed the function, which was incorrectly accessing the m variable directly instead of its internal mtx.
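One edge case worth noting: binom.test stops with an error when the number of trials is 0, so func would fail on a pair that never met. The question also mentions that the diagonal could be NA. Below is a hedged sketch of a variant covering both; the name func_safe, the choice to write NA on the diagonal, and the choice to score unmet pairs as 0 are all assumptions of mine, not part of the original answer.

```r
# Hypothetical variant of func(); name and NA/0 conventions are assumptions
func_safe <- function(mtx, p = 0.5, alpha = 0.05) {
  m2 <- mtx
  m2[] <- 0        # start from all zeros, keeping the dimnames
  diag(m2) <- NA   # assumption: self-contests are marked NA
  for (rr in seq_len(nrow(mtx))[-1]) {
    for (cc in seq_len(rr - 1)) {
      x <- mtx[rr, cc]
      y <- mtx[cc, rr]
      # skip missing counts and pairs with no contests (both cells stay 0);
      # binom.test() would otherwise stop with an error
      if (is.na(x + y) || x + y == 0) next
      sig <- binom.test(x, x + y, p)$p.value <= alpha
      m2[rr, cc] <- 1 * ((x > y) & sig)
      m2[cc, rr] <- 1 * ((y > x) & sig)
    }
  }
  m2
}

# Same data as above, entered column by column
m <- matrix(c(0, 13, 3, 33, 0,
              1, 0, 22, 13, 1,
              1, 0, 0, 2, 2,
              1, 12, 22, 0, 12,
              3, 1, 0, 0, 0),
            nrow = 5, dimnames = list(paste0("a", 1:5), paste0("a", 1:5)))

# Pretend a2 and a5 never met
m["a5", "a2"] <- 0
m["a2", "a5"] <- 0

res <- func_safe(m)
res
```

Whether an unmet pair should be scored 0 or NA depends on how the binary matrix is used afterwards, so treat that line as a placeholder for whatever convention fits your analysis.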
