I have gene expression data as the number of counts for each probe, something like this:
    library(data.table)
    mydata <- fread(
    "molclass,mol.id,sample1,sample2,sample3
    negative, negat1, 0, 1, 2
    negative, negat2, 2, 1, 1
    negative, negat3, 1, 2, 0
    endogen, gene1, 30, 15, 10
    endogen, gene2, 60, 30, 20
    ")

My question is: what would be the best way to perform background subtraction? That is, for each sampleN column I need to calculate the background (let's say the average of all values from the negative class) and then subtract it from each value in that column. For the moment I am using the following solution:
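    # loop over the sample columns (everything except molclass and mol.id)
    for (nm in names(mydata)[-c(1:2)]) {
      # background = mean of the negative-class probes in this column
      bg <- mydata[molclass == 'negative', nm, with = FALSE]
      bg <- mean(unlist(bg))
      mydata[[nm]] <- (mydata[[nm]] - bg)
    }

but I feel there must be some "nicer" way.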
P.S. I know that there are some packages that do those things, but my data correspond to the number of counts, not intensity of signal - so I can't use limma or similar tools designed for microarrays. Maybe some seq-data packages could help, but I am not sure because my data is not from sequencing either.
Best answer
Generally, you shouldn't use <- to assign columns in a data.table, because it can copy the table instead of modifying it by reference. The last assignment in your loop would be better done with set. See the help page by typing ?set for details.
    mycols  <- paste0('sample', 1:3)
    newcols <- paste0(mycols, 'bk')

    # logical index of the negative-class rows
    s <- mydata[['molclass']] == 'negative'

    # per-sample background: mean of the negative-class counts
    mybkds <- sapply(mycols, function(j) mean(mydata[[j]][s]))

    # create the output columns, then fill them by reference
    mydata[, (newcols) := NA_real_]
    for (j in mycols)
      set(mydata, j = paste0(j, 'bk'), value = mydata[[j]] - mybkds[j])

I've only done the last step in a loop, but this is basically the same as your code (where everything is in the loop). *apply functions and loops are just different syntax, I've heard, so you can go with whichever you prefer.
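
For comparison, the same background subtraction can probably be written without an explicit loop, using .SD, .SDcols and := from data.table. This is only a sketch, not part of the original answer: the names sample_cols, bk_cols and bg are made up for illustration, and it assumes every sample column name starts with "sample".

```r
library(data.table)

mydata <- fread("molclass,mol.id,sample1,sample2,sample3
negative,negat1,0,1,2
negative,negat2,2,1,1
negative,negat3,1,2,0
endogen,gene1,30,15,10
endogen,gene2,60,30,20")

# Assumption: all sample columns are named "sample<something>"
sample_cols <- grep("^sample", names(mydata), value = TRUE)
bk_cols     <- paste0(sample_cols, "bk")   # hypothetical output column names

# One-row table holding the mean of the negative-class counts per sample
bg <- mydata[molclass == "negative", lapply(.SD, mean), .SDcols = sample_cols]

# Subtract each column's background and add the results by reference
mydata[, (bk_cols) := Map(`-`, .SD, bg), .SDcols = sample_cols]

print(mydata)
```

Writing to separate "bk" columns keeps the raw counts intact, as in the answer above; assigning to (sample_cols) instead would overwrite them in place.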