I have an R data frame with a numeric vector of positions along a chromosome and a vector of gene names. I also have a vector of start positions of interesting elements on that chromosome. I'd like to extract the names and positions of the 3 closest genes both above and below each element, and I'm wondering what the most efficient way to go about this is.

For example:

    genes <- data.frame("geneStart"=sort(sample(500,10)), "geneName"=sample(LETTERS,10))
    genes
       geneStart geneName
    1         66        X
    2        158        U
    3        262        N
    4        385        D
    5        387        H
    6        418        Z
    7        464        J
    8        469        Y
    9        475        L
    10       491        I

I want to end up with a function, let's call it getAdjacent, like so:

    getAdjacent(280)
    [1] "X" "U" "N" "D" "H" "Z"
    getAdjacent(479)
    [1] "J" "Y" "L" "I" NA  NA

Best answer
Using findInterval:

    getAdjacent <- function(x) {
      # index of the last gene whose start is <= x (0 if x precedes every gene)
      idx <- findInterval(x, genes$geneStart)
      # three genes at or below x, then three genes above it
      range.idx <- (idx-2):(idx+3)
      # non-positive indices fall off the start of the table
      range.idx <- ifelse(range.idx <= 0, NA, range.idx)
      # indices past the last row yield NA on their own
      as.character(genes$geneName)[range.idx]
    }

This relies on genes$geneStart being sorted in increasing order, as it is in the example. You might have to adjust the behavior if x belongs to genes$geneStart, depending on your preference: as written, a gene that starts exactly at x is counted among the three genes below.
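The question also asks for positions, and the answer leaves the case where x equals a gene start open. The following is a minimal sketch of one way to cover both, not the answerer's code. The name getAdjacentDF and the parameters n and left.open are made up for illustration. The gene table is typed in from the example output so the result is reproducible. The sketch assumes R 3.3.0 or later, where findInterval() accepts left.open.

```r
# Gene table copied from the example in the question (fixed values, not re-sampled).
genes <- data.frame(
  geneStart = c(66, 158, 262, 385, 387, 418, 464, 469, 475, 491),
  geneName  = c("X", "U", "N", "D", "H", "Z", "J", "Y", "L", "I"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: up to n genes on each side of position x,
# returned with both names and positions.
# genes must be sorted by geneStart, as findInterval() requires.
getAdjacentDF <- function(x, genes, n = 3, left.open = FALSE) {
  # Index of the last gene with geneStart <= x (or < x when left.open = TRUE)
  idx <- findInterval(x, genes$geneStart, left.open = left.open)
  range.idx <- (idx - n + 1):(idx + n)
  # Indices outside 1..nrow(genes) become NA
  range.idx[range.idx < 1 | range.idx > nrow(genes)] <- NA
  data.frame(
    geneName  = genes$geneName[range.idx],
    geneStart = genes$geneStart[range.idx],
    stringsAsFactors = FALSE
  )
}

getAdjacentDF(280, genes)
getAdjacentDF(385, genes)$geneName                    # D falls in the lower three
getAdjacentDF(385, genes, left.open = TRUE)$geneName  # D falls in the upper three
```

For a whole vector of element positions (say a hypothetical vector called elements), lapply(elements, getAdjacentDF, genes = genes) returns one data frame per element.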