Efficiently find adjacent values in vector

Updated: 2024-10-13 12:18:45

I have an R data frame with a numeric vector of positions along a chromosome and a vector of gene names. I also have a vector of start positions of interesting elements on that chromosome. For each element, I'd like to extract the names and positions of the 3 closest genes above it and the 3 closest genes below it, and I'm wondering what the most efficient way to do this is.

For example:

    genes <- data.frame("geneStart"=sort(sample(500,10)), "geneName"=sample(LETTERS,10))
    genes
       geneStart geneName
    1         66        X
    2        158        U
    3        262        N
    4        385        D
    5        387        H
    6        418        Z
    7        464        J
    8        469        Y
    9        475        L
    10       491        I
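Because sample() draws at random, running the snippet above gives different genes each time. To reproduce the exact table shown (and the outputs below), the same data can be typed in directly. This is only the printed table copied back into code, not part of the original question.

```r
# Fixed copy of the example table printed above, so results are reproducible.
genes <- data.frame(
  geneStart = c(66, 158, 262, 385, 387, 418, 464, 469, 475, 491),
  geneName  = c("X", "U", "N", "D", "H", "Z", "J", "Y", "L", "I"),
  stringsAsFactors = FALSE  # keep names as character (the default since R 4.0.0)
)

# findInterval() requires the positions to be sorted in non-decreasing order.
stopifnot(!is.unsorted(genes$geneStart))
genes
```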

I want to end up with a function, let's call it getAdjacent, like so:

    getAdjacent(280)
    [1] "X" "U" "N" "D" "H" "Z"
    getAdjacent(479)
    [1] "J" "Y" "L" "I" NA NA
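For comparison, the behaviour asked for above can be written as a plain linear scan. This is a hypothetical reference version (getAdjacentNaive() and its n argument are not from the post); it assumes genes is sorted by geneStart, counts a gene starting exactly at x as "below", and pads with NA as in the expected output. It scans every gene on each call, so it is mainly useful for checking a faster version.

```r
# Hypothetical baseline, not from the original post: a linear-scan getAdjacent().
genes <- data.frame(
  geneStart = c(66, 158, 262, 385, 387, 418, 464, 469, 475, 491),
  geneName  = c("X", "U", "N", "D", "H", "Z", "J", "Y", "L", "I"),
  stringsAsFactors = FALSE
)

getAdjacentNaive <- function(x, n = 3) {
  below <- which(genes$geneStart <= x)  # genes starting at or before x
  above <- which(genes$geneStart > x)   # genes starting strictly after x
  # Last n genes below and first n genes above, each padded with NA to length n.
  lo <- c(rep(NA, max(0, n - length(below))), tail(below, n))
  hi <- c(head(above, n), rep(NA, max(0, n - length(above))))
  genes$geneName[c(lo, hi)]
}

getAdjacentNaive(280)  # "X" "U" "N" "D" "H" "Z"
getAdjacentNaive(479)  # "J" "Y" "L" "I" NA  NA
```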

Best answer

Using findInterval:

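    getAdjacent <- function(x) {
      # idx = number of genes starting at or before x
      idx <- findInterval(x, genes$geneStart)
      # 3 genes below x and 3 genes above it
      range.idx <- (idx-2):(idx+3)
      # indices <= 0 are not valid here, so use NA
      range.idx <- ifelse(range.idx <= 0, NA, range.idx)
      # indices past the last gene give NA automatically
      as.character(genes$geneName)[range.idx]
    }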
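The question mentions a whole vector of interesting elements and asks for positions as well as names. Because findInterval() is vectorised over its first argument (its documentation states O(n log N) complexity), one call can handle every element. Below is a sketch of that idea, assuming the same genes data frame; getAdjacentAll() and its n argument are hypothetical names, and the NA padding mirrors the answer's behaviour.

```r
# Sketch, not from the original answer: the findInterval() approach applied to
# many elements at once, returning gene names and positions.
genes <- data.frame(
  geneStart = c(66, 158, 262, 385, 387, 418, 464, 469, 475, 491),
  geneName  = c("X", "U", "N", "D", "H", "Z", "J", "Y", "L", "I"),
  stringsAsFactors = FALSE
)

getAdjacentAll <- function(elements, n = 3) {
  # idx[i] = number of genes starting at or before elements[i]
  idx <- findInterval(elements, genes$geneStart)
  # Row i holds gene indices idx[i]-n+1, ..., idx[i]+n
  m <- outer(idx, seq(-n + 1, n), `+`)
  # Indices outside 1..nrow(genes) have no gene, so mark them NA
  m[m < 1 | m > nrow(genes)] <- NA
  k <- as.vector(m)  # column-major, matching how matrix() refills below
  list(
    name     = matrix(genes$geneName[k],  nrow = length(elements), ncol = 2 * n),
    position = matrix(genes$geneStart[k], nrow = length(elements), ncol = 2 * n)
  )
}

res <- getAdjacentAll(c(280, 479))
res$name
res$position
```

Row 1 of res$name matches getAdjacent(280) and row 2 matches getAdjacent(479); res$position holds the corresponding geneStart values.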

If x is exactly equal to one of the values in genes$geneStart, findInterval counts that gene among the three genes below x. You may need to adjust this depending on your preference.
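One way to make that tie behaviour adjustable is findInterval()'s left.open argument (available since R 3.3.0): with left.open = TRUE, a gene starting exactly at x is counted among the genes above x instead. A minimal sketch, assuming the same genes data; getAdjacent2() and its tie_below argument are hypothetical names.

```r
genes <- data.frame(
  geneStart = c(66, 158, 262, 385, 387, 418, 464, 469, 475, 491),
  geneName  = c("X", "U", "N", "D", "H", "Z", "J", "Y", "L", "I"),
  stringsAsFactors = FALSE
)

# Hypothetical variant of the answer's getAdjacent() with a tie switch.
getAdjacent2 <- function(x, tie_below = TRUE) {
  # left.open = FALSE (default): a gene at exactly x counts as below x.
  # left.open = TRUE: a gene at exactly x counts as above x.
  idx <- findInterval(x, genes$geneStart, left.open = !tie_below)
  range.idx <- (idx - 2):(idx + 3)
  range.idx <- ifelse(range.idx <= 0, NA, range.idx)
  as.character(genes$geneName)[range.idx]
}

getAdjacent2(262)                     # "X" "U" "N" "D" "H" "Z"
getAdjacent2(262, tie_below = FALSE)  # NA  "X" "U" "N" "D" "H"
```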

Published: 2023-08-06 18:02:00

Link: https://www.elefans.com/category/jswz/34/1452756.html