Scala: what is the most appropriate data structure for sorted subsets?

Updated: 2024-06-14 16:53:07
Given a large collection (let's call it 'a') of elements of type T (say, a Vector or List) and an evaluation function 'f' (say, (T) => Double), I would like to derive from 'a' a result collection 'b' that contains the N elements of 'a' that yield the highest values under f. The collection 'a' may contain duplicates. It is not sorted.

Maybe leaving the question of parallelizability (map/reduce etc.) aside for a moment, what would be the appropriate Scala data structure for compiling the result collection 'b'? Thanks for any pointers / ideas.

Notes:

(1) I guess my use case can be most concisely expressed as

    val a = Vector(9, 2, 6, 1, 7, 5, 2, 6, 9)  // just an example
    val f: Int => Double = n => n.toDouble     // evaluation function
    val b = a.sortBy(x => -f(x)).take(N)       // sort by descending f, then clip

except that I do not want to sort the entire set.

(2) One option might be an iteration over 'a' that fills a TreeSet with 'manual' size bounding (reject anything worse than the worst item in the set, don't let the set grow beyond N). However, I would like the result set to retain duplicates present in the original set, so this may not work.

(3) If a sorted multi-set is the right data structure, is there a Scala implementation of this? Or a binary-sorted Vector or Array, if the result set is reasonably small?
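For illustration, the "binary-sorted buffer" idea from note (3), combined with the size bounding from note (2), might look roughly like the sketch below. The name topNSorted and its signature are hypothetical, not from any library. It assumes f never returns NaN and that N is small, since every insertion shifts elements in the buffer.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper: keeps at most n (score, element) pairs,
// always sorted by ascending score. Duplicates are retained.
def topNSorted[A](xs: Iterable[A], n: Int)(f: A => Double): Vector[A] = {
  val kept = ArrayBuffer.empty[(Double, A)]

  // Binary search for the first index whose score is greater than s.
  def insertionPoint(s: Double): Int = {
    var lo = 0
    var hi = kept.length
    while (lo < hi) {
      val mid = (lo + hi) >>> 1
      if (kept(mid)._1 <= s) lo = mid + 1 else hi = mid
    }
    lo
  }

  if (n > 0) xs.foreach { x =>
    val s = f(x)
    // Accept x while there is room, or if it beats the worst kept element.
    if (kept.length < n || s > kept(0)._1) {
      kept.insert(insertionPoint(s), (s, x))
      if (kept.length > n) kept.remove(0) // drop the current worst
    }
  }
  kept.reverseIterator.map(_._2).toVector // highest score first
}
```

With the example data, topNSorted(Vector(9, 2, 6, 1, 7, 5, 2, 6, 9), 4)(_.toDouble) keeps both 9s. Each insertion costs time proportional to N because of the shifting, so this only makes sense for a small result set.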

Best answer

You can use a priority queue:

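    def firstK[A](xs: Seq[A], k: Int)(implicit ord: Ordering[A]) = {
      // Reversed ordering makes the queue head the smallest element kept so far.
      val q = new scala.collection.mutable.PriorityQueue[A]()(ord.reverse)
      val (before, after) = xs.splitAt(k)
      q ++= before
      // Remove the current minimum and put back the larger of it and x.
      after.foreach(x => q += ord.max(x, q.dequeue()))
      q.dequeueAll
    }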

We fill the queue with the first k elements and then compare each additional element to the head of the queue, swapping as necessary. This works as expected and retains duplicates:

    scala> firstK(Vector(9, 2, 6, 1, 7, 5, 2, 6, 9), 4)
    res14: scala.collection.mutable.Buffer[Int] = ArrayBuffer(6, 7, 9, 9)

And it doesn't sort the complete list. I've got an Ordering in this implementation, but adapting it to use an evaluation function would be pretty trivial.
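As a rough sketch of that adaptation (not the answerer's code), the same bounded-queue idea can take an evaluation function directly. The name topNBy is hypothetical. The sketch caches each score so f is evaluated once per element, assumes f never returns NaN, and returns an empty result for n <= 0.

```scala
import scala.collection.mutable

// Hypothetical helper: the n elements of xs with the highest score under f,
// highest first, without sorting all of xs. Duplicates are retained.
def topNBy[A](xs: Iterable[A], n: Int)(f: A => Double): Vector[A] = {
  // Reversed comparison on the cached score, so the queue head is the
  // lowest-scoring element currently kept.
  val lowestFirst: Ordering[(Double, A)] = new Ordering[(Double, A)] {
    def compare(x: (Double, A), y: (Double, A)): Int =
      java.lang.Double.compare(y._1, x._1)
  }
  val kept = mutable.PriorityQueue.empty[(Double, A)](lowestFirst)

  if (n > 0) xs.foreach { x =>
    val score = f(x)
    if (kept.size < n) kept.enqueue((score, x))
    else if (score > kept.head._1) {
      kept.dequeue()           // evict the current worst
      kept.enqueue((score, x))
    }
  }

  // Dequeuing yields the lowest score first; reverse for highest first.
  val out = Vector.newBuilder[A]
  while (kept.nonEmpty) out += kept.dequeue()._2
  out.result().reverse
}
```

For the example from the question, topNBy(Vector(9, 2, 6, 1, 7, 5, 2, 6, 9), 4)(_.toDouble) gives the same four elements as firstK, in descending order. Peeking at the head before dequeuing avoids a remove and re-insert for elements that would be rejected anyway.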

Published: 2023-04-05 21:10:00
Article link: https://www.elefans.com/category/dzcp/fb52a2ff04bd906b5bfc50669ef9cc4c.html