Fastest way to merge lists that share a common field?


Problem description

I am learning F# and I'm building an odds comparison service (a la www.bestbetting) to put theory into practice. So far I have the following data structures:

    open System // needed for DateTime

    type price = { Bookie : string; Odds : float32 }
    type selection = { Prices : list<price>; Name : string }
    type event = { Name : string; Hour : DateTime; Sport : string; Selections : list<selection> }

So, I have several of these "events" coming from several sources. I need a really fast way of merging events that have the same Name and Hour and, once that is done, merging the prices of their selections that have the same Name.

I've thought about taking the first list and searching the other lists one by one, and when the specified fields match, returning a new list that contains both lists merged.

I'd like to know if there's a faster way of doing this, as performance is important. I have already seen the question "Merge multiple lists of data together by common ID in F#", and although that was helpful, I am asking for the best-performing solution. Maybe that means using a structure other than a list, or another way of merging them, so any advice would be greatly appreciated.

Thanks!

Recommended answer

As Daniel mentioned in the comments, the key question is how much better the performance needs to be compared to a solution based on the standard Seq.groupBy function. If you have a lot of data to process, it may actually be easier to use a database for this purpose.
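For reference, the sequential baseline mentioned here could look roughly like the sketch below. It repeats the types from the question so that it is self-contained. The function names mergeSelections and mergeEventsSeq are made up for illustration, and taking Sport from the first event in each group is an arbitrary assumption, since grouping uses only Name and Hour.

```fsharp
open System

// Types repeated from the question so the sketch compiles on its own.
type price = { Bookie : string; Odds : float32 }
type selection = { Prices : list<price>; Name : string }
type event = { Name : string; Hour : DateTime; Sport : string; Selections : list<selection> }

// Hypothetical helper: merge selections that share a Name by
// concatenating their price lists.
let mergeSelections (selections : seq<selection>) : list<selection> =
    selections
    |> Seq.groupBy (fun sel -> sel.Name)
    |> Seq.map (fun (name, sels) ->
        { Prices = sels |> Seq.collect (fun s -> s.Prices) |> List.ofSeq
          Name = name })
    |> List.ofSeq

// Hypothetical sequential merge: group events by (Name, Hour), then
// merge the selections of every event in the group.
let mergeEventsSeq (events : seq<event>) : list<event> =
    events
    |> Seq.groupBy (fun evt -> evt.Name, evt.Hour)
    |> Seq.map (fun ((name, hour), group) ->
        { Name = name
          Hour = hour
          // Assumption: keep the Sport of the first event in the group.
          Sport = (Seq.head group).Sport
          Selections =
            group
            |> Seq.collect (fun evt -> evt.Selections)
            |> mergeSelections })
    |> List.ofSeq
```

Each input element is visited once per grouping step, so this already avoids the repeated list-against-list searching described in the question. Measuring it on real data is a reasonable first step before trying anything parallel.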

If you only need something ~1.7 times faster (or possibly more, depending on the number of cores :-)), then you can try replacing Seq.groupBy with the parallel version based on Parallel LINQ that is available in the F# PowerPack. Using PSeq.groupBy (and other PSeq functions), you can write something like this:

    #r "FSharp.PowerPack.Parallel.Seq.dll"
    open Microsoft.FSharp.Collections

    // Takes a collection of events and merges prices of events with the same name/hour
    let mergeEvents (events:seq<event>) =
      events
      |> PSeq.groupBy (fun evt -> evt.Name, evt.Hour)
      |> PSeq.map (fun ((name, hour), events) ->
          // Merge prices of all events in the group with the same Selections.Name
          let selections =
            events
            |> PSeq.collect (fun evt -> evt.Selections)
            |> PSeq.groupBy (fun sel -> sel.Name)
            |> PSeq.map (fun (name, sels) ->
                { Name = name
                  Prices = sels |> Seq.collect (fun s -> s.Prices) |> List.ofSeq } )
            |> PSeq.toList
          // Build new Event as the result - since we're grouping just using
          // name & hour, I'm using the first available 'Sport' value
          // (which may not make sense)
          { Name = name
            Hour = hour
            Sport = (Seq.head events).Sport
            Selections = selections })
      |> PSeq.toList

I didn't test the performance of this version, but I believe it should be faster. You also don't need to reference the entire assembly: you can just copy the source for the few relevant functions from the PowerPack source code. Last time I checked, performance was better when the functions were marked as inline, which is not the case in the current source code, so you may want to check that too.
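As a rough illustration of what such copied functions might look like, here is a hand-written approximation built directly on System.Linq.ParallelEnumerable. It is not the PowerPack source: the module name MiniPSeq and the helper toParallel are made up for this sketch, and only groupBy and map are shown. The functions are marked inline, following the observation above, but whether that helps is something to measure.

```fsharp
open System
open System.Linq

// Hypothetical stand-in for the few PSeq functions used above;
// an approximation, not the actual PowerPack code.
module MiniPSeq =

    // Reuse an existing parallel query, otherwise start one.
    let inline toParallel (source : seq<'T>) : ParallelQuery<'T> =
        match source with
        | :? ParallelQuery<'T> as query -> query
        | _ -> source.AsParallel()

    // Parallel grouping; each group comes back as a (key, items) tuple.
    let inline groupBy (projection : 'T -> 'Key) (source : seq<'T>) : ParallelQuery<'Key * seq<'T>> =
        ParallelEnumerable.GroupBy(
            toParallel source,
            Func<'T, 'Key>(projection),
            Func<'Key, seq<'T>, 'Key * seq<'T>>(fun key items -> (key, items)))

    // Parallel projection of every element.
    let inline map (mapping : 'T -> 'U) (source : seq<'T>) : ParallelQuery<'U> =
        ParallelEnumerable.Select(toParallel source, Func<'T, 'U>(mapping))

// Usage: count the numbers 1..10 by remainder modulo 3.
// Parallel LINQ does not guarantee result order, so sort before comparing.
let countsByRemainder =
    [ 1 .. 10 ]
    |> MiniPSeq.groupBy (fun n -> n % 3)
    |> MiniPSeq.map (fun (key, items) -> key, Seq.length items)
    |> Seq.sortBy fst
    |> List.ofSeq
```

Because the results are ordinary sequences, they can be passed to List.ofSeq or other Seq functions, at which point the parallel query is executed.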