I am learning F# and I'm building an odds comparison service (à la www.bestbetting) to put theory into practice. So far I have the following data structures:
```fsharp
open System

type price = { Bookie : string; Odds : float32; }
type selection = { Prices : list<price>; Name : string; }
type event = { Name : string; Hour : DateTime; Sport : string; Selections : list<selection>; }
```
So I have several of these "events" coming from different sources, and I need a really fast way to merge events that have the same Name and Hour and then, once that is done, to merge the prices of their selections that have the same Name.
I've thought about taking the first list, doing a one-by-one search through the other lists and, whenever the specified fields match, returning a new list containing both lists merged.
I'd like to know if there's a faster way of doing this, as performance is important. I have already seen the question "Merge multiple lists of data together by common ID in F#", and although it was helpful, I am looking for the best-performing solution. Maybe using some structure other than a list, or another way of merging them... so any advice would be greatly appreciated.
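As one illustration of the "other structure" idea, here is a minimal sketch that replaces the repeated list scans with hash tables: a single pass over all sources buckets events in a `Dictionary` keyed by `(Name, Hour)`, and within each bucket accumulates prices in a second `Dictionary` keyed by selection Name, so each lookup is roughly constant time instead of a linear search. The names `mergeWithDictionary` and `mkSelection` are hypothetical, and keeping the `Sport` of the first event seen for each key is an assumption. Output order is not guaranteed.

```fsharp
open System
open System.Collections.Generic

type price = { Bookie : string; Odds : float32 }
type selection = { Prices : list<price>; Name : string }
type event = { Name : string; Hour : DateTime; Sport : string; Selections : list<selection> }

/// Hypothetical helper: builds a selection record without label ambiguity.
let mkSelection (name : string) (prices : price list) : selection =
    { Name = name; Prices = prices }

/// Hypothetical single-pass merge: events are bucketed by (Name, Hour) and
/// prices by selection Name in hash tables, instead of scanning lists.
let mergeWithDictionary (sources : seq<event list>) : event list =
    // (Name, Hour) -> (first event seen, selection Name -> accumulated prices)
    let byKey = Dictionary<string * DateTime, event * Dictionary<string, ResizeArray<price>>>()
    for source in sources do
        for evt in source do
            let key = (evt.Name, evt.Hour)
            let sels =
                match byKey.TryGetValue(key) with
                | true, (_, existing) -> existing
                | false, _ ->
                    let fresh = Dictionary<string, ResizeArray<price>>()
                    byKey.[key] <- (evt, fresh)
                    fresh
            for sel in evt.Selections do
                match sels.TryGetValue(sel.Name) with
                | true, acc -> acc.AddRange(sel.Prices)
                | false, _ -> sels.[sel.Name] <- ResizeArray<price>(sel.Prices)
    // Rebuild immutable records; Sport (and the rest) come from the first event seen per key
    [ for KeyValue(_, (first, sels)) in byKey do
        let merged = [ for KeyValue(name, prices) in sels -> mkSelection name (List.ofSeq prices) ]
        yield { first with Selections = merged } ]
```

Building the hash tables is linear in the total number of events and selections, which is the main saving over comparing every event of one list against every event of the others.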
Thanks!
Recommended answer
As Daniel mentioned in the comment, the key question is how much faster this needs to be compared to a solution based on the standard Seq.groupBy function. If you have a lot of data to process, it may actually be easier to use a database for this purpose.
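For reference, a sequential baseline built on the standard `Seq.groupBy` might look like the following minimal sketch. It uses only FSharp.Core and groups first by `(Name, Hour)`, then by selection Name. The names `mergeEventsSeq` and `mkSelection` are illustrative, and taking `Sport` from the first event in each group is an assumption.

```fsharp
open System

type price = { Bookie : string; Odds : float32 }
type selection = { Prices : list<price>; Name : string }
type event = { Name : string; Hour : DateTime; Sport : string; Selections : list<selection> }

/// Hypothetical helper: builds a selection record without label ambiguity.
let mkSelection (name : string) (prices : price list) : selection =
    { Name = name; Prices = prices }

/// Sequential baseline: group events by (Name, Hour), then group each group's
/// selections by Name and concatenate their prices.
let mergeEventsSeq (events : seq<event>) : event list =
    events
    |> Seq.groupBy (fun evt -> evt.Name, evt.Hour)
    |> Seq.map (fun ((name, hour), group) ->
        let selections =
            group
            |> Seq.collect (fun evt -> evt.Selections)
            |> Seq.groupBy (fun (sel : selection) -> sel.Name)
            |> Seq.map (fun (selName, sels) ->
                mkSelection selName (sels |> Seq.collect (fun s -> s.Prices) |> List.ofSeq))
            |> List.ofSeq
        // Assumption: Sport is taken from the first event in the group
        { Name = name; Hour = hour; Sport = (Seq.head group).Sport; Selections = selections })
    |> List.ofSeq
```

Timing this against any faster variant on realistic data is what answers "how much faster does it need to be".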
If you only need something ~1.7 times faster (or possibly more, depending on the number of cores :-)), you can try replacing Seq.groupBy with the parallel version based on Parallel LINQ that is available in the F# PowerPack. Using PSeq.groupBy (and other PSeq functions), you can write something like this:
```fsharp
#r "FSharp.PowerPack.Parallel.Seq.dll"
open Microsoft.FSharp.Collections

// Takes a collection of events and merges prices of events with the same name/hour
let mergeEvents (events:seq<event>) =
    events
    |> PSeq.groupBy (fun evt -> evt.Name, evt.Hour)
    |> PSeq.map (fun ((name, hour), events) ->
        // Merge prices of all events in the group with the same Selections.Name
        let selections =
            events
            |> PSeq.collect (fun evt -> evt.Selections)
            |> PSeq.groupBy (fun sel -> sel.Name)
            |> PSeq.map (fun (name, sels) ->
                { Name = name
                  Prices = sels |> Seq.collect (fun s -> s.Prices) |> List.ofSeq })
            |> PSeq.toList
        // Build new Event as the result: since we're grouping just using
        // name & hour, I'm using the first available 'Sport' value
        // (which may not make sense)
        { Name = name
          Hour = hour
          Sport = (Seq.head events).Sport
          Selections = selections })
    |> PSeq.toList
```
I haven't tested the performance of this version, but I believe it should be faster. You also don't need to reference the entire assembly: you can just copy the source of the few relevant functions from the PowerPack source code. Last time I checked, performance was better when the functions were marked as inline, which is not the case in the current source code, so you may want to check that too.
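If you would rather not depend on the PowerPack at all, one option is to call Parallel LINQ directly through the `AsParallel`, `GroupBy` and `Select` extension methods in `System.Linq`, which is what the PSeq functions are built on. The sketch below parallelizes only the outer `(Name, Hour)` grouping and merges each group's selections sequentially. `mergeEventsPlinq`, `mergeGroup` and `mkSelection` are hypothetical names, the result order is not preserved, and its performance has not been measured.

```fsharp
open System
open System.Linq

type price = { Bookie : string; Odds : float32 }
type selection = { Prices : list<price>; Name : string }
type event = { Name : string; Hour : DateTime; Sport : string; Selections : list<selection> }

/// Hypothetical helper: builds a selection record without label ambiguity.
let mkSelection (name : string) (prices : price list) : selection =
    { Name = name; Prices = prices }

/// Merges one (Name, Hour) group sequentially: selections with the same Name
/// have their prices concatenated; Sport is taken from the first event (assumption).
let mergeGroup (name : string, hour : DateTime) (group : seq<event>) : event =
    let selections =
        group
        |> Seq.collect (fun evt -> evt.Selections)
        |> Seq.groupBy (fun (sel : selection) -> sel.Name)
        |> Seq.map (fun (selName, sels) ->
            mkSelection selName (sels |> Seq.collect (fun s -> s.Prices) |> List.ofSeq))
        |> List.ofSeq
    { Name = name; Hour = hour; Sport = (Seq.head group).Sport; Selections = selections }

/// Groups by (Name, Hour) with PLINQ, then merges each group in parallel.
let mergeEventsPlinq (events : seq<event>) : event list =
    let groups = events.AsParallel().GroupBy(fun (evt : event) -> (evt.Name, evt.Hour))
    groups.Select(fun (g : IGrouping<string * DateTime, event>) -> mergeGroup g.Key g)
    |> List.ofSeq
```

Whether this beats the sequential version depends on the data size and core count, so it is worth timing both (for example with `System.Diagnostics.Stopwatch`).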