Fastest way to merge lists that have a common field?

Updated: 2024-10-15 18:26:15

Problem description

I am learning F# and I'm building an odds comparison service (à la www.bestbetting) to put theory into practice. So far I have the following data structures:

    open System

    type price = { Bookie : string; Odds : float32 }

    type selection = { Prices : list<price>; Name : string }

    type event = {
        Name : string
        Hour : DateTime
        Sport : string
        Selections : list<selection> }

So, I have several of these "Events" coming from several sources, and I need a really fast way of merging events with the same Name and Hour and, once that is done, merging the prices of their selections that have the same Name.

I've thought about taking the first list, doing a one-by-one search through the other lists and, whenever the specified fields match, returning a new list containing both lists merged.

I'd like to know if there's a faster way of doing this, as performance is important. I have already seen Merge multiple lists of data together by common ID in F# ... and although that was helpful, I am asking for the best solution performance-wise. Maybe that means using a structure other than a list, or another way of merging them, so any advice would be greatly appreciated.

Thanks!

Recommended answer

As Daniel mentioned in a comment, the key question is how much better the performance needs to be compared to a solution based on the standard Seq.groupBy function. If you have a lot of data to process, it may actually be easier to use a database for this purpose.
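
For reference, a sequential baseline built on Seq.groupBy might look like the sketch below. It is not code from the question or the comments. The record types mirror the question's, but are renamed Price, Selection and SportEvent here (event is reserved for future use in F#). mergeEventsSeq, quote and the sample data are hypothetical names.

```fsharp
open System

// Same shape as the question's records; the type names are changed for this sketch.
type Price = { Bookie : string; Odds : float32 }
type Selection = { Prices : Price list; Name : string }
type SportEvent = { Name : string; Hour : DateTime; Sport : string; Selections : Selection list }

// Sequential baseline: group events by (Name, Hour), then group the
// selections inside each group by Name and concatenate their prices.
let mergeEventsSeq (events : seq<SportEvent>) : SportEvent list =
    events
    |> Seq.groupBy (fun evt -> evt.Name, evt.Hour)
    |> Seq.map (fun ((name, hour), group) ->
        let selections =
            group
            |> Seq.collect (fun evt -> evt.Selections)
            |> Seq.groupBy (fun sel -> sel.Name)
            |> Seq.map (fun (selName, sels) ->
                { Selection.Name = selName
                  Prices = sels |> Seq.collect (fun s -> s.Prices) |> List.ofSeq })
            |> List.ofSeq
        // Sport is taken from the first event in the group (an arbitrary choice).
        { SportEvent.Name = name
          Hour = hour
          Sport = (Seq.head group).Sport
          Selections = selections })
    |> List.ofSeq

// Hypothetical sample data: the same event quoted by two bookies.
let kickoff = DateTime(2024, 1, 1, 15, 0, 0)
let quote bookie odds =
    let sel = { Selection.Name = "Home"; Prices = [ { Bookie = bookie; Odds = odds } ] }
    { SportEvent.Name = "A v B"; Hour = kickoff; Sport = "Football"; Selections = [ sel ] }

let merged = mergeEventsSeq [ quote "BookieX" 2.0f; quote "BookieY" 2.1f ]
```

With this sample, merged should contain a single event whose one selection ("Home") carries the prices from both bookies. Measuring this version first gives you a number to compare any faster variant against.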

If you only need something ~1.7 times faster (or possibly more, depending on the number of cores :-)), then you can try replacing Seq.groupBy with the parallel version based on Parallel LINQ that is available in the F# PowerPack. Using PSeq.groupBy (and other PSeq functions), you can write something like this:

    #r "FSharp.PowerPack.Parallel.Seq.dll"
    open Microsoft.FSharp.Collections

    // Takes a collection of events and merges prices of events with the same name/hour
    let mergeEvents (events:seq<event>) =
      events
      |> PSeq.groupBy (fun evt -> evt.Name, evt.Hour)
      |> PSeq.map (fun ((name, hour), events) ->
          // Merge prices of all events in the group with the same Selections.Name
          let selections =
            events
            |> PSeq.collect (fun evt -> evt.Selections)
            |> PSeq.groupBy (fun sel -> sel.Name)
            |> PSeq.map (fun (name, sels) ->
                { Name = name
                  Prices = sels |> Seq.collect (fun s -> s.Prices) |> List.ofSeq } )
            |> PSeq.toList
          // Build new Event as the result - since we're grouping just using
          // name & hour, I'm using the first available 'Sport' value
          // (which may not make sense)
          { Name = name
            Hour = hour
            Sport = (Seq.head events).Sport
            Selections = selections })
      |> PSeq.toList

I haven't tested the performance of this version, but I believe it should be faster. You also don't need to reference the entire assembly: you can just copy the source for the few relevant functions from the PowerPack source code. Last time I checked, the performance was better when the functions were marked as inline, which is not the case in the current source code, so you may want to check that too.
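
Since the speed-up is untested, it is worth measuring on your own data. Below is a minimal timing sketch using System.Diagnostics.Stopwatch. timeIt is a hypothetical helper, not part of F# or the PowerPack, and the integer workload is only a placeholder for the real merge functions.

```fsharp
open System
open System.Diagnostics

// Hypothetical helper: runs 'f' once to warm up, then 'runs' more times,
// and returns the last result together with the best elapsed time.
let timeIt (runs : int) (f : unit -> 'T) : 'T * TimeSpan =
    let mutable result = f ()            // warm-up run, also gives an initial result
    let mutable best = TimeSpan.MaxValue
    for _i in 1 .. runs do
        let sw = Stopwatch.StartNew()
        result <- f ()
        sw.Stop()
        if sw.Elapsed < best then best <- sw.Elapsed
    result, best

// Placeholder workload: group 100,000 integers by their last digit.
// Swap in the Seq.groupBy and PSeq.groupBy versions of the merge to compare them.
let groups, elapsed =
    timeIt 5 (fun () -> seq { 1 .. 100000 } |> Seq.groupBy (fun n -> n % 10) |> Seq.length)

printfn "%d groups, best of 5 runs: %.3f ms" groups elapsed.TotalMilliseconds
```

Force the result inside the timed function (here with Seq.length, or List.ofSeq for the merge), because sequences are lazy and would otherwise not be evaluated during the measurement.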

Published: 2023-11-29 12:29:05
Article link: https://www.elefans.com/category/jswz/34/1646396.html