Mining Association Rules with Frequent Patterns

Updated: 2024-10-14 04:27:15
This article covers extracting pairwise association rules when mining frequent patterns with Spark MLlib.

Problem


I want to extract association rules for a set of transactions using the following Spark/Scala code:

val fpg = new FPGrowth()
  .setMinSupport(minSupport)
  .setNumPartitions(10)
val model = fpg.run(transactions)
model.generateAssociationRules(minConfidence).collect()

However, the number of products is more than 10K, so extracting rules for all combinations is computationally expensive, and I do not need them all. I want to extract only pairwise rules:

Product 1 ==> Product 2
Product 1 ==> Product 3
Product 3 ==> Product 1

and I do not care about other combinations, such as:

[Product 1] ==> [Product 2, Product 3]
[Product 3, Product 1] ==> Product 2

Is there any way to do that?

Thanks, Amir

Solution

Assuming your transactions look more or less like this:

val transactions = sc.parallelize(Seq(
  Array("a", "b", "e"),
  Array("c", "b", "e", "f"),
  Array("a", "b", "c"),
  Array("c", "e", "f"),
  Array("d", "e", "f")
))

you can try to generate frequent itemsets manually and apply AssociationRules directly:

import org.apache.spark.mllib.fpm.AssociationRules
import org.apache.spark.mllib.fpm.FPGrowth.FreqItemset

val freqItemsets = transactions
  .flatMap(xs =>
    (xs.combinations(1) ++ xs.combinations(2)).map(x => (x.toList, 1L))
  )
  .reduceByKey(_ + _)
  .map { case (xs, cnt) => new FreqItemset(xs.toArray, cnt) }

val ar = new AssociationRules()
  .setMinConfidence(0.8)

val results = ar.run(freqItemsets)
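To make the counting step concrete, here is a plain-Scala sketch (no Spark; `groupBy` stands in for `reduceByKey`, and the variable names are illustrative) showing which candidate itemsets and counts the sample transactions produce:

```scala
// Plain-Scala illustration (no Spark) of the candidate generation and
// counting above: each transaction yields its 1- and 2-item combinations,
// which are then counted across all transactions.
val transactions = Seq(
  Array("a", "b", "e"),
  Array("c", "b", "e", "f"),
  Array("a", "b", "c"),
  Array("c", "e", "f"),
  Array("d", "e", "f")
)

val counts: Map[List[String], Long] = transactions
  .flatMap(xs => (xs.combinations(1) ++ xs.combinations(2)).map(_.toList))
  .groupBy(identity)
  .map { case (itemset, occurrences) => (itemset, occurrences.size.toLong) }

// For example, the pair (b, e) occurs in two transactions,
// and the single item e in four.
```

With Spark, the same `(itemset, 1L)` pairs are combined with `reduceByKey(_ + _)` instead of an in-memory `groupBy`.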

Notes:

  • unfortunately, you'll have to handle filtering by support manually; it can be done by applying filter on freqItemsets
  • you should consider increasing the number of partitions before flatMap
  • if freqItemsets is too large to handle, you can split the computation into a few steps to mimic actual FP-growth:

  • generate 1-patterns and filter by support
  • generate 2-patterns using only frequent patterns from step 1
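The two steps above can be sketched in plain Scala as follows (the `minCount` threshold is a made-up absolute-count stand-in for `minSupport`; on Spark the same logic would run over RDDs with `reduceByKey` and `filter`):

```scala
// Two-step sketch: count 1-patterns first, then generate 2-patterns
// restricted to items that survived the support filter.
// minCount is a hypothetical absolute-count threshold.
val transactions = Seq(
  Array("a", "b", "e"),
  Array("c", "b", "e", "f"),
  Array("a", "b", "c"),
  Array("c", "e", "f"),
  Array("d", "e", "f")
)
val minCount = 2L

// Step 1: count single items and keep only the frequent ones.
val freq1: Map[String, Long] = transactions
  .flatMap(_.toList)
  .groupBy(identity)
  .map { case (item, xs) => (item, xs.size.toLong) }
  .filter { case (_, cnt) => cnt >= minCount }
val frequentItems = freq1.keySet

// Step 2: generate 2-patterns only from frequent items
// ("d" occurs once and is pruned before pair generation).
val freq2: Map[List[String], Long] = transactions
  .flatMap(xs => xs.filter(frequentItems).sorted.combinations(2).map(_.toList))
  .groupBy(identity)
  .map { case (pair, xs) => (pair, xs.size.toLong) }
  .filter { case (_, cnt) => cnt >= minCount }
```

Pruning infrequent items before generating pairs is what keeps the candidate space manageable with 10K+ products, since only items that can possibly appear in a frequent pair are combined.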

Published: 2023-11-30 17:22:58
Link: https://www.elefans.com/category/jswz/34/1650878.html