Association rules with frequent pattern mining

Updated: 2024-10-14 04:27:15

Problem description

I want to extract association rules for a set of transactions with the following Spark (Scala) code:

val fpg = new FPGrowth()
  .setMinSupport(minSupport)
  .setNumPartitions(10)
val model = fpg.run(transactions)
model.generateAssociationRules(minConfidence).collect()

However, the number of products is more than 10K, so extracting the rules for all combinations is computationally expensive, and I do not need them all. So I want to extract only pairwise rules:

Product 1 ==> Product 2
Product 1 ==> Product 3
Product 3 ==> Product 1

and I do not care about other combinations such as:

[Product 1] ==> [Product 2, Product 3]
[Product 3, Product 1] ==> Product 2

Is there any way to do that?

Thanks, Amir

Solution

Assuming your transactions look more or less like this:

val transactions = sc.parallelize(Seq(
  Array("a", "b", "e"),
  Array("c", "b", "e", "f"),
  Array("a", "b", "c"),
  Array("c", "e", "f"),
  Array("d", "e", "f")
))

you can try to generate frequent itemsets manually and apply AssociationRules directly:

import org.apache.spark.mllib.fpm.AssociationRules
import org.apache.spark.mllib.fpm.FPGrowth.FreqItemset

val freqItemsets = transactions
  .flatMap { xs =>
    // Sort first so that {b, c} and {c, b} are counted as the same itemset
    val items = xs.sorted
    // Emit every 1-itemset and 2-itemset of the transaction with a count of 1
    (items.combinations(1) ++ items.combinations(2)).map(x => (x.toList, 1L))
  }
  .reduceByKey(_ + _)
  .map { case (xs, cnt) => new FreqItemset(xs.toArray, cnt) }

val ar = new AssociationRules()
  .setMinConfidence(0.8)

val results = ar.run(freqItemsets)
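To see what pairwise rules this counting scheme produces, here is a minimal sketch that repeats the same steps on the sample transactions with plain Scala collections instead of RDDs. The object name `PairwiseRules` and the `rules` helper are made up for illustration, and the confidence formula, count({A, B}) / count({A}), is the standard definition rather than something taken from Spark's source.

```scala
object PairwiseRules {
  // Same sample data as above, held in a local Seq instead of an RDD.
  val transactions: Seq[Array[String]] = Seq(
    Array("a", "b", "e"),
    Array("c", "b", "e", "f"),
    Array("a", "b", "c"),
    Array("c", "e", "f"),
    Array("d", "e", "f")
  )

  // Count every 1-itemset and every (sorted) 2-itemset across all transactions.
  val counts: Map[List[String], Long] = transactions
    .flatMap { xs =>
      val items = xs.distinct.sorted
      (items.combinations(1) ++ items.combinations(2)).map(_.toList)
    }
    .groupBy(identity)
    .map { case (itemset, occurrences) => (itemset, occurrences.size.toLong) }

  // confidence(A => B) = count({A, B}) / count({A}); each pair yields two candidate rules.
  def rules(minConfidence: Double): Seq[(String, String, Double)] =
    counts.toSeq.flatMap {
      case (List(x, y), pairCount) =>
        Seq((x, y), (y, x))
          .map { case (a, b) => (a, b, pairCount.toDouble / counts(List(a))) }
          .filter { case (_, _, conf) => conf >= minConfidence }
      case _ => Seq.empty // 1-itemsets only serve as denominators
    }
}
```

For the sample data and a threshold of 0.8, `PairwiseRules.rules(0.8)` yields a => b, d => e, d => f and f => e, each with confidence 1.0. Note that d appears in only one transaction, which is why filtering by support (see the notes below) matters in practice.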

Notes:

- Unfortunately, you'll have to handle filtering by support manually. It can be done by applying filter on freqItemsets.
- You should consider increasing the number of partitions before flatMap.
- If freqItemsets is too large to be handled, you can split its generation into a few steps to mimic actual FP-growth:
  1. generate 1-patterns and filter by support
  2. generate 2-patterns using only frequent patterns from step 1
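The staged approach from the notes could be sketched roughly as follows. This is an illustration under stated assumptions, not the answer's own code: `frequentPairs` is a hypothetical helper name, the relative `minSupport` is turned into an absolute count by rounding up, and the set of frequent single items is assumed to be small enough to collect on the driver and broadcast.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.fpm.FPGrowth.FreqItemset

// Hypothetical helper (not part of Spark): staged counting of 1- and 2-itemsets.
def frequentPairs(
    sc: SparkContext,
    transactions: RDD[Array[String]],
    minSupport: Double): RDD[FreqItemset[String]] = {

  // Assumption: minSupport is a fraction of the transaction count, rounded up.
  val minCount = math.ceil(minSupport * transactions.count()).toLong

  // Step 1: count single items and keep only the frequent ones.
  val singles: RDD[(String, Long)] = transactions
    .flatMap(_.distinct.map(item => (item, 1L)))
    .reduceByKey(_ + _)
    .filter { case (_, cnt) => cnt >= minCount }

  // Assumption: the frequent items fit in driver memory and can be broadcast.
  val frequentItems = sc.broadcast(singles.keys.collect().toSet)

  // Step 2: build pairs only from frequent items, then filter by support again.
  val pairs: RDD[(List[String], Long)] = transactions
    .flatMap { xs =>
      // Sorting makes {b, c} and {c, b} map to the same key.
      val kept = xs.distinct.filter(frequentItems.value.contains).sorted
      kept.combinations(2).map(p => (p.toList, 1L))
    }
    .reduceByKey(_ + _)
    .filter { case (_, cnt) => cnt >= minCount }

  // AssociationRules needs both the pairs and the single-item counts.
  val singleSets = singles.map { case (item, cnt) => new FreqItemset(Array(item), cnt) }
  val pairSets = pairs.map { case (items, cnt) => new FreqItemset(items.toArray, cnt) }
  singleSets.union(pairSets)
}
```

The returned RDD can be passed to `new AssociationRules().setMinConfidence(0.8).run(...)` in place of `freqItemsets`. Dropping infrequent items before building pairs keeps the number of candidate pairs per transaction small, which is the main cost when there are more than 10K products.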

Published: 2023-11-30 17:22:58

Article link: https://www.elefans.com/category/jswz/34/1650878.html