I'm trying to understand the fundamentals of the Apriori (Basket) algorithm for use in data mining.
It's best if I explain the complication I'm having with an example:
Here is a transactional dataset:
t1: Milk, Chicken, Beer
t2: Chicken, Cheese
t3: Cheese, Boots
t4: Cheese, Chicken, Beer
t5: Chicken, Beer, Clothes, Cheese, Milk
t6: Clothes, Beer, Milk
t7: Beer, Milk, Clothes
The minimum support for the above is 0.5 (50%).
From the above, my number of transactions is clearly 7, so for an itemset to be "frequent" it must appear in at least 4 of the 7 transactions (0.5 × 7 = 3.5, rounded up). As such, this was my frequent itemset 1:
F1:
Milk = 4
Chicken = 4
Beer = 5
Cheese = 4
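For anyone who wants to check the F1 counts, here is a minimal Python sketch of that first pass. The variable names (`transactions`, `min_support`, `item_counts`, `f1`) are just illustrative choices, not part of any library; the data is copied from the dataset in the question.

```python
from collections import Counter

# Transactions t1..t7 copied from the question.
transactions = [
    {"Milk", "Chicken", "Beer"},
    {"Chicken", "Cheese"},
    {"Cheese", "Boots"},
    {"Cheese", "Chicken", "Beer"},
    {"Chicken", "Beer", "Clothes", "Cheese", "Milk"},
    {"Clothes", "Beer", "Milk"},
    {"Beer", "Milk", "Clothes"},
]
min_support = 0.5

# Count how many transactions contain each single item.
item_counts = Counter(item for t in transactions for item in t)

# Keep the items whose relative support meets the threshold.
f1 = {
    item: count
    for item, count in item_counts.items()
    if count / len(transactions) >= min_support
}

print(sorted(f1.items()))
# [('Beer', 5), ('Cheese', 4), ('Chicken', 4), ('Milk', 4)]
```

Clothes (3 of 7) and Boots (1 of 7) fall below the threshold, which matches the F1 list above.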
I then created my candidates for the second refinement (C2) and narrowed them down to:
F2:
{Milk, Beer} = 4
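The C2 to F2 step can be checked the same way. This is only a sketch under the assumption that C2 is built as every unordered pair of the four frequent single items; `frequent_items`, `c2`, `pair_counts` and `f2` are illustrative names.

```python
from itertools import combinations

# Transactions t1..t7 copied from the question.
transactions = [
    {"Milk", "Chicken", "Beer"},
    {"Chicken", "Cheese"},
    {"Cheese", "Boots"},
    {"Cheese", "Chicken", "Beer"},
    {"Chicken", "Beer", "Clothes", "Cheese", "Milk"},
    {"Clothes", "Beer", "Milk"},
    {"Beer", "Milk", "Clothes"},
]
min_support = 0.5
frequent_items = ["Beer", "Cheese", "Chicken", "Milk"]  # F1 from the question

# C2: every unordered pair of frequent single items (6 candidates).
c2 = [frozenset(pair) for pair in combinations(frequent_items, 2)]

# Support count of a pair = number of transactions containing both items.
pair_counts = {pair: sum(pair <= t for t in transactions) for pair in c2}

# F2: the candidates that meet the minimum support.
f2 = {
    pair: count
    for pair, count in pair_counts.items()
    if count / len(transactions) >= min_support
}

for pair, count in pair_counts.items():
    print(sorted(pair), count)
```

Only {Milk, Beer} reaches a count of 4; the next closest pairs, {Chicken, Beer} and {Chicken, Cheese}, each appear in 3 transactions.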
This is where I get confused: if I am asked to display all frequent itemsets, do I write down all of F1 and F2, or just F2? To me, the items in F1 aren't "sets".
I am then asked to create association rules for the frequent itemsets I have just defined and calculate their "confidence" figures. I get this:
Milk -> Beer = 100% confidence
Beer -> Milk = 80% confidence
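These two figures follow from the usual definition, confidence(A -> B) = support(A and B together) / support(A). A small sketch, with `support_count` and `confidence` as hypothetical helper names:

```python
# Transactions t1..t7 copied from the question.
transactions = [
    {"Milk", "Chicken", "Beer"},
    {"Chicken", "Cheese"},
    {"Cheese", "Boots"},
    {"Cheese", "Chicken", "Beer"},
    {"Chicken", "Beer", "Clothes", "Cheese", "Milk"},
    {"Clothes", "Beer", "Milk"},
    {"Beer", "Milk", "Clothes"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(set(itemset) <= t for t in transactions)


def confidence(antecedent, consequent, transactions):
    """confidence(A -> B) = count(A and B) / count(A)."""
    both = support_count(set(antecedent) | set(consequent), transactions)
    return both / support_count(antecedent, transactions)


print(confidence({"Milk"}, {"Beer"}, transactions))  # 1.0
print(confidence({"Beer"}, {"Milk"}, transactions))  # 0.8
```

Milk appears in 4 transactions and all 4 also contain Beer (4/4), while Beer appears in 5 transactions of which 4 contain Milk (4/5).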
It seems superfluous to include F1's itemsets here, as they will all have a confidence of 100% regardless and don't actually "associate" anything, which is why I am now questioning whether the itemsets in F1 are indeed "frequent".
Recommended answer

Itemsets of size 1 are considered frequent if their support is sufficient. But here you have to consider the minimal threshold: if the minimal threshold in your example is 2, then F1 will not be considered, but if the minimal threshold is 1, then you have to include F1.
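To make that concrete, here is a rough level-wise sketch that collects frequent itemsets of every size. It assumes the "minimal threshold" above means a minimum itemset size, which is my reading rather than something stated explicitly; `frequent_itemsets` and its `min_len` parameter are hypothetical names, not a library API.

```python
from itertools import combinations

# Transactions t1..t7 copied from the question.
transactions = [
    {"Milk", "Chicken", "Beer"},
    {"Chicken", "Cheese"},
    {"Cheese", "Boots"},
    {"Cheese", "Chicken", "Beer"},
    {"Chicken", "Beer", "Clothes", "Cheese", "Milk"},
    {"Clothes", "Beer", "Milk"},
    {"Beer", "Milk", "Clothes"},
]


def frequent_itemsets(transactions, min_support, min_len=1):
    """Level-wise search; min_len is an assumed minimum itemset size."""
    n = len(transactions)

    def is_frequent(itemset):
        return sum(itemset <= t for t in transactions) / n >= min_support

    items = sorted({item for t in transactions for item in t})
    level = [frozenset([i]) for i in items if is_frequent(frozenset([i]))]
    result = []
    k = 1
    while level:
        if k >= min_len:
            result.extend(level)
        k += 1
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        previous = set(level)
        level = [
            c for c in candidates
            if all(frozenset(s) in previous for s in combinations(c, k - 1))
            and is_frequent(c)
        ]
    return result


print([sorted(s) for s in frequent_itemsets(transactions, 0.5)])
# [['Beer'], ['Cheese'], ['Chicken'], ['Milk'], ['Beer', 'Milk']]
print([sorted(s) for s in frequent_itemsets(transactions, 0.5, min_len=2)])
# [['Beer', 'Milk']]
```

With `min_len=1` the result is F1 plus F2; with `min_len=2` only {Milk, Beer} remains.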
You can take a look here and here for more ideas and examples.
Hope this helps.