I am working on a text categorization project using Weka. I have 12 classes, and for each class I need to find the text keywords that distinguish it from the others. So I am thinking of building a feature vector (FV) for each class independently and storing the 12 FVs in 12 separate ARFF files.
The question is: how can I combine 12 different feature vectors into one feature vector?
Best answer
Instead of joining the feature vectors, I propose two different approaches, depending on whether the classes overlap:
If the classes do not overlap (that is, no document belongs to two or more classes at the same time), you are better off building a single ARFF file and then using the AttributeSelection filter (I suggest the Ranker search with the InfoGainAttributeEval evaluator) to determine which features discriminate best among all the classes.
If the classes overlap, you could build twelve one-against-the-rest classifiers, each with its own vocabulary. You could apply attribute selection to each of these independent problems as well, finding the features that best discriminate a single class from all of the rest.
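To make the two options concrete, here is a minimal sketch in plain Python (not Weka's API) of what ranking words by information gain looks like, first on the single multi-class problem and then on one-against-the-rest relabelings. The toy corpus, the labels, and the helper names `entropy`, `info_gain` and `rank_words` are all made up for illustration, and it assumes binary word-presence features; Weka's InfoGainAttributeEval may handle numeric attributes differently.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def info_gain(docs, labels, word):
    """H(class) - H(class | word present or absent)."""
    with_word = [y for d, y in zip(docs, labels) if word in d]
    without = [y for d, y in zip(docs, labels) if word not in d]
    conditional = sum(len(part) / len(labels) * entropy(part)
                      for part in (with_word, without) if part)
    return entropy(labels) - conditional


def rank_words(docs, labels, top=3):
    """Return the `top` words with the highest information gain."""
    vocab = set().union(*docs)
    scored = sorted(((info_gain(docs, labels, w), w) for w in vocab),
                    key=lambda t: (-t[0], t[1]))  # ties broken alphabetically
    return [(w, round(g, 3)) for g, w in scored[:top]]


# Hypothetical toy corpus: each document is a set of tokens.
docs = [
    {"goal", "match", "team"},
    {"team", "score", "goal"},
    {"election", "vote", "party"},
    {"vote", "party", "team"},
    {"stock", "market", "price"},
    {"market", "price", "vote"},
]
labels = ["sport", "sport", "politics", "politics", "finance", "finance"]

# Option 1 (non-overlapping classes): one multi-class ranking.
overall = rank_words(docs, labels)

# Option 2 (one-against-the-rest): relabel to "class vs rest", rank per class.
per_class = {
    c: rank_words(docs, [y if y == c else "rest" for y in labels])
    for c in sorted(set(labels))
}

print(overall)
for c, words in per_class.items():
    print(c, words)
```

The per-class ranking is the part that answers "which keywords distinguish this class from the others". For truly overlapping classes you would give each document a set of labels and mark it as positive whenever the target class is in that set; the single-label relabeling here is a simplification.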