I use scikit-learn to implement a simple supervised learning algorithm. In essence I follow the tutorial here (but with my own data).
I try to fit the model:
    from sklearn import svm

    clf = svm.SVC(gamma=0.001, C=100.)
    clf.fit(features_training, labels_training)

But the call to clf.fit raises an error:

    ValueError: could not convert string to float: 'A'
The error is expected because labels_training contains string values that represent three different categories: A, B and C.
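It may be worth checking which argument actually triggers the error. In recent scikit-learn versions, SVC converts the feature matrix X to float64 but encodes the class labels in y itself, so string labels alone should not raise this error. The minimal sketch below uses made-up toy data (X_numeric, y_strings and X_strings are hypothetical names, not from the question) to separate the two cases:

```python
import numpy as np
from sklearn import svm

# Hypothetical toy data: numeric features, string class labels.
X_numeric = np.array([[0.0, 0.1], [0.2, 0.0], [5.0, 5.1],
                      [5.2, 4.9], [10.0, 10.2], [9.8, 10.1]])
y_strings = np.array(["A", "A", "B", "B", "C", "C"])

# String labels in y are accepted: SVC maps them to class indices internally.
clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(X_numeric, y_strings)
print(clf.classes_)  # ['A' 'B' 'C']

# String values in the feature matrix X are what fail the float conversion.
X_strings = np.array([["A", "x"], ["B", "y"], ["C", "z"]])
try:
    svm.SVC().fit(X_strings, ["A", "B", "C"])
    features_rejected = False
except ValueError as err:
    features_rejected = True
    print("ValueError:", err)
```

If only the second case fails on your data, the strings are in the features, which is the situation the encoders in scikit-learn's preprocessing module are meant for.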
So the question is: how do I use SVC (support vector classification) if the labelled data represents categories in the form of strings? One intuitive solution seems to be simply converting each string to a number, for instance A = 0, B = 1, etc. But is this really the best solution?
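For the labels, the A = 0, B = 1, ... mapping is what scikit-learn's LabelEncoder does, and it can also map predictions back to the original strings. A minimal sketch with hypothetical label values:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical example labels standing in for labels_training.
labels_training = ["A", "B", "C", "B", "A"]

encoder = LabelEncoder()
# Classes are sorted, so A -> 0, B -> 1, C -> 2.
y_encoded = encoder.fit_transform(labels_training)
print(y_encoded)  # [0 1 2 1 0]

# Map integer predictions back to the original category names.
print(encoder.inverse_transform([2, 0]))  # ['C' 'A']
```

Integer codes are fine for the target, because a classifier treats them only as class identifiers. For input features they are riskier: an SVM would treat C = 2 as further from A = 0 than B = 1 is, an ordering the categories may not actually have.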
Best answer
Take a look at section 4.3.4, "Encoding categorical features", of the scikit-learn preprocessing guide: http://scikit-learn.org/stable/modules/preprocessing.html#encoding-categorical-features
In particular, look at OneHotEncoder. It converts categorical values into a format that SVMs can use.
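A minimal sketch of that approach, assuming every feature column is categorical; the colour and size values below are hypothetical toy data, and the hyperparameters are copied from the question. OneHotEncoder and SVC are chained in a pipeline so the same encoding is applied at fit and predict time:

```python
import numpy as np
from sklearn import svm
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training data: every feature column holds strings.
features_training = np.array([
    ["red", "small"],
    ["red", "large"],
    ["green", "small"],
    ["green", "large"],
    ["blue", "small"],
    ["blue", "large"],
])
labels_training = np.array(["A", "A", "B", "B", "C", "C"])

# OneHotEncoder gives each category its own 0/1 column, so the SVM sees
# no artificial ordering between "red", "green" and "blue".
# handle_unknown="ignore" encodes unseen categories as all zeros.
model = make_pipeline(
    OneHotEncoder(handle_unknown="ignore"),
    svm.SVC(gamma=0.001, C=100.),
)
model.fit(features_training, labels_training)

encoder = model.named_steps["onehotencoder"]
print(encoder.categories_)  # learned categories per column, sorted
prediction = model.predict(np.array([["green", "small"]]))
print(prediction)
```

If only some columns are categorical, scikit-learn's ColumnTransformer can apply the encoder to just those columns and pass the numeric ones through.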