Create decision trees with pandas and scikit-learn using a binary matrix / occurrence matrix


I have a data set which is actually an occurrence matrix of feature vectors for a number of items. In theory, this type of representation helps when applying machine learning algorithms to the data set, since it is already normalized.

a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,class
1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,class1
0,1,0,0,1,0,0,0,1,0,0,0,0,0,1,0,0,0,1,1,1,0,1,0,0,1,class2
0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,class2
1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,1,0,0,1,class3

But I can't seem to use the algorithms provided by pandas and scikit-learn in Python. I haven't seen any examples.

The format of the data set is as above, where the feature vector = [a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z] and the class variable is at the end of each line (e.g. 'class1', 'class2', 'class3').
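For reference, a matrix in this format can be loaded directly with pandas. A minimal sketch, using io.StringIO in place of a real file so the snippet is self-contained:

```python
# Sketch: load the occurrence matrix shown above with pandas.
# io.StringIO stands in for an actual CSV file here.
from io import StringIO

import pandas as pd

csv_text = (
    "a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,class\n"
    "1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,class1\n"
    "0,1,0,0,1,0,0,0,1,0,0,0,0,0,1,0,0,0,1,1,1,0,1,0,0,1,class2\n"
    "0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,class2\n"
    "1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,1,0,0,1,class3\n"
)
df = pd.read_csv(StringIO(csv_text))
print(df.shape)  # (4, 27): four rows, 26 binary features plus the class column
```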

How can I apply algorithms such as CART (decision trees) and Naive Bayes to this type of data set? (I have only looked at the scikit-learn library.)

Accepted answer


You need to use integers for your class/dependent variable, not strings.
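(If you have many classes, writing the mapping by hand gets tedious; as an alternative sketch, scikit-learn's LabelEncoder can assign the integer codes automatically.)

```python
# Alternative to a hand-written mapping: let LabelEncoder assign the codes.
from sklearn.preprocessing import LabelEncoder

labels = ["class1", "class2", "class2", "class3"]  # the 'class' column values

le = LabelEncoder()
y = le.fit_transform(labels)
print(list(y))                        # [0, 1, 1, 2] -- codes follow sorted label order
print(list(le.inverse_transform(y)))  # back to the original string labels
```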

Here's an example:

In [1]: # Here I'm just mapping very simply, you can definitely use regex
   ...: # or something for your case if you have a lot of classes
   ...: df['class'] = df['class'].map({'class1': 0, 'class2': 1, 'class3': 2})

In [2]: df
Out[2]:
   a  b ...  y  z  class
0  1  1 ...  1  1      0
1  0  1 ...  0  1      1
2  0  0 ...  0  1      1
3  1  0 ...  0  1      2

In [3]: # Break between X (independent variables) and y (dependent, class)
   ...: X = df.iloc[:, :-1]
   ...: y = df['class']

In [4]: # Now you can do your fit etc...
   ...: from sklearn.naive_bayes import GaussianNB
   ...: gnb = GaussianNB()
   ...: result = gnb.fit(X, y)

In [5]: y_pred = result.predict(X)
   ...: y_pred
Out[5]: array([0, 1, 1, 2], dtype=int64)

We see that it predicted the classes correctly, though that is expected here: we are predicting on the training data itself, and the number of features is large relative to the sample size (p > n).
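The question also asked about CART. scikit-learn's DecisionTreeClassifier implements an optimized version of CART and takes exactly the same X and y as above; a sketch, with the same four rows rebuilt so it runs on its own:

```python
# CART-style decision tree on the same occurrence matrix.
# DecisionTreeClassifier is scikit-learn's optimized CART implementation.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

X = pd.DataFrame(
    [
        [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
        [0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1],
        [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1],
    ],
    columns=list("abcdefghijklmnopqrstuvwxyz"),
)
y = [0, 1, 1, 2]  # class1 -> 0, class2 -> 1, class3 -> 2

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)
print(list(clf.predict(X)))  # [0, 1, 1, 2] on the training rows, as with GaussianNB
```

As with the GaussianNB example, predicting on the training rows here is only a sanity check, not an estimate of real accuracy.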
