I am trying to run PCA in Python using sklearn. I have written the following function:
    from sklearn.decomposition import PCA

    def dimensionality_reduction(train_dataset_mod1, train_dataset_mod2,
                                 test_dataset_mod1, test_dataset_mod2):
        # Fit one PCA per modality (note the transposed training data)
        pca = PCA(n_components=200)
        pca.fit(train_dataset_mod1.transpose())
        mod1_features_train = pca.components_

        pca2 = PCA(n_components=200)
        pca2.fit(train_dataset_mod2.transpose())
        mod2_features_train = pca2.components_

        # Project the (untransposed) test data with the fitted models
        mod1_features_test = pca.transform(test_dataset_mod1)
        mod2_features_test = pca2.transform(test_dataset_mod2)

        return (mod1_features_train.transpose(), mod2_features_train.transpose(),
                mod1_features_test, mod2_features_test)

My matrix sizes are as follows:
    train_dataset_mod1    733x5000
    test_dataset_mod1     360x5000
    mod1_features_train   200x733
    train_dataset_mod2    733x8000
    test_dataset_mod2     360x8000
    mod2_features_train   200x733
However, when I run the whole script, I receive the following message:
    File "\Anaconda2\lib\site-packages\sklearn\decomposition\base.py", line 132, in transform
        X = X - self.mean_
What is the issue? How can I apply the PCA to the test data?
Here is an example of debugging the PCA for mod1:
The transformed datasets mod1_features_train and mod2_features_train both have the correct size, 500x733. However, I cannot do the same with test_dataset_mod1 and test_dataset_mod2. Why?
While debugging, I noticed that the PCA code in base.py performs the operation X = X - self.mean_, where X is my test data and self.mean_ is the mean computed when fitting on the training set (self.mean_ has size 733, which does not match X). If I remove the transpose() calls in the training step, the PCA runs without errors and test_dataset_mod1 and test_dataset_mod2 have the correct size, 360x500; however, train_dataset_mod1 and train_dataset_mod2 then have the wrong size, 5000x500?
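The mismatch described here can be reproduced on small random data. The sketch below is only an illustration: the scaled-down shapes (73x50 for training, 36x50 for test, 20 components) and the variable names are assumptions standing in for the real matrices. It fits on the transposed training data, as in the function above, and checks that mean_ then has one entry per training row, so transform on the untransposed test data is rejected.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical scaled-down stand-ins for the 733x5000 and 360x5000 matrices
rng = np.random.default_rng(0)
train = rng.normal(size=(73, 50))
test = rng.normal(size=(36, 50))

pca = PCA(n_components=20)
# Transposed fit: sklearn now sees 50 samples with 73 features each
pca.fit(train.T)

mean_shape = pca.mean_.shape              # one mean per training row
components_shape = pca.components_.shape  # (n_components, n_features seen in fit)

# The test data has 50 columns, but the fitted model expects 73
try:
    pca.transform(test)
    transform_failed = False
except ValueError:
    transform_failed = True

print(mean_shape, components_shape, transform_failed)
```

The number of columns passed to transform has to equal the number of columns passed to fit, which is why the length of mean_ is the quantity to check.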
Recommended answer
You shouldn't transpose your matrix in the fit function; or, if you have to, you must also transpose the matrix you pass to the transform function:
    pca.fit(train_dataset_mod1)
    pca2.fit(train_dataset_mod2)
    mod1_features_test = pca.transform(test_dataset_mod1)
    mod2_features_test = pca2.transform(test_dataset_mod2)

or:
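    pca.fit(train_dataset_mod1.transpose())
    pca2.fit(train_dataset_mod2.transpose())
    mod1_features_test = pca.transform(test_dataset_mod1.transpose())
    mod2_features_test = pca2.transform(test_dataset_mod2.transpose())

Note that the second variant only works when the training and test sets have the same number of rows. With the sizes given in the question, the transposed test matrix has 360 columns while the PCA was fitted on 733, so transform would still fail.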
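To also get the reduced training data in the expected orientation, one option is to project the training set with the fitted model instead of reading components_. The following is a minimal sketch of the first variant, not a drop-in replacement for the original function: the helper name reduce_modality, the random data, and the scaled-down shapes are all assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA


def reduce_modality(train, test, n_components):
    """Fit PCA on the training rows and project both sets onto the same axes.

    Hypothetical helper: rows are samples, columns are features.
    """
    pca = PCA(n_components=n_components)
    train_reduced = pca.fit_transform(train)  # shape (n_train, n_components)
    test_reduced = pca.transform(test)        # shape (n_test, n_components)
    return train_reduced, test_reduced, pca


# Hypothetical scaled-down stand-ins for the 733x5000 and 360x5000 matrices
rng = np.random.default_rng(0)
train_mod1 = rng.normal(size=(73, 50))
test_mod1 = rng.normal(size=(36, 50))

train_red, test_red, pca = reduce_modality(train_mod1, test_mod1, n_components=20)
print(train_red.shape, test_red.shape, pca.components_.shape)
```

Here components_ holds the principal axes, with shape (n_components, n_features), rather than the transformed samples, which may explain the unexpected 5000x500 size reported in the question.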