Applying PCA to test data

Updated: 2024-10-28 06:34:02

Problem description

I am trying to run PCA in Python using sklearn. I have written the following function:

    from sklearn.decomposition import PCA

    def dimensionality_reduction(train_dataset_mod1, train_dataset_mod2, test_dataset_mod1, test_dataset_mod2):
        # Fit one PCA per modality on the transposed training data
        pca = PCA(n_components=200)
        pca.fit(train_dataset_mod1.transpose())
        mod1_features_train = pca.components_
        pca2 = PCA(n_components=200)
        pca2.fit(train_dataset_mod2.transpose())
        mod2_features_train = pca2.components_
        # Project the (untransposed) test data; this is where the error occurs
        mod1_features_test = pca.transform(test_dataset_mod1)
        mod2_features_test = pca2.transform(test_dataset_mod2)
        return mod1_features_train.transpose(), mod2_features_train.transpose(), mod1_features_test, mod2_features_test

My matrices have the following sizes:

    train_dataset_mod1     733x5000
    test_dataset_mod1      360x5000
    mod1_features_train    200x733
    train_dataset_mod2     733x8000
    test_dataset_mod2      360x8000
    mod2_features_train    200x733

However, when I run the whole script I get the following message:

    File "\Anaconda2\lib\site-packages\sklearn\decomposition\base.py", line 132, in transform
        X = X - self.mean_

What is the issue? How can I apply the PCA to the test data?

Here is an example from debugging the PCA for mod1:

The transformed datasets mod1_features_train and mod2_features_train both have the correct size, 500x733. However, I cannot do the same with test_dataset_mod1 and test_dataset_mod2. Why?

While debugging I noticed that sklearn's base.py performs the operation X = X - self.mean_, where X is my test data and self.mean_ is the mean computed when fitting on the training set. self.mean_ has size 733, which does not match X. If I remove the transpose() calls in the training step, the PCA runs without errors and the transformed test_dataset_mod1 and test_dataset_mod2 have the correct size, 360x500. However, the outputs for train_dataset_mod1 and train_dataset_mod2 then have the wrong size, 5000x500.
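The mismatch described above can be reproduced on a small scale. The following is a minimal sketch, not the asker's actual script: the random arrays and the reduced sizes (73x500 and 36x500 instead of 733x5000 and 360x5000) are assumptions made so the example runs quickly. It shows that mean_ has one entry per column seen by fit, so fitting on the transposed matrix leaves a mean of length 73 (733 in the question) that cannot be subtracted from test rows of length 500.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical stand-ins for the real data: rows are samples, columns are features.
train = rng.normal(size=(73, 500))
test = rng.normal(size=(36, 500))

# Fitting on the transposed matrix makes PCA treat the 73 samples as features.
pca_t = PCA(n_components=20).fit(train.T)
mean_shape_transposed = pca_t.mean_.shape  # one mean per column seen by fit

try:
    pca_t.transform(test)  # test rows have 500 columns, not 73
    transform_failed = False
except ValueError:
    transform_failed = True

# Fitting on the untransposed matrix gives a mean that matches the test columns.
pca = PCA(n_components=20).fit(train)
mean_shape = pca.mean_.shape
test_shape = pca.transform(test).shape
```

Whatever orientation is passed to fit fixes the number of columns that transform will accept afterwards.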

Recommended answer

You shouldn't transpose your matrix in the fit call. If you do have to, then you must also transpose the matrix in the transform call:

    pca.fit(train_dataset_mod1)
    pca2.fit(train_dataset_mod2)
    mod1_features_test = pca.transform(test_dataset_mod1)
    mod2_features_test = pca2.transform(test_dataset_mod2)

Or:

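    pca.fit(train_dataset_mod1.transpose())
    pca2.fit(train_dataset_mod2.transpose())
    mod1_features_test = pca.transform(test_dataset_mod1.transpose())
    mod2_features_test = pca2.transform(test_dataset_mod2.transpose())

Note that the second variant only works when the transposed training and test matrices have the same number of columns. With 733 training rows and 360 test rows, as in the question, the transposed matrices have 733 and 360 columns, so transform would still fail with a shape mismatch.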
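Putting the first variant together, a reworked version of the question's function might look like the sketch below. One assumption goes beyond the answer: the training features are taken from transform(train) rather than from components_, so that train and test end up in the same reduced space. The shortened argument names, the n_components parameter and the scaled-down random data are hypothetical and only there to make the example runnable.

```python
import numpy as np
from sklearn.decomposition import PCA

def dimensionality_reduction(train_mod1, train_mod2, test_mod1, test_mod2, n_components=200):
    """Fit one PCA per modality on the training rows, then project train and test."""
    pca1 = PCA(n_components=n_components).fit(train_mod1)
    pca2 = PCA(n_components=n_components).fit(train_mod2)
    # Assumption: the projected training data is what is wanted, not components_.
    return (pca1.transform(train_mod1), pca2.transform(train_mod2),
            pca1.transform(test_mod1), pca2.transform(test_mod2))

# Hypothetical random data, scaled down from the 733x5000 / 360x5000 / 733x8000 / 360x8000 case.
rng = np.random.default_rng(0)
tr1, te1 = rng.normal(size=(73, 500)), rng.normal(size=(36, 500))
tr2, te2 = rng.normal(size=(73, 800)), rng.normal(size=(36, 800))

outputs = dimensionality_reduction(tr1, tr2, te1, te2, n_components=20)
shapes = [a.shape for a in outputs]
```

With the question's real sizes and n_components=200, the same pattern would give 733x200 training features and 360x200 test features per modality.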


Published: 2023-08-05 20:41:02

Source: https://www.elefans.com/category/jswz/34/1307945.html