Applying PCA to test data

Updated: 2024-10-28 06:34:02
Problem description

I am trying to run PCA in Python using sklearn. I have written the following function:

from sklearn.decomposition import PCA

def dimensionality_reduction(train_dataset_mod1, train_dataset_mod2, test_dataset_mod1, test_dataset_mod2):
    pca = PCA(n_components=200)
    pca.fit(train_dataset_mod1.transpose())
    mod1_features_train = pca.components_
    pca2 = PCA(n_components=200)
    pca2.fit(train_dataset_mod2.transpose())
    mod2_features_train = pca2.components_
    mod1_features_test = pca.transform(test_dataset_mod1)
    mod2_features_test = pca2.transform(test_dataset_mod2)
    return mod1_features_train.transpose(), mod2_features_train.transpose(), mod1_features_test, mod2_features_test

My matrix sizes are as follows:

train_dataset_mod1     733x5000
test_dataset_mod1      360x5000
mod1_features_train    200x733
train_dataset_mod2     733x8000
test_dataset_mod2      360x8000
mod2_features_train    200x733

However, when I run the whole script I get the following message:

File "\Anaconda2\lib\site-packages\sklearn\decomposition\base.py", line 132, in transform
    X = X - self.mean_

What is the issue? How can I apply the fitted PCA to the test data?

Here is an example of debugging the PCA for mod1:

The transformed datasets mod1_features_train and mod2_features_train both have the correct size, 500x733. However, I cannot do the same with test_dataset_mod1 and test_dataset_mod2. Why?

While debugging I noticed that the base.py file of PCA contains the operation X = X - self.mean_, where X is my test data and self.mean_ is the mean computed when fitting on the training set (self.mean_ has size 733, which does not match X). If I remove the transpose() in the training step, PCA runs without errors and the transformed test_dataset_mod1 and test_dataset_mod2 have the correct size, 360x500. However, train_dataset_mod1 and train_dataset_mod2 then end up with the wrong size, 5000x500. Why?
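The size mismatch described above can be reproduced on a small scale. The following is a minimal sketch, assuming random data and scaled-down shapes (60x100 and 30x100 stand in for 733x5000 and 360x5000; the variable names are illustrative, not from the original script). It shows that mean_ holds one entry per column of whatever matrix was passed to fit, so fitting on the transpose leaves a mean whose length equals the number of training samples.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical stand-ins for the training and test matrices (samples x features).
train = rng.normal(size=(60, 100))
test = rng.normal(size=(30, 100))

# Fitting on the transpose: the 60 training samples are treated as features.
pca_t = PCA(n_components=10).fit(train.T)
mean_len_transposed = pca_t.mean_.shape[0]  # one mean per training sample

# Fitting on samples x features: mean_ has one entry per real feature.
pca = PCA(n_components=10).fit(train)
mean_len = pca.mean_.shape[0]

# The test matrix has 100 columns, so it cannot be centered with a 60-long mean.
try:
    pca_t.transform(test)
    transposed_fit_works = True
except ValueError:
    transposed_fit_works = False

print(mean_len_transposed, mean_len, transposed_fit_works)
```

With these shapes the transposed fit ends up with a mean of length 60, which cannot be subtracted from a test matrix with 100 columns.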

Recommended answer

You should not transpose your matrix in the fit call. If you have to, then you must also transpose the matrix in the transform call:

pca.fit(train_dataset_mod1)
pca2.fit(train_dataset_mod2)
mod1_features_test = pca.transform(test_dataset_mod1)
mod2_features_test = pca2.transform(test_dataset_mod2)
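For the first variant, the reduced training features would normally come from transform (or fit_transform) rather than from components_, since components_ has shape n_components x n_features. Below is a minimal sketch of that flow, assuming random data and scaled-down shapes; reduce_modality is a hypothetical helper name, not part of the original code.

```python
import numpy as np
from sklearn.decomposition import PCA

def reduce_modality(train, test, n_components):
    """Fit PCA on train (samples x features) and project both sets."""
    pca = PCA(n_components=n_components)
    train_reduced = pca.fit_transform(train)  # (n_train_samples, n_components)
    test_reduced = pca.transform(test)        # (n_test_samples, n_components)
    return train_reduced, test_reduced, pca

rng = np.random.default_rng(0)
# Hypothetical stand-ins for train_dataset_mod1 (733x5000) and test_dataset_mod1 (360x5000).
train = rng.normal(size=(60, 100))
test = rng.normal(size=(30, 100))

train_red, test_red, pca = reduce_modality(train, test, n_components=10)
print(train_red.shape, test_red.shape, pca.components_.shape)
# (60, 10) (30, 10) (10, 100)
```

Applied to the sizes in the question with n_components=200, this would give 733x200 training features and 360x200 test features, while components_ would be 200x5000.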

Or:

pca.fit(train_dataset_mod1.transpose())
pca2.fit(train_dataset_mod2.transpose())
mod1_features_test = pca.transform(test_dataset_mod1.transpose())
mod2_features_test = pca2.transform(test_dataset_mod2.transpose())
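One point worth checking for the transposed variant: after fitting on the transpose, transform expects as many columns as there were training samples. The sketch below, assuming random data and scaled-down hypothetical shapes, suggests that the transposed variant only goes through when the test set has the same number of samples as the training set, which is not the case for the 733 and 360 samples in the question.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
train = rng.normal(size=(60, 100))        # 60 samples, 100 features
test_same_n = rng.normal(size=(60, 100))  # same number of samples as train
test_fewer = rng.normal(size=(30, 100))   # fewer samples, as in the question

# Fit on the transpose: rows are the 100 features, columns the 60 training samples.
pca = PCA(n_components=10).fit(train.T)

# Works only because the transposed test matrix also has 60 columns.
out_same = pca.transform(test_same_n.T)

# A transposed test matrix with 30 columns does not match the 60 fitted columns.
try:
    pca.transform(test_fewer.T)
    fewer_ok = True
except ValueError:
    fewer_ok = False

print(out_same.shape, fewer_ok)
```

Under these assumptions the untransposed variant is the one that handles training and test sets of different sizes.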

Published: 2023-08-05 20:41:02
Article link: https://www.elefans.com/category/jswz/34/1307945.html