I am using CCA in my work and would like to understand something about it.
This is my MATLAB code. I took only 100 samples to better understand the concepts of CCA.

    clc; clear all; close all;
    load carbig;
    data = [Displacement Horsepower Weight Acceleration MPG];
    data(isnan(data)) = 0;
    X = data(1:100,1:3);
    Y = data(1:100,4:5);
    [wx,wy,~,U,V] = CCA(X,Y);
    clear Acceleration Cylinders Displacement Horsepower MPG Mfg Model Model_Year Origin Weight when org
    subplot(1,2,1), plot(U(:,1),V(:,1),'.');
    subplot(1,2,2), plot(U(:,2),V(:,2),'.');

My plot looks like this:
This shows that in the first plot (left) the transformed variables are highly correlated, with little scatter around the central axis, while in the second plot (right) the scatter around the central axis is much larger.
As I understand from here, CCA maximizes the correlation between the data in the transformed space. So I tried to design a matching score that should return a minimum value when the vectors are maximally correlated. I tried to match each row vector U(i,:) against each row vector V(j,:), with i and j going from 1 to 100.
    %% Finding the difference between the projected vectors
    for i=1:size(U,1)
        cost = repmat(U(i,:),size(U,1),1) - V;
        for j=1:size(U,1)
            c(i,j) = norm(cost(j,:),size(U,2));
        end
        [~,idx(i)] = min(c(i,:));
    end

Ideally, idx should look like this:
idx = 1 2 3 4 5 6 7 8 9 10 ....
as they are maximally correlated. However, my output looks something like this:
idx = 80 5 3 1 4 7 17 17 17 10 68 78 78 75 9 10 5 1 6 17 .....
I don't understand why this happens.
Thanks.
Recommended answer
First, let me rewrite your code for R2014b:
    load carbig;
    data = [Displacement Horsepower Weight Acceleration MPG];
    % Truncate the data, to follow up with your sample code
    data = data(1:100,:);
    % Flag rows that contain at least one NaN so they can be excluded
    nans = sum(isnan(data),2) > 0;
    [wx, wy, r, U, V] = canoncorr(data(~nans,1:3), data(~nans,4:5));

OK, now the trick is that the vectors which are maximally correlated in the CCA subspace are the column vectors, U(:,1) with V(:,1) and U(:,2) with V(:,2), and not the row vectors U(i,:) that you are trying to compare. In the CCA subspace, each vector is N-dimensional, with one entry per sample (here N is at most 100), and not a simple 2D vector. That is why visualizing CCA results is often quite complicated!
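The column-versus-row distinction can be checked numerically. The following is a minimal sketch, assuming the Statistics and Machine Learning Toolbox is available (for carbig and canoncorr); the variable names colCorr, d and selfMatches are illustrative and do not come from the original post. It compares the correlation of each column pair with the values returned in r, then repeats the row-wise nearest-neighbour matching from the question to count how often row i of U is closest to row i of V.

```matlab
% Requires the Statistics and Machine Learning Toolbox (carbig, canoncorr).
load carbig;
data = [Displacement Horsepower Weight Acceleration MPG];
data = data(1:100,:);
data = data(~any(isnan(data),2),:);   % drop rows containing NaN

[~, ~, r, U, V] = canoncorr(data(:,1:3), data(:,4:5));

% Column-wise: correlate the k-th canonical variate of X with that of Y.
colCorr = zeros(1, numel(r));
for k = 1:numel(r)
    R = corrcoef(U(:,k), V(:,k));     % 2-by-2 correlation matrix
    colCorr(k) = R(1,2);
end
disp([r(:)'; colCorr]);               % the two rows should agree

% Row-wise: nearest row of V for each row of U (the matching from the question).
n = size(U,1);
idx = zeros(1, n);
for i = 1:n
    d = sqrt(sum((V - repmat(U(i,:), n, 1)).^2, 2));   % Euclidean distances
    [~, idx(i)] = min(d);
end
selfMatches = sum(idx == 1:n);
fprintf('Rows matched to themselves: %d of %d\n', selfMatches, n);
```

CCA constrains only the column-wise correlations, so nothing forces idx(i) to equal i; a sample's projection U(i,:) can easily lie closer to some other sample's V(j,:).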
By the way, the correlations are given by the third output of canoncorr, which you (intentionally?) chose to skip in your code. If you check its content, you'll see that the correlations (and hence the corresponding pairs of column vectors) are ordered from strongest to weakest:
r = 0.9484 0.5991
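The ordering of r can be verified directly, along with the fact that only matching column pairs are correlated. This is a rough sketch under the same assumptions as the answer's code (Statistics and Machine Learning Toolbox, first 100 rows of carbig with NaN rows removed); the names nPairs and crossBlock are illustrative only.

```matlab
% Requires the Statistics and Machine Learning Toolbox (carbig, canoncorr).
load carbig;
data = [Displacement Horsepower Weight Acceleration MPG];
data = data(1:100,:);
data = data(~any(isnan(data),2),:);   % drop rows containing NaN

[~, ~, r, U, V] = canoncorr(data(:,1:3), data(:,4:5));
nPairs = numel(r);

% Canonical correlations are returned from strongest to weakest.
isDescending = all(diff(r(:)) <= 0);
fprintf('r sorted in descending order: %d\n', isDescending);

% Correlation matrix of all canonical variates; the upper-right block
% holds the correlations between columns of U and columns of V.
R = corrcoef([U V]);
crossBlock = R(1:nPairs, nPairs+1:end);
disp(crossBlock);   % diagonal should match r, off-diagonal should be near 0
```

The near-zero off-diagonal entries show that U(:,1) carries no linear information about V(:,2) and vice versa, so the two plots in the question can be read independently of each other.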
It is hard to explain CCA better than the link you already provided. If you want to go further, you should probably invest in a book, like this one or this one.