Notes on Matrix Calculus

Last updated: 2024-10-09 22:16:07


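Contents

  • 0. Notation
  • 1. Derivatives with respect to a scalar
    • 1.1 Scalar with respect to a scalar
    • 1.2 Vector with respect to a scalar
    • 1.3 Matrix with respect to a scalar
  • 2. Derivatives with respect to a vector
    • 2.1 Scalar with respect to a vector
    • 2.2 Vector with respect to a vector
    • 2.3 Matrix with respect to a vector
  • 3. Derivatives with respect to a matrix
    • 3.1 Scalar with respect to a matrix
    • 3.2 Vector with respect to a matrix
    • 3.3 Matrix with respect to a matrix
  • 4. Derivative of the determinant with respect to a matrix
  • 5. Derivative of the trace with respect to a matrix
  • 6. Worked examples
  • References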

0. Notation

  • Number field: let $\mathbb{F}$ denote a number field.
  • Scalars: $y$ and $x$ are scalars, and so are the differentials $\mathrm{d}y$ and $\mathrm{d}x$; that is, $x, \mathrm{d}x, y, \mathrm{d}y \in \mathbb{F}^{1}$.
  • Vectors: $\vec{y}$ and $\vec{x}$ are column vectors of dimension $m$ and $n$ respectively, and so are $\mathrm{d}\vec{y}$ and $\mathrm{d}\vec{x}$; that is, $\vec{x}, \mathrm{d}\vec{x} \in \mathbb{F}^{n}$ and $\vec{y}, \mathrm{d}\vec{y} \in \mathbb{F}^{m}$.
  • Matrices: $Y$ and $X$ are matrices, and so are $\mathrm{d}Y$ and $\mathrm{d}X$; that is, $X, \mathrm{d}X \in \mathbb{F}^{r \times s}$ and $Y, \mathrm{d}Y \in \mathbb{F}^{p \times q}$.

Here $\mathrm{d}\vec{x} = ( \mathrm{d}x_1, \mathrm{d}x_2, \cdots, \mathrm{d}x_n )^T$, and similarly for $\mathrm{d}\vec{y}$. Written by columns and by entries, $\mathrm{d}X$ is
$$
\mathrm{d}X = \left( \begin{matrix} \mathrm{d}\vec{x}_1, & \mathrm{d}\vec{x}_2, & \cdots, & \mathrm{d}\vec{x}_s \end{matrix} \right)
= \left( \begin{matrix}
\mathrm{d}x_{11} & \mathrm{d}x_{12} & \mathrm{d}x_{13} & \cdots & \mathrm{d}x_{1s} \\
\mathrm{d}x_{21} & \mathrm{d}x_{22} & \mathrm{d}x_{23} & \cdots & \mathrm{d}x_{2s} \\
\mathrm{d}x_{31} & \mathrm{d}x_{32} & \mathrm{d}x_{33} & \cdots & \mathrm{d}x_{3s} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\mathrm{d}x_{r1} & \mathrm{d}x_{r2} & \mathrm{d}x_{r3} & \cdots & \mathrm{d}x_{rs}
\end{matrix} \right)_{r \times s}
$$
Types of matrix derivatives [1] (columns: type of the quantity being differentiated; rows: type of the variable)

| With respect to | Scalar | Vector | Matrix |
| --- | --- | --- | --- |
| Scalar | $\frac{\partial y}{\partial x}$ | $\frac{\partial \vec{y}}{\partial x}$ | |
| Vector | $\frac{\partial y}{\partial \vec{x}}$ | $\frac{\partial \vec{y}}{\partial \vec{x}}$ | |
| Matrix | $\frac{\partial y}{\partial X}$ | $\frac{\partial \vec{y}}{\partial X}$ | |

1. Derivatives with respect to a scalar

1.1 Scalar with respect to a scalar

To keep the notation uniform throughout these notes, the ordinary derivative is also written with the partial-derivative symbol:
$$\mathrm{d}y = \frac{\partial y}{\partial x} \mathrm{d}x.$$
Properties

  • SS1 (linearity): $\frac{\partial (u + v)}{\partial x} = \frac{\partial u}{\partial x} + \frac{\partial v}{\partial x}$.
  • SS2 (product rule): $\frac{\partial (uv)}{\partial x} = u \frac{\partial v}{\partial x} + v \frac{\partial u}{\partial x}$.
  • SS3 (chain rule): $\frac{\partial g(u)}{\partial x} = \frac{\partial g(u)}{\partial u} \frac{\partial u}{\partial x}$.

1.2 Vector with respect to a scalar

$$
\frac{\partial \vec{y}}{\partial x} = \left( \begin{matrix}
\frac{\partial y_1}{\partial x} \\ \frac{\partial y_2}{\partial x} \\ \frac{\partial y_3}{\partial x} \\ \vdots \\ \frac{\partial y_m}{\partial x}
\end{matrix} \right)_{m \times 1}
$$

Hence $\mathrm{d}y_i = \frac{\partial y_i}{\partial x} \mathrm{d}x$ for $i = 1, 2, \cdots, m$, that is,
$$
\mathrm{d}\vec{y} = \left( \begin{matrix}
\frac{\partial y_1}{\partial x} \mathrm{d}x \\ \frac{\partial y_2}{\partial x} \mathrm{d}x \\ \frac{\partial y_3}{\partial x} \mathrm{d}x \\ \vdots \\ \frac{\partial y_m}{\partial x} \mathrm{d}x
\end{matrix} \right)_{m \times 1}
= \frac{\partial \vec{y}}{\partial x} \otimes \mathrm{d}x
$$
Since $\mathrm{d}x$ is a scalar, the product $\otimes$ here simply multiplies every entry by $\mathrm{d}x$.
Properties

  • VS1 (constant vector): for any constant column vector $\vec{a} \in \mathbb{F}^{n \times 1}$, $\frac{\partial \vec{a}}{\partial x} = \vec{0}_{n \times 1}$.
  • VS2 (scalar multiple): for any $\vec{u}(x) \in \mathbb{F}^{n \times 1}$ and any constant $a \in \mathbb{F}^{1}$, $\frac{\partial (a\vec{u})}{\partial x} = a \frac{\partial \vec{u}}{\partial x}$.
  • VS3 (matrix multiple): for any $\vec{u}(x) \in \mathbb{F}^{n \times 1}$ and any constant $A \in \mathbb{F}^{m \times n}$, $\frac{\partial (A\vec{u})}{\partial x} = A \frac{\partial \vec{u}}{\partial x}$.
  • VS4 (transpose): for any $\vec{u}(x) \in \mathbb{F}^{n \times 1}$, $\frac{\partial ( \vec{u}^T ) }{\partial x} = \left( \frac{\partial \vec{u}}{\partial x} \right)^T$.
  • VS5 (sum): for any $\vec{u}(x), \vec{v}(x) \in \mathbb{F}^{n \times 1}$, $\frac{\partial (\vec{u} + \vec{v})}{\partial x} = \frac{\partial \vec{u}}{\partial x} + \frac{\partial \vec{v}}{\partial x}$.
  • VS6 (chain rule): for any $\vec{u}(x) \in \mathbb{F}^{n \times 1}$ and $\vec{g}(\vec{u}) \in \mathbb{F}^{m \times 1}$, $\frac{\partial \vec{g} }{\partial x} = \left( \frac{\partial \vec{g}}{\partial \vec{u}} \right)^T \frac{\partial \vec{u}}{\partial x}$.
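Property VS3 can be spot-checked numerically. The sketch below is only an illustration under stated assumptions: the function `u`, the matrix `A`, the evaluation point `x0` and the step `h` are all hypothetical choices, not taken from the notes. It compares a central finite difference of $A\vec{u}(x)$ with $A \frac{\partial \vec{u}}{\partial x}$.

```python
import math

def u(x):
    # Hypothetical vector-valued function u: R -> R^3 (n = 3).
    return [math.sin(x), x * x, math.exp(x)]

def du_dx(x):
    # Analytic derivative of u, entry by entry (an n x 1 column).
    return [math.cos(x), 2.0 * x, math.exp(x)]

def matvec(M, v):
    # Plain matrix-vector product.
    return [sum(a * b for a, b in zip(row, v)) for row in M]

# Hypothetical constant matrix A of shape m x n = 2 x 3.
A = [[1.0, 2.0, 0.0],
     [0.0, -1.0, 3.0]]

def d_Au_dx_numeric(x, h=1e-6):
    # Central difference of the map x -> A u(x).
    plus = matvec(A, u(x + h))
    minus = matvec(A, u(x - h))
    return [(p - m) / (2.0 * h) for p, m in zip(plus, minus)]

x0 = 0.7
lhs = d_Au_dx_numeric(x0)       # d(Au)/dx, approximated numerically
rhs = matvec(A, du_dx(x0))      # A du/dx, as claimed by VS3
max_err = max(abs(l - r) for l, r in zip(lhs, rhs))
print(max_err)
```

The printed discrepancy should be at the level of finite-difference noise, which is consistent with VS3 but of course does not replace the proof.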

A short proof of VS3 (matrix multiple). Write $A\vec{u}$ as a linear combination of the columns of $A$; because $A$ is constant, only the coefficients $u_k$ depend on $x$:
$$
\begin{aligned}
\frac{\partial \left( A\vec{u} \right)}{\partial x}
= & \frac{\partial}{\partial x} \left( \left( \begin{matrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{matrix} \right) \left( \begin{matrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{matrix} \right) \right) \\
= & \frac{\partial}{\partial x} \left( u_1 \left( \begin{matrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{matrix} \right) + u_2 \left( \begin{matrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{matrix} \right) + \cdots + u_n \left( \begin{matrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{matrix} \right) \right) \\
= & \frac{\partial u_1}{\partial x} \left( \begin{matrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{matrix} \right) + \frac{\partial u_2}{\partial x} \left( \begin{matrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{matrix} \right) + \cdots + \frac{\partial u_n}{\partial x} \left( \begin{matrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{matrix} \right) \\
= & \left( \begin{matrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{matrix} \right) \left( \begin{matrix} \frac{\partial u_1}{\partial x} \\ \frac{\partial u_2}{\partial x} \\ \vdots \\ \frac{\partial u_n}{\partial x} \end{matrix} \right) \\
= & A \frac{\partial \vec{u} }{\partial x}
\end{aligned}
$$

Next, a short proof of VS6 (chain rule). First, $\frac{\partial \vec{g}}{\partial \vec{u}}$ is a vector-by-vector derivative, so
$$
\frac{\partial \vec{g}}{\partial \vec{u}}
= \left( \begin{matrix} \frac{\partial g_1}{\partial \vec{u}}, & \frac{\partial g_2}{\partial \vec{u}}, & \frac{\partial g_3}{\partial \vec{u}}, & \cdots, & \frac{\partial g_m}{\partial \vec{u}} \end{matrix} \right)
= \left( \begin{matrix}
\frac{\partial g_1}{\partial u_1} & \frac{\partial g_2}{\partial u_1} & \frac{\partial g_3}{\partial u_1} & \cdots & \frac{\partial g_m}{\partial u_1} \\
\frac{\partial g_1}{\partial u_2} & \frac{\partial g_2}{\partial u_2} & \frac{\partial g_3}{\partial u_2} & \cdots & \frac{\partial g_m}{\partial u_2} \\
\frac{\partial g_1}{\partial u_3} & \frac{\partial g_2}{\partial u_3} & \frac{\partial g_3}{\partial u_3} & \cdots & \frac{\partial g_m}{\partial u_3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\frac{\partial g_1}{\partial u_n} & \frac{\partial g_2}{\partial u_n} & \frac{\partial g_3}{\partial u_n} & \cdots & \frac{\partial g_m}{\partial u_n}
\end{matrix} \right)_{n \times m}
$$
while $\frac{\partial \vec{u}}{\partial x}$ is a vector-by-scalar derivative, so
$$
\frac{\partial \vec{u}}{\partial x} = \left( \begin{matrix} \frac{\partial u_1}{\partial x} \\ \frac{\partial u_2}{\partial x} \\ \frac{\partial u_3}{\partial x} \\ \vdots \\ \frac{\partial u_n}{\partial x} \end{matrix} \right)_{n \times 1}
$$
Therefore
$$
\begin{aligned}
RHS = \left( \frac{\partial \vec{g}}{\partial \vec{u}} \right)^T \frac{\partial \vec{u}}{\partial x}
= & \left( \begin{matrix}
\frac{\partial g_1}{\partial u_1} & \frac{\partial g_1}{\partial u_2} & \frac{\partial g_1}{\partial u_3} & \cdots & \frac{\partial g_1}{\partial u_n} \\
\frac{\partial g_2}{\partial u_1} & \frac{\partial g_2}{\partial u_2} & \frac{\partial g_2}{\partial u_3} & \cdots & \frac{\partial g_2}{\partial u_n} \\
\frac{\partial g_3}{\partial u_1} & \frac{\partial g_3}{\partial u_2} & \frac{\partial g_3}{\partial u_3} & \cdots & \frac{\partial g_3}{\partial u_n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\frac{\partial g_m}{\partial u_1} & \frac{\partial g_m}{\partial u_2} & \frac{\partial g_m}{\partial u_3} & \cdots & \frac{\partial g_m}{\partial u_n}
\end{matrix} \right)_{m \times n}
\left( \begin{matrix} \frac{\partial u_1}{\partial x} \\ \frac{\partial u_2}{\partial x} \\ \frac{\partial u_3}{\partial x} \\ \vdots \\ \frac{\partial u_n}{\partial x} \end{matrix} \right)_{n \times 1} \\
= & \left( \begin{matrix}
\sum_{i=1}^{n} \frac{\partial g_1}{\partial u_i} \frac{\partial u_i}{\partial x} \\
\sum_{i=1}^{n} \frac{\partial g_2}{\partial u_i} \frac{\partial u_i}{\partial x} \\
\sum_{i=1}^{n} \frac{\partial g_3}{\partial u_i} \frac{\partial u_i}{\partial x} \\
\vdots \\
\sum_{i=1}^{n} \frac{\partial g_m}{\partial u_i} \frac{\partial u_i}{\partial x}
\end{matrix} \right)_{m \times 1}
\end{aligned}
$$

On the other hand, treating $\frac{\partial \vec{g} }{\partial x}$ as a vector-by-scalar derivative and applying the multivariable chain rule to each component $g_j(u_1, \cdots, u_n)$ gives
$$
LHS = \frac{\partial \vec{g} }{\partial x}
= \left( \begin{matrix} \frac{\partial g_1}{\partial x} \\ \frac{\partial g_2}{\partial x} \\ \frac{\partial g_3}{\partial x} \\ \vdots \\ \frac{\partial g_m}{\partial x} \end{matrix} \right)_{m \times 1}
= \left( \begin{matrix}
\sum_{i=1}^{n} \frac{\partial g_1}{\partial u_i} \frac{\partial u_i}{\partial x} \\
\sum_{i=1}^{n} \frac{\partial g_2}{\partial u_i} \frac{\partial u_i}{\partial x} \\
\sum_{i=1}^{n} \frac{\partial g_3}{\partial u_i} \frac{\partial u_i}{\partial x} \\
\vdots \\
\sum_{i=1}^{n} \frac{\partial g_m}{\partial u_i} \frac{\partial u_i}{\partial x}
\end{matrix} \right)_{m \times 1}
= RHS.
$$
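The role of the transpose in VS6 is easier to see on a concrete case. The following sketch assumes hypothetical maps `u` (from $\mathbb{F}$ to $\mathbb{F}^2$) and `g` (from $\mathbb{F}^2$ to $\mathbb{F}^3$) chosen only for illustration. It stores $\frac{\partial \vec{g}}{\partial \vec{u}}$ in the $n \times m$ layout used in these notes and compares $\left( \frac{\partial \vec{g}}{\partial \vec{u}} \right)^T \frac{\partial \vec{u}}{\partial x}$ with a finite-difference derivative.

```python
import math

def u(x):
    # Hypothetical inner function u: R -> R^2 (n = 2).
    return [math.sin(x), x ** 2]

def du_dx(x):
    # Vector-by-scalar derivative of u, an n x 1 column.
    return [math.cos(x), 2.0 * x]

def g(u_vec):
    # Hypothetical outer function g: R^2 -> R^3 (m = 3).
    u1, u2 = u_vec
    return [u1 * u2, u1 + u2, math.exp(u1)]

def dg_du(u_vec):
    # Layout of the notes: shape n x m = 2 x 3, entry [i][j] = d g_j / d u_i.
    u1, u2 = u_vec
    return [[u2, 1.0, math.exp(u1)],
            [u1, 1.0, 0.0]]

def chain_rule(x):
    # (dg/du)^T du/dx: entry j is sum_i (d g_j / d u_i) * (d u_i / d x).
    J = dg_du(u(x))
    d = du_dx(x)
    n, m = len(J), len(J[0])
    return [sum(J[i][j] * d[i] for i in range(n)) for j in range(m)]

def numeric(x, h=1e-6):
    # Central difference of the composite x -> g(u(x)).
    plus, minus = g(u(x + h)), g(u(x - h))
    return [(p - q) / (2.0 * h) for p, q in zip(plus, minus)]

x0 = 0.4
max_err = max(abs(a - b) for a, b in zip(chain_rule(x0), numeric(x0)))
print(max_err)
```

Without the transpose the shapes $(n \times m)(n \times 1)$ would not even be compatible, which is a quick sanity check on the statement of VS6.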

1.3 Matrix with respect to a scalar

$$
\frac{\partial Y}{\partial x}
= \left( \begin{matrix} \frac{\partial \vec{y}_1}{\partial x}, & \frac{\partial \vec{y}_2}{\partial x}, & \cdots, & \frac{\partial \vec{y}_q}{\partial x} \end{matrix} \right)
= \left( \begin{matrix}
\frac{\partial y_{11}}{\partial x} & \frac{\partial y_{12}}{\partial x} & \frac{\partial y_{13}}{\partial x} & \cdots & \frac{\partial y_{1q}}{\partial x} \\
\frac{\partial y_{21}}{\partial x} & \frac{\partial y_{22}}{\partial x} & \frac{\partial y_{23}}{\partial x} & \cdots & \frac{\partial y_{2q}}{\partial x} \\
\frac{\partial y_{31}}{\partial x} & \frac{\partial y_{32}}{\partial x} & \frac{\partial y_{33}}{\partial x} & \cdots & \frac{\partial y_{3q}}{\partial x} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_{p1}}{\partial x} & \frac{\partial y_{p2}}{\partial x} & \frac{\partial y_{p3}}{\partial x} & \cdots & \frac{\partial y_{pq}}{\partial x}
\end{matrix} \right)_{p \times q}
$$
Hence $\mathrm{d}y_{ij} = \frac{\partial y_{ij}}{\partial x} \mathrm{d}x$ for $i = 1, 2, \cdots, p$ and $j = 1, 2, \cdots, q$, that is,
$$
\mathrm{d}Y
= \left( \begin{matrix} \frac{\partial \vec{y}_1}{\partial x} \mathrm{d}x, & \frac{\partial \vec{y}_2}{\partial x} \mathrm{d}x, & \cdots, & \frac{\partial \vec{y}_q}{\partial x} \mathrm{d}x \end{matrix} \right)
= \left( \begin{matrix}
\frac{\partial y_{11}}{\partial x}\mathrm{d}x & \frac{\partial y_{12}}{\partial x}\mathrm{d}x & \frac{\partial y_{13}}{\partial x}\mathrm{d}x & \cdots & \frac{\partial y_{1q}}{\partial x}\mathrm{d}x \\
\frac{\partial y_{21}}{\partial x}\mathrm{d}x & \frac{\partial y_{22}}{\partial x}\mathrm{d}x & \frac{\partial y_{23}}{\partial x}\mathrm{d}x & \cdots & \frac{\partial y_{2q}}{\partial x}\mathrm{d}x \\
\frac{\partial y_{31}}{\partial x}\mathrm{d}x & \frac{\partial y_{32}}{\partial x}\mathrm{d}x & \frac{\partial y_{33}}{\partial x}\mathrm{d}x & \cdots & \frac{\partial y_{3q}}{\partial x}\mathrm{d}x \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_{p1}}{\partial x}\mathrm{d}x & \frac{\partial y_{p2}}{\partial x}\mathrm{d}x & \frac{\partial y_{p3}}{\partial x}\mathrm{d}x & \cdots & \frac{\partial y_{pq}}{\partial x}\mathrm{d}x
\end{matrix} \right)_{p \times q}
= \frac{\partial Y}{\partial x} \otimes \mathrm{d}x
$$
Properties

  • MS1 (scalar multiple): for any $U(x) \in \mathbb{F}^{m \times n}$ and any constant $a \in \mathbb{F}$, $\frac{\partial (aU)}{\partial x} = a \frac{\partial U}{\partial x}$.
  • MS2 (matrix multiple): for any $U(x) \in \mathbb{F}^{m \times n}$ and any constant $A \in \mathbb{F}^{r \times m}$, $B \in \mathbb{F}^{n \times s}$, $\frac{\partial (AUB)}{\partial x} = A \frac{\partial U}{\partial x} B$.
  • MS3 (linearity): for any $U(x), V(x) \in \mathbb{F}^{m \times n}$, $\frac{\partial (U + V)}{\partial x} = \frac{\partial U}{\partial x} + \frac{\partial V}{\partial x}$.
  • MS4 (product rule): for any $U(x) \in \mathbb{F}^{m \times n}$ and $V(x) \in \mathbb{F}^{n \times l}$, $\frac{\partial (UV)}{\partial x} = \frac{\partial U}{\partial x} V + U \frac{\partial V}{\partial x}$.

We first prove MS4 (product rule).
For brevity, a matrix is written by its generic entry, e.g. $\frac{\partial Y}{\partial x} = \left( \frac{\partial y_{ij}}{\partial x} \right)_{p \times q}$. Since the $(i,j)$ entry of $UV$ is $\sum_{k=1}^{n} u_{ik} v_{kj}$,
$$
\begin{aligned}
\frac{\partial (UV)}{\partial x}
= & \left( \sum_{k=1}^{n} \frac{\partial \left( u_{ik} v_{kj} \right)}{\partial x} \right)_{m \times l} \\
= & \left( \sum_{k=1}^{n} \left( \frac{\partial u_{ik}}{\partial x}v_{kj} + u_{ik}\frac{\partial v_{kj}}{\partial x} \right) \right)_{m \times l} \\
= & \left( \sum_{k=1}^{n} \frac{\partial u_{ik}}{\partial x}v_{kj} \right)_{m \times l} + \left( \sum_{k=1}^{n} u_{ik}\frac{\partial v_{kj}}{\partial x} \right)_{m \times l} \\
= & \frac{\partial U}{\partial x} V + U \frac{\partial V}{\partial x}.
\end{aligned}
$$

MS2 (matrix multiple) then follows by applying MS4 twice and using that the constant matrices $A$ and $B$ have zero derivative (below, $0$ denotes a zero matrix of the appropriate shape):
$$
\begin{aligned}
\frac{\partial (AUB)}{\partial x}
= & \frac{\partial A}{\partial x}UB + A \frac{\partial (UB)}{\partial x} \\
= & \frac{\partial A}{\partial x}UB + A \left( \frac{\partial U}{\partial x}B + U\frac{\partial B}{\partial x} \right) \\
= & 0 \, UB + A\frac{\partial U}{\partial x}B + AU \, 0 \\
= & A\frac{\partial U}{\partial x}B.
\end{aligned}
$$
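Both MS4 and MS2 lend themselves to a quick numerical sanity check. The sketch below is an illustration only: the matrix functions `U`, `V`, the constant matrices `A`, `B` and the point `x0` are hypothetical examples, and the derivative of each product is approximated by central differences.

```python
import math

def matmul(P, Q):
    # Plain matrix product of nested lists.
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def matadd(P, Q):
    return [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

# Hypothetical 2 x 2 matrix functions and their entrywise derivatives.
def U(x):
    return [[x, x * x], [math.sin(x), 1.0]]

def dU(x):
    return [[1.0, 2.0 * x], [math.cos(x), 0.0]]

def V(x):
    return [[math.exp(x), 0.0], [x, math.cos(x)]]

def dV(x):
    return [[math.exp(x), 0.0], [1.0, -math.sin(x)]]

def numeric_derivative(F, x, h=1e-6):
    # Entrywise central difference of a matrix-valued function F.
    P, M = F(x + h), F(x - h)
    return [[(p - m) / (2.0 * h) for p, m in zip(rp, rm)]
            for rp, rm in zip(P, M)]

def max_abs_diff(P, Q):
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

x0 = 0.3

# MS4: d(UV)/dx = dU/dx V + U dV/dx
ms4_lhs = numeric_derivative(lambda x: matmul(U(x), V(x)), x0)
ms4_rhs = matadd(matmul(dU(x0), V(x0)), matmul(U(x0), dV(x0)))
ms4_err = max_abs_diff(ms4_lhs, ms4_rhs)

# MS2: d(AUB)/dx = A dU/dx B for constant A (3 x 2) and B (2 x 3)
A = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]
B = [[2.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
ms2_lhs = numeric_derivative(lambda x: matmul(matmul(A, U(x)), B), x0)
ms2_rhs = matmul(matmul(A, dU(x0)), B)
ms2_err = max_abs_diff(ms2_lhs, ms2_rhs)

print(ms4_err, ms2_err)
```

Note that the order of the factors in MS4 matters: $\frac{\partial U}{\partial x} V$ cannot be rewritten as $V \frac{\partial U}{\partial x}$, since matrix products do not commute in general.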

2. Derivatives with respect to a vector

2.1 Scalar with respect to a vector

$$
\frac{\partial y}{\partial \vec{x}} = \left( \begin{matrix} \frac{\partial y}{\partial x_1} \\ \frac{\partial y}{\partial x_2} \\ \frac{\partial y}{\partial x_3} \\ \vdots \\ \frac{\partial y}{\partial x_n} \end{matrix} \right)_{n \times 1}
$$
This is commonly called the gradient.
By the total differential formula:
$$
\mathrm{d}y = \sum_{i=1}^{n} \frac{\partial y}{\partial x_i} \mathrm{d}x_i
= \left( \begin{matrix} \frac{\partial y}{\partial x_1}, & \frac{\partial y}{\partial x_2}, & \frac{\partial y}{\partial x_3}, & \cdots, & \frac{\partial y}{\partial x_n} \end{matrix} \right)
\left( \begin{matrix} \mathrm{d}x_1 \\ \mathrm{d}x_2 \\ \mathrm{d}x_3 \\ \vdots \\ \mathrm{d}x_n \end{matrix} \right)
= \left( \frac{\partial y}{\partial \vec{x}} \right)^T \mathrm{d}\vec{x}
$$
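The identity $\mathrm{d}y = \left( \frac{\partial y}{\partial \vec{x}} \right)^T \mathrm{d}\vec{x}$ says that the gradient gives the first-order change of $y$. As a rough illustration, the sketch below uses a hypothetical scalar function `y` of three variables and a small hypothetical displacement `dx`, and compares the linear prediction with the actual change.

```python
import math

def y(x):
    # Hypothetical scalar function of x in R^3.
    x1, x2, x3 = x
    return x1 * x1 + x1 * x2 + math.sin(x3)

def grad_y(x):
    # Gradient dy/dx as an n x 1 column, written here as a flat list.
    x1, x2, x3 = x
    return [2.0 * x1 + x2, x1, math.cos(x3)]

x0 = [1.0, -2.0, 0.5]
dx = [1e-4, -2e-4, 3e-4]   # small, hypothetical displacement

# First-order prediction: (dy/dx)^T dx
dy_linear = sum(g * d for g, d in zip(grad_y(x0), dx))

# Actual change of y over the same displacement
dy_actual = y([a + d for a, d in zip(x0, dx)]) - y(x0)

gap = abs(dy_actual - dy_linear)
print(dy_linear, dy_actual, gap)
```

The gap between the two values is of second order in the size of `dx`, so it shrinks roughly fourfold when `dx` is halved.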
Properties

  • SV1 (scalar multiple): for any scalar function $u(\vec{x}) \in \mathbb{F}$ and any constant $a \in \mathbb{F}$, $\frac{\partial (au)}{\partial \vec{x}} = a \frac{\partial u}{\partial \vec{x}}$.
  • SV2 (linearity): for any $u(\vec{x}), v(\vec{x}) \in \mathbb{F}$, $\frac{\partial (u + v)}{\partial \vec{x}} = \frac{\partial u}{\partial \vec{x}} + \frac{\partial v}{\partial \vec{x}}$.
  • SV3 (product rule): for any $u(\vec{x}), v(\vec{x}) \in \mathbb{F}$, $\frac{\partial (uv)}{\partial \vec{x}} = \frac{\partial u}{\partial \vec{x}} v + u \frac{\partial v}{\partial \vec{x}}$.
  • SV4 (chain rule): for any $u(\vec{x}), g(u) \in \mathbb{F}$, $\frac{\partial g(u)}{\partial \vec{x}} = \frac{\partial g(u)}{\partial u}\frac{\partial u}{\partial \vec{x}}$.
  • SV5: for any $\vec{u}(\vec{x}), \vec{v}(\vec{x}) \in \mathbb{F}^{m}$, $\frac{\partial (\vec{u}^T \vec{v})}{\partial \vec{x}} = \frac{\partial \vec{v}}{\partial \vec{x}} \vec{u} + \frac{\partial \vec{u}}{\partial \vec{x}} \vec{v}$.
  • SV6: for any constant $A \in \mathbb{F}^{m \times n}$ and any $\vec{u}(\vec{x}) \in \mathbb{F}^{m}$, $\vec{v}(\vec{x}) \in \mathbb{F}^{n}$, $\frac{\partial \left( \vec{u}^T A \vec{v} \right)}{\partial \vec{x}} = \frac{\partial \vec{v}}{\partial \vec{x}} A^T \vec{u} + \frac{\partial \vec{u} }{\partial \vec{x}} A \vec{v}$.

We first sketch a proof of SV5. Both sides are $n \times 1$ vectors, so it suffices to show that they agree row by row. For row $i$,
$$
\begin{aligned}
LHS_i = & \frac{\partial (\vec{u}^T \vec{v})}{\partial x_i} = \frac{\partial }{\partial x_i} \left( \sum_{j=1}^{m} u_j v_j \right) \\
= & \sum_{j=1}^{m} \frac{\partial (u_j v_j)}{\partial x_i} \\
= & \sum_{j=1}^{m} \left( \frac{\partial v_j}{\partial x_i}u_j + \frac{\partial u_j}{\partial x_i}v_j \right).
\end{aligned}
$$

By the definition of the vector-by-vector derivative, row $i$ of the right-hand side is
$$
\begin{aligned}
RHS_i = & \left( \begin{matrix} \frac{\partial v_1}{\partial x_i}, & \frac{\partial v_2}{\partial x_i}, & \frac{\partial v_3}{\partial x_i}, & \cdots, & \frac{\partial v_m}{\partial x_i} \end{matrix} \right) \left( \begin{matrix} u_1 \\ u_2 \\ u_3 \\ \vdots \\ u_m \end{matrix} \right)
+ \left( \begin{matrix} \frac{\partial u_1}{\partial x_i}, & \frac{\partial u_2}{\partial x_i}, & \frac{\partial u_3}{\partial x_i}, & \cdots, & \frac{\partial u_m}{\partial x_i} \end{matrix} \right) \left( \begin{matrix} v_1 \\ v_2 \\ v_3 \\ \vdots \\ v_m \end{matrix} \right) \\
= & \sum_{j=1}^{m} \left( \frac{\partial v_j}{\partial x_i}u_j \right) + \sum_{j=1}^{m} \left( \frac{\partial u_j}{\partial x_i}v_j \right) \\
= & LHS_i.
\end{aligned}
$$

For SV6, apply SV5 with $\vec{v}$ replaced by $A\vec{v}$, and then VV3 from Section 2.2:
$$
\frac{\partial \left( \vec{u}^T A \vec{v} \right)}{\partial \vec{x}}
\overset{SV5}{=} \frac{\partial (A\vec{v})}{\partial \vec{x}} \vec{u} + \frac{\partial \vec{u}}{\partial \vec{x}} A\vec{v}
\overset{VV3}{=} \frac{\partial \vec{v}}{\partial \vec{x}} A^T \vec{u} + \frac{\partial \vec{u} }{\partial \vec{x}} A \vec{v}.
$$
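SV6 (and SV5 as the special case $A = I$) can be checked numerically as well. The following is a minimal sketch under stated assumptions: the maps `u`, `v`, the matrix `A` and the helper `jacobian_denominator` are hypothetical names introduced here. The helper approximates derivatives by central differences and returns them in the layout of these notes, with entry $[i][j] = \partial f_j / \partial x_i$. In this example $\vec{x}$ has dimension 2 while $\vec{v}$ has dimension 3, which the identity permits.

```python
import math

def u(x):
    # Hypothetical map u: R^2 -> R^2 (m = 2).
    return [x[0] * x[1], math.sin(x[0])]

def v(x):
    # Hypothetical map v: R^2 -> R^3 (n = 3).
    return [x[0] + x[1], x[1] ** 2, math.exp(x[0])]

# Hypothetical constant matrix A of shape m x n = 2 x 3.
A = [[1.0, 0.0, 2.0],
     [-1.0, 3.0, 0.5]]

def matvec(M, w):
    return [sum(a * b for a, b in zip(row, w)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def jacobian_denominator(f, x, h=1e-6):
    # Central differences; entry [i][j] approximates d f_j / d x_i.
    rows = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        rows.append([(a - b) / (2.0 * h)
                     for a, b in zip(f(x_plus), f(x_minus))])
    return rows

def scalar(x):
    # The scalar u^T A v.
    return sum(a * b for a, b in zip(u(x), matvec(A, v(x))))

x0 = [0.5, -1.2]

# Left-hand side: gradient of u^T A v with respect to x.
lhs = [row[0] for row in jacobian_denominator(lambda x: [scalar(x)], x0)]

# Right-hand side: (dv/dx) A^T u + (du/dx) A v.
Ju = jacobian_denominator(u, x0)
Jv = jacobian_denominator(v, x0)
term_v = matvec(Jv, matvec(transpose(A), u(x0)))
term_u = matvec(Ju, matvec(A, v(x0)))
rhs = [a + b for a, b in zip(term_v, term_u)]

max_err = max(abs(l - r) for l, r in zip(lhs, rhs))
print(max_err)
```

Both sides are computed from finite differences here, so the check confirms the structure of the identity (which Jacobian multiplies which vector) rather than any closed-form derivative.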

2.2 Vector with respect to a vector

For each component $y_i$, the scalar-by-vector derivative is
$$
\frac{\partial y_i}{\partial \vec{x}} = \left( \begin{matrix} \frac{\partial y_i}{\partial x_1} \\ \frac{\partial y_i}{\partial x_2} \\ \frac{\partial y_i}{\partial x_3} \\ \vdots \\ \frac{\partial y_i}{\partial x_n} \end{matrix} \right)_{n \times 1}, \quad i = 1, 2, \cdots, m.
$$
Placing these columns side by side gives
$$
\frac{\partial \vec{y}}{\partial \vec{x}}
= \left( \begin{matrix} \frac{\partial y_1}{\partial \vec{x}}, & \frac{\partial y_2}{\partial \vec{x}}, & \frac{\partial y_3}{\partial \vec{x}}, & \cdots, & \frac{\partial y_m}{\partial \vec{x}} \end{matrix} \right)
= \left( \begin{matrix}
\frac{\partial y_1}{\partial x_1} & \frac{\partial y_2}{\partial x_1} & \frac{\partial y_3}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_1} \\
\frac{\partial y_1}{\partial x_2} & \frac{\partial y_2}{\partial x_2} & \frac{\partial y_3}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_2} \\
\frac{\partial y_1}{\partial x_3} & \frac{\partial y_2}{\partial x_3} & \frac{\partial y_3}{\partial x_3} & \cdots & \frac{\partial y_m}{\partial x_3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_1}{\partial x_n} & \frac{\partial y_2}{\partial x_n} & \frac{\partial y_3}{\partial x_n} & \cdots & \frac{\partial y_m}{\partial x_n}
\end{matrix} \right)_{n \times m}
$$
From the scalar-by-vector case above, $\mathrm{d}y_i = \left( \frac{\partial y_i}{\partial \vec{x}} \right)^T \mathrm{d}\vec{x}$. Therefore
$$
\mathrm{d} \vec{y}
= \left( \begin{matrix} \mathrm{d}y_1 \\ \mathrm{d}y_2 \\ \mathrm{d}y_3 \\ \vdots \\ \mathrm{d}y_m \end{matrix} \right)
= \left[ \begin{matrix}
\left( \frac{\partial y_1}{\partial \vec{x}} \right)^T \mathrm{d}\vec{x} \\
\left( \frac{\partial y_2}{\partial \vec{x}} \right)^T \mathrm{d}\vec{x} \\
\left( \frac{\partial y_3}{\partial \vec{x}} \right)^T \mathrm{d}\vec{x} \\
\vdots \\
\left( \frac{\partial y_m}{\partial \vec{x}} \right)^T \mathrm{d}\vec{x}
\end{matrix} \right]
= \left[ \begin{matrix}
\left( \frac{\partial y_1}{\partial \vec{x}} \right)^T \\
\left( \frac{\partial y_2}{\partial \vec{x}} \right)^T \\
\left( \frac{\partial y_3}{\partial \vec{x}} \right)^T \\
\vdots \\
\left( \frac{\partial y_m}{\partial \vec{x}} \right)^T
\end{matrix} \right] \mathrm{d}\vec{x}
= \left( \begin{matrix}
\frac{\partial y_1}{\partial x_1} & \frac{\partial y_2}{\partial x_1} & \frac{\partial y_3}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_1} \\
\frac{\partial y_1}{\partial x_2} & \frac{\partial y_2}{\partial x_2} & \frac{\partial y_3}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_2} \\
\frac{\partial y_1}{\partial x_3} & \frac{\partial y_2}{\partial x_3} & \frac{\partial y_3}{\partial x_3} & \cdots & \frac{\partial y_m}{\partial x_3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_1}{\partial x_n} & \frac{\partial y_2}{\partial x_n} & \frac{\partial y_3}{\partial x_n} & \cdots & \frac{\partial y_m}{\partial x_n}
\end{matrix} \right)^{T} \mathrm{d}\vec{x}
= \left( \frac{\partial \vec{y}}{\partial \vec{x}} \right)^T \mathrm{d}\vec{x}
$$
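Because $\frac{\partial \vec{y}}{\partial \vec{x}}$ is stored as an $n \times m$ matrix in these notes, the transpose in $\mathrm{d}\vec{y} = \left( \frac{\partial \vec{y}}{\partial \vec{x}} \right)^T \mathrm{d}\vec{x}$ is easy to get wrong in code. The sketch below uses a hypothetical map `y` from $\mathbb{F}^2$ to $\mathbb{F}^3$ and a small hypothetical displacement `dx` to show the index pattern.

```python
def y(x):
    # Hypothetical map y: R^2 -> R^3 (n = 2, m = 3).
    x1, x2 = x
    return [x1 * x2, x1 + 2.0 * x2, x1 ** 2]

def dy_dx(x):
    # Layout of the notes: shape n x m = 2 x 3, entry [i][j] = d y_j / d x_i.
    x1, x2 = x
    return [[x2, 1.0, 2.0 * x1],
            [x1, 2.0, 0.0]]

x0 = [1.5, -0.5]
dx = [1e-4, 2e-4]   # small, hypothetical displacement

J = dy_dx(x0)

# (dy/dx)^T dx: entry j is sum_i J[i][j] * dx[i]
dy_linear = [sum(J[i][j] * dx[i] for i in range(len(dx)))
             for j in range(len(J[0]))]

# Actual change of y over the same displacement
y_moved = y([p + d for p, d in zip(x0, dx)])
dy_actual = [a - b for a, b in zip(y_moved, y(x0))]

gap = max(abs(a - b) for a, b in zip(dy_actual, dy_linear))
print(dy_linear, dy_actual, gap)
```

The sum runs over the row index `i` of `J`, which is exactly what multiplying by the transpose means; summing over the column index instead would not even match the length of `dx`.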

  • VV1 (scalar multiple): for any $\vec{u}(\vec{x}) \in \mathbb{F}^{m}$ and scalar function $a(\vec{x}) \in \mathbb{F}$, $\frac{\partial (a\vec{u})}{\partial \vec{x}} = a \frac{\partial \vec{u}}{\partial \vec{x}} + \frac{\partial a}{\partial \vec{x}} \vec{u}^T$.
  • VV2 (linearity): for any $\vec{u}(\vec{x}), \vec{v}(\vec{x}) \in \mathbb{F}^{m}$, $\frac{\partial (\vec{u} + \vec{v})}{\partial \vec{x}} = \frac{\partial \vec{u}}{\partial \vec{x}} + \frac{\partial \vec{v}}{\partial \vec{x}}$.
  • VV3 (matrix multiple): for any $\vec{u}(\vec{x}) \in \mathbb{F}^{m}$ and any constant $A \in \mathbb{F}^{p \times m}$, $\frac{\partial (A \vec{u})}{\partial \vec{x}} = \frac{\partial \vec{u}}{\partial \vec{x}} A^T$.
  • VV4 (chain rule): for any $\vec{u}(\vec{x}) \in \mathbb{F}^{p}$ and $\vec{g}(\vec{u}) \in \mathbb{F}^{q}$, $\frac{\partial \vec{g}(\vec{u})}{\partial \vec{x}} = \frac{\partial \vec{u}}{\partial \vec{x}} \frac{\partial \vec{g}}{\partial \vec{u}}$.
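In VV4 the factors appear in the order "inner derivative first", which follows from the $n \times m$ layout used here. As a rough check, the sketch below assumes hypothetical maps `u` (from $\mathbb{F}^2$ to $\mathbb{F}^3$) and `g` (from $\mathbb{F}^3$ to $\mathbb{F}^2$) and a hypothetical finite-difference helper `jacobian_denominator`, and compares both sides of VV4.

```python
import math

def u(x):
    # Hypothetical inner map u: R^2 -> R^3 (p = 3).
    return [x[0] * x[1], math.sin(x[1]), x[0] ** 2]

def g(w):
    # Hypothetical outer map g: R^3 -> R^2 (q = 2).
    return [w[0] + w[1] * w[2], math.exp(w[0]) * w[2]]

def jacobian_denominator(f, x, h=1e-6):
    # Central differences; entry [i][j] approximates d f_j / d x_i.
    rows = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        rows.append([(a - b) / (2.0 * h)
                     for a, b in zip(f(x_plus), f(x_minus))])
    return rows

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

x0 = [0.8, -0.3]

lhs = jacobian_denominator(lambda x: g(u(x)), x0)   # n x q = 2 x 2
du_dx = jacobian_denominator(u, x0)                 # n x p = 2 x 3
dg_du = jacobian_denominator(g, u(x0))              # p x q = 3 x 2
rhs = matmul(du_dx, dg_du)                          # (n x p)(p x q)

max_err = max(abs(a - b) for ra, rb in zip(lhs, rhs) for a, b in zip(ra, rb))
print(max_err)
```

The shapes alone already rule out the reversed order: $(p \times q)(n \times p)$ is not a valid product here since $q \neq n$ in general.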

VV1(数乘),同样为了书写方便记 ∂ u ⃗ x ⃗ = ( ∂ u j ∂ x i ) n × m \frac{\partial \vec{u}}{\vec{x}} = \left( \frac{\partial u_j}{\partial x_i} \right)_{n \times m} x ∂u ​=(∂xi​∂uj​​)n×m​。
∂ ( a u ⃗ ) ∂ x ⃗ = ( ∂ a u j ∂ x i ) n × m = ( a ∂ u j ∂ x i + ∂ a ∂ x i u j ) n × m = ( a ∂ u j ∂ x i ) n × m + ( ∂ a ∂ x i u j ) n × m = a ( ∂ u j ∂ x i ) n × m + ( ∂ a ∂ x 1 ∂ a ∂ x 2 ∂ a ∂ x 3 ⋮ ∂ a ∂ x n ) ( u 1 , u 2 , u 3 , ⋯ , u m ) = a ∂ u ⃗ ∂ x ⃗ + ∂ a ∂ x ⃗ u ⃗ T . \begin{aligned} \frac{\partial (a\vec{u})}{\partial \vec{x}} = & \left( \frac{\partial a u_j}{\partial x_i} \right)_{n \times m} \\ = & \left( a \frac{\partial u_j}{\partial x_i} + \frac{\partial a}{\partial x_i} u_j \right)_{n \times m} \\ = & \left( a \frac{\partial u_j}{\partial x_i} \right)_{n \times m} + \left( \frac{\partial a}{\partial x_i} u_j \right)_{n \times m} \\ = & a \left( \frac{\partial u_j}{\partial x_i} \right)_{n \times m} + \left( \begin{matrix} \frac{\partial a}{\partial x_1} \\ \frac{\partial a}{\partial x_2} \\ \frac{\partial a}{\partial x_3} \\ \vdots \\ \frac{\partial a}{\partial x_n} \end{matrix} \right) \left( \begin{matrix} u_1, & u_2, & u_3, & \cdots, u_m \end{matrix} \right) \\ = & a \frac{\partial \vec{u}}{\partial \vec{x}} + \frac{\partial a}{\partial \vec{x}} \vec{u}^T. \end{aligned} ∂x ∂(au )​=====​(∂xi​∂auj​​)n×m​(a∂xi​∂uj​​+∂xi​∂a​uj​)n×m​(a∂xi​∂uj​​)n×m​+(∂xi​∂a​uj​)n×m​a(∂xi​∂uj​​)n×m​+⎝⎜⎜⎜⎜⎜⎜⎛​∂x1​∂a​∂x2​∂a​∂x3​∂a​⋮∂xn​∂a​​⎠⎟⎟⎟⎟⎟⎟⎞​(u1​,​u2​,​u3​,​⋯,um​​)a∂x ∂u ​+∂x ∂a​u T.​
Proof of VV3 (multiplication by a matrix). Write the vector $A\vec{u}$ as $\left( \sum_{k=1}^{m} a_{jk} u_k \right)_p$, that is, its $j$-th entry is $\sum_{k=1}^{m} a_{jk} u_k$.
By the definition of the vector-by-vector derivative, the derivative of this $j$-th entry is the column
$$
\frac{\partial}{\partial \vec{x}} \left( \sum_{k=1}^{m} a_{jk} u_k \right) = \left( \begin{matrix} \frac{\partial}{\partial x_1} \left( \sum_{k=1}^{m} a_{jk} u_k \right) \\ \frac{\partial}{\partial x_2} \left( \sum_{k=1}^{m} a_{jk} u_k \right) \\ \frac{\partial}{\partial x_3} \left( \sum_{k=1}^{m} a_{jk} u_k \right) \\ \vdots \\ \frac{\partial}{\partial x_n} \left( \sum_{k=1}^{m} a_{jk} u_k \right) \end{matrix} \right)
$$
Since $A$ does not depend on $\vec{x}$, the left-hand side (LHS) can therefore be written as
$$
\frac{\partial (A \vec{u})}{\partial \vec{x}} = \left( \frac{\partial}{\partial x_i} \left( \sum_{k=1}^{m} a_{jk} u_k \right) \right)_{n \times p} = \left( \sum_{k=1}^{m} \left( \frac{\partial u_k}{\partial x_i} a_{jk} \right) \right)_{n \times p}.
$$
That is, $LHS_{i,j} = \sum_{k=1}^{m} \left( \frac{\partial u_k}{\partial x_i} a_{jk} \right)$.
Now consider $RHS_{i,j}$, which is the product of row $i$ of $\frac{\partial \vec{u}}{\partial \vec{x}}$ and column $j$ of $A^T$:
$$
RHS_{i,j} = \left( \frac{\partial u_1}{\partial x_i}, \frac{\partial u_2}{\partial x_i}, \cdots ,\frac{\partial u_m}{\partial x_i} \right) \left( \begin{matrix} a_{j1} \\ a_{j2} \\ \vdots \\ a_{jm} \end{matrix} \right) = \sum_{k=1}^{m} \left( \frac{\partial u_k}{\partial x_i} a_{jk} \right) = LHS_{i,j}.
$$

Finally, we prove VV4 (chain rule). The right-hand side is
$$
\frac{\partial \vec{u}}{\partial \vec{x}} \frac{\partial \vec{g}}{\partial \vec{u}} = \left( \frac{\partial u_j}{ \partial x_i} \right)_{n \times p} \left( \frac{\partial g_k}{ \partial u_j} \right)_{p \times q} = \left( \sum_{j=1}^{p} \left( \frac{\partial u_j}{ \partial x_i} \frac{\partial g_k}{ \partial u_j} \right) \right)_{n \times q}
$$
That is, $RHS_{i,k} = \sum_{j=1}^{p} \left( \frac{\partial u_j}{ \partial x_i} \frac{\partial g_k}{ \partial u_j} \right)$.
On the other hand, $\frac{\partial \vec{g}(\vec{u})}{\partial \vec{x}} = \left( \frac{\partial g_k}{ \partial x_i} \right)_{n \times q}$, so
$$
\begin{aligned}
LHS_{i,k} = & \frac{\partial g_k}{ \partial x_i} \\
\overset{SS3}{=} & \frac{\partial g_k}{ \partial u_1} \frac{\partial u_1}{ \partial x_i} + \frac{\partial g_k}{ \partial u_2} \frac{\partial u_2}{ \partial x_i} + \cdots + \frac{\partial g_k}{ \partial u_p} \frac{\partial u_p}{ \partial x_i} \\
= & \sum_{j=1}^{p} \left( \frac{\partial u_j}{ \partial x_i} \frac{\partial g_k}{ \partial u_j} \right) \\
= & RHS_{i,k}.
\end{aligned}
$$
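The order of the factors in VV4 is easy to get wrong, because in the layout used here $\frac{\partial \vec{u}}{\partial \vec{x}}$ has shape $n \times p$. The following is a minimal numerical sketch, assuming NumPy. The functions `u` and `g`, the matrices `B` and `C`, and the helper `jacobian_denominator` are hypothetical names chosen for this check and are not part of the original notes.

```python
import numpy as np

def jacobian_denominator(f, x, eps=1e-6):
    """Central finite-difference estimate of d f / d x in the layout used here:
    entry (i, j) is d f_j / d x_i, so the result has shape (n, m)."""
    x = np.asarray(x, dtype=float)
    rows = []
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        rows.append((f(x + step) - f(x - step)) / (2 * eps))  # row i
    return np.stack(rows, axis=0)

rng = np.random.default_rng(0)
n, p, q = 3, 4, 2
B = rng.normal(size=(p, n))
C = rng.normal(size=(q, p))

u = lambda x: np.tanh(B @ x)   # u: R^n -> R^p
g = lambda v: C @ (v ** 2)     # g: R^p -> R^q

x0 = rng.normal(size=n)
lhs = jacobian_denominator(lambda x: g(u(x)), x0)                   # shape (n, q)
rhs = jacobian_denominator(u, x0) @ jacobian_denominator(g, u(x0))  # (n, p) @ (p, q)
max_err = np.max(np.abs(lhs - rhs))
```

If the two factors were multiplied in the opposite order, the shapes $(p, q)$ and $(n, p)$ would not even be compatible in general, which is a useful sanity check on the layout.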

2.3 Derivative of a matrix with respect to a vector

First vectorize the matrix $Y$ in column-major order, that is, stack its columns $\vec{y}_1, \vec{y}_2, \cdots, \vec{y}_q$ on top of one another:
$$
\mathrm{vec}(Y_{p \times q}) = \mathrm{vec} \left( \left( \begin{matrix} y_{11} & y_{12} & y_{13} & \cdots & y_{1q} \\ y_{21} & y_{22} & y_{23} & \cdots & y_{2q} \\ y_{31} & y_{32} & y_{33} & \cdots & y_{3q} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ y_{p1} & y_{p2} & y_{p3} & \cdots & y_{pq} \end{matrix} \right)_{p \times q} \right) = \left( \begin{matrix} \vec{y}_1 \\ \vec{y}_2 \\ \vec{y}_3 \\ \vdots \\ \vec{y}_q \end{matrix} \right) = \left( \begin{matrix} y_{11} \\ y_{21} \\ \vdots \\ y_{p1} \\ y_{12} \\ y_{22} \\ \vdots \\ y_{p2} \\ \vdots \\ y_{1q} \\ y_{2q} \\ \vdots \\ y_{pq} \end{matrix} \right)_{pq \times 1}.
$$
By the vector-by-vector derivative, the $i$-th column $\vec{y}_i$ satisfies
$$
\frac{\partial \vec{y}_i}{\partial \vec{x}} = \left( \begin{matrix} \frac{\partial y_{1i}}{\partial \vec{x}}, & \frac{\partial y_{2i}}{\partial \vec{x}}, & \frac{\partial y_{3i}}{\partial \vec{x}}, & \cdots, & \frac{\partial y_{pi}}{\partial \vec{x}} \end{matrix} \right) = \left( \begin{matrix} \frac{\partial y_{1i}}{\partial x_1} & \frac{\partial y_{2i}}{\partial x_1} & \frac{\partial y_{3i}}{\partial x_1} & \cdots & \frac{\partial y_{pi}}{\partial x_1} \\ \frac{\partial y_{1i}}{\partial x_2} & \frac{\partial y_{2i}}{\partial x_2} & \frac{\partial y_{3i}}{\partial x_2} & \cdots & \frac{\partial y_{pi}}{\partial x_2} \\ \frac{\partial y_{1i}}{\partial x_3} & \frac{\partial y_{2i}}{\partial x_3} & \frac{\partial y_{3i}}{\partial x_3} & \cdots & \frac{\partial y_{pi}}{\partial x_3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_{1i}}{\partial x_n} & \frac{\partial y_{2i}}{\partial x_n} & \frac{\partial y_{3i}}{\partial x_n} & \cdots & \frac{\partial y_{pi}}{\partial x_n} \end{matrix} \right)_{n \times p}
$$
Therefore, placing these blocks side by side for $i = 1, \cdots, q$,
$$
\begin{aligned}
\frac{\partial \mathrm{vec}(Y)}{\partial \vec{x}}
= & \left( \begin{matrix} \frac{\partial y_{11}}{\partial \vec{x}}, & \frac{\partial y_{21}}{\partial \vec{x}}, & \cdots, & \frac{\partial y_{p1}}{\partial \vec{x}}, & \frac{\partial y_{12}}{\partial \vec{x}}, & \frac{\partial y_{22}}{\partial \vec{x}}, & \cdots, & \frac{\partial y_{p2}}{\partial \vec{x}}, & \cdots, & \frac{\partial y_{pq}}{\partial \vec{x}} \end{matrix} \right) \\
= & \left( \begin{matrix}
\frac{\partial y_{11}}{\partial x_1} & \frac{\partial y_{21}}{\partial x_1} & \cdots & \frac{\partial y_{p1}}{\partial x_1} & \frac{\partial y_{12}}{\partial x_1} & \frac{\partial y_{22}}{\partial x_1} & \cdots & \frac{\partial y_{p2}}{\partial x_1} & \cdots & \frac{\partial y_{pq}}{\partial x_1} \\
\frac{\partial y_{11}}{\partial x_2} & \frac{\partial y_{21}}{\partial x_2} & \cdots & \frac{\partial y_{p1}}{\partial x_2} & \frac{\partial y_{12}}{\partial x_2} & \frac{\partial y_{22}}{\partial x_2} & \cdots & \frac{\partial y_{p2}}{\partial x_2} & \cdots & \frac{\partial y_{pq}}{\partial x_2} \\
\frac{\partial y_{11}}{\partial x_3} & \frac{\partial y_{21}}{\partial x_3} & \cdots & \frac{\partial y_{p1}}{\partial x_3} & \frac{\partial y_{12}}{\partial x_3} & \frac{\partial y_{22}}{\partial x_3} & \cdots & \frac{\partial y_{p2}}{\partial x_3} & \cdots & \frac{\partial y_{pq}}{\partial x_3} \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
\frac{\partial y_{11}}{\partial x_n} & \frac{\partial y_{21}}{\partial x_n} & \cdots & \frac{\partial y_{p1}}{\partial x_n} & \frac{\partial y_{12}}{\partial x_n} & \frac{\partial y_{22}}{\partial x_n} & \cdots & \frac{\partial y_{p2}}{\partial x_n} & \cdots & \frac{\partial y_{pq}}{\partial x_n}
\end{matrix} \right)_{n \times pq}
\end{aligned}
$$

The differentials are then related by
$$
\mathrm{vec}(\mathrm{d}Y) = \left( \frac{\partial \mathrm{vec}(Y)}{\partial \vec{x}} \right)^T \mathrm{d} \vec{x}
$$
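The relation $\mathrm{vec}(\mathrm{d}Y) = \left( \frac{\partial \mathrm{vec}(Y)}{\partial \vec{x}} \right)^T \mathrm{d}\vec{x}$ can be checked numerically. The sketch below assumes NumPy, whose `reshape(..., order="F")` gives the column-major vectorization. The test function `Y` and the helper `dvecY_dx` are hypothetical and chosen only for illustration.

```python
import numpy as np

def vec(M):
    """Column-major vectorization: stack the columns of M."""
    return M.reshape(-1, order="F")

def Y(x):
    # Hypothetical 3 x 2 matrix-valued function of x in R^2.
    return np.array([[x[0] * x[1], np.sin(x[0])],
                     [x[1] ** 2,   x[0] + x[1]],
                     [np.exp(x[0]), 1.0]])

def dvecY_dx(x, eps=1e-6):
    """Finite-difference estimate of d vec(Y) / d x, shape (n, p*q):
    row i holds the derivatives of all entries of vec(Y) with respect to x_i."""
    rows = []
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        rows.append((vec(Y(x + step)) - vec(Y(x - step))) / (2 * eps))
    return np.stack(rows, axis=0)

x0 = np.array([0.3, -1.2])
dx = 1e-5 * np.array([1.0, 2.0])

J = dvecY_dx(x0)                 # shape (2, 6)
lhs = vec(Y(x0 + dx) - Y(x0))    # actual change of vec(Y)
rhs = J.T @ dx                   # first-order prediction
max_err = np.max(np.abs(lhs - rhs))
```

The two sides agree up to second-order terms in `dx`, which is all a differential identity promises.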

3. Derivatives with respect to a matrix

3.1 Derivative of a scalar with respect to a matrix

For a scalar $y$ and a matrix $X \in \mathbb{F}^{r \times s}$ with columns $\vec{x}_1, \cdots, \vec{x}_s$, the derivative has the same shape as $X$:
$$
\frac{\partial y}{\partial X} = \left( \begin{matrix} \frac{\partial y}{\partial \vec{x}_1}, & \frac{\partial y}{\partial \vec{x}_2}, & \cdots, & \frac{\partial y}{\partial \vec{x}_s} \end{matrix} \right) = \left( \begin{matrix} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{12}} & \frac{\partial y}{\partial x_{13}} & \cdots & \frac{\partial y}{\partial x_{1s}} \\ \frac{\partial y}{\partial x_{21}} & \frac{\partial y}{\partial x_{22}} & \frac{\partial y}{\partial x_{23}} & \cdots & \frac{\partial y}{\partial x_{2s}} \\ \frac{\partial y}{\partial x_{31}} & \frac{\partial y}{\partial x_{32}} & \frac{\partial y}{\partial x_{33}} & \cdots & \frac{\partial y}{\partial x_{3s}} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y}{\partial x_{r1}} & \frac{\partial y}{\partial x_{r2}} & \frac{\partial y}{\partial x_{r3}} & \cdots & \frac{\partial y}{\partial x_{rs}} \end{matrix} \right)_{r \times s}
$$
As before, the total differential formula gives $\mathrm{d}y = \sum_{i=1}^{r} \sum_{j=1}^{s} \frac{\partial y}{\partial x_{ij}} \mathrm{d}x_{ij}$.
To express this sum in matrix form, consider the product below. Only its diagonal entries are needed, so the off-diagonal entries are left as dots:
$$
\begin{aligned}
\left( \frac{\partial y}{\partial X} \right)^T \mathrm{d}X
= & \left( \begin{matrix} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{21}} & \frac{\partial y}{\partial x_{31}} & \cdots & \frac{\partial y}{\partial x_{r1}} \\ \frac{\partial y}{\partial x_{12}} & \frac{\partial y}{\partial x_{22}} & \frac{\partial y}{\partial x_{32}} & \cdots & \frac{\partial y}{\partial x_{r2}} \\ \frac{\partial y}{\partial x_{13}} & \frac{\partial y}{\partial x_{23}} & \frac{\partial y}{\partial x_{33}} & \cdots & \frac{\partial y}{\partial x_{r3}} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y}{\partial x_{1s}} & \frac{\partial y}{\partial x_{2s}} & \frac{\partial y}{\partial x_{3s}} & \cdots & \frac{\partial y}{\partial x_{rs}} \end{matrix} \right)_{s \times r} \times \left( \begin{matrix} \mathrm{d}x_{11} & \mathrm{d}x_{12} & \mathrm{d}x_{13} & \cdots & \mathrm{d}x_{1s} \\ \mathrm{d}x_{21} & \mathrm{d}x_{22} & \mathrm{d}x_{23} & \cdots & \mathrm{d}x_{2s} \\ \mathrm{d}x_{31} & \mathrm{d}x_{32} & \mathrm{d}x_{33} & \cdots & \mathrm{d}x_{3s} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \mathrm{d}x_{r1} & \mathrm{d}x_{r2} & \mathrm{d}x_{r3} & \cdots & \mathrm{d}x_{rs} \end{matrix} \right)_{r \times s} \\
= & \left( \begin{matrix} \sum_{i=1}^{r} \frac{\partial y}{\partial x_{i1}} \mathrm{d}x_{i1} & \cdots & \cdots & \cdots & \cdots \\ \cdots & \sum_{i=1}^{r} \frac{\partial y}{\partial x_{i2}} \mathrm{d}x_{i2} & \cdots & \cdots & \cdots \\ \cdots & \cdots & \sum_{i=1}^{r} \frac{\partial y}{\partial x_{i3}} \mathrm{d}x_{i3} & \cdots & \cdots \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \cdots & \cdots & \cdots & \cdots & \sum_{i=1}^{r} \frac{\partial y}{\partial x_{is}} \mathrm{d}x_{is} \end{matrix} \right)_{s \times s}
\end{aligned}
$$
Summing the diagonal entries therefore gives
$$
\mathrm{tr}\left( \left( \frac{\partial y}{\partial X} \right)^T \mathrm{d}X\right) = \sum_{j=1}^{s} \sum_{i=1}^{r} \frac{\partial y}{\partial x_{ij}} \mathrm{d}x_{ij} = \sum_{i=1}^{r} \sum_{j=1}^{s} \frac{\partial y}{\partial x_{ij}} \mathrm{d}x_{ij} = \mathrm{d}y
$$
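The identity $\mathrm{d}y = \mathrm{tr}\left( \left( \frac{\partial y}{\partial X} \right)^T \mathrm{d}X \right)$ is what later sections rely on to read off gradients. A small numerical sketch, assuming NumPy and a hypothetical test function $y(X) = \sum_{i,j} \sin(x_{ij})$ whose gradient is $\cos(X)$ entry by entry:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(3, 4))
dX = 1e-6 * rng.normal(size=(3, 4))   # a small perturbation of X

y = lambda M: np.sum(np.sin(M))       # hypothetical scalar function of a matrix
G = np.cos(X)                         # its gradient dy/dX, same shape as X

dy_trace = np.trace(G.T @ dX)         # tr((dy/dX)^T dX)
dy_sum = np.sum(G * dX)               # the double sum over i and j
dy_actual = y(X + dX) - y(X)          # actual change, equal up to second order
```

Here `dy_trace` and `dy_sum` agree to rounding error, and both match `dy_actual` up to terms of order $\| \mathrm{d}X \|^2$.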

3.2 Derivative of a vector with respect to a matrix

Applying the scalar case of Section 3.1 to each component $y_k$ of $\vec{y}$ gives
$$
\mathrm{d}\vec{y} = \left( \begin{matrix} \mathrm{d}y_1 \\ \mathrm{d}y_2 \\ \vdots \\ \mathrm{d}y_m \end{matrix} \right) = \left( \begin{matrix} \mathrm{tr}\left( \left( \frac{\partial y_1}{\partial X} \right)^T \mathrm{d}X\right) \\ \mathrm{tr}\left( \left( \frac{\partial y_2}{\partial X} \right)^T \mathrm{d}X\right) \\ \vdots \\ \mathrm{tr}\left( \left( \frac{\partial y_m}{\partial X} \right)^T \mathrm{d}X\right) \end{matrix} \right)
$$

3.3 Derivative of a matrix with respect to a matrix

Using the vector-by-matrix derivative column by column, we have
$$
\begin{aligned}
\mathrm{d}Y = & \left( \begin{matrix} \mathrm{d}\vec{y}_1, & \mathrm{d}\vec{y}_2, & \cdots, & \mathrm{d}\vec{y}_q \end{matrix} \right) \\
= & \left( \begin{matrix}
\mathrm{tr}\left( \left( \frac{\partial y_{11}}{\partial X} \right)^T \mathrm{d}X\right) & \mathrm{tr}\left( \left( \frac{\partial y_{12}}{\partial X} \right)^T \mathrm{d}X\right) & \mathrm{tr}\left( \left( \frac{\partial y_{13}}{\partial X} \right)^T \mathrm{d}X\right) & \cdots & \mathrm{tr}\left( \left( \frac{\partial y_{1q}}{\partial X} \right)^T \mathrm{d}X\right) \\
\mathrm{tr}\left( \left( \frac{\partial y_{21}}{\partial X} \right)^T \mathrm{d}X\right) & \mathrm{tr}\left( \left( \frac{\partial y_{22}}{\partial X} \right)^T \mathrm{d}X\right) & \mathrm{tr}\left( \left( \frac{\partial y_{23}}{\partial X} \right)^T \mathrm{d}X\right) & \cdots & \mathrm{tr}\left( \left( \frac{\partial y_{2q}}{\partial X} \right)^T \mathrm{d}X\right) \\
\mathrm{tr}\left( \left( \frac{\partial y_{31}}{\partial X} \right)^T \mathrm{d}X\right) & \mathrm{tr}\left( \left( \frac{\partial y_{32}}{\partial X} \right)^T \mathrm{d}X\right) & \mathrm{tr}\left( \left( \frac{\partial y_{33}}{\partial X} \right)^T \mathrm{d}X\right) & \cdots & \mathrm{tr}\left( \left( \frac{\partial y_{3q}}{\partial X} \right)^T \mathrm{d}X\right) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\mathrm{tr}\left( \left( \frac{\partial y_{p1}}{\partial X} \right)^T \mathrm{d}X\right) & \mathrm{tr}\left( \left( \frac{\partial y_{p2}}{\partial X} \right)^T \mathrm{d}X\right) & \mathrm{tr}\left( \left( \frac{\partial y_{p3}}{\partial X} \right)^T \mathrm{d}X\right) & \cdots & \mathrm{tr}\left( \left( \frac{\partial y_{pq}}{\partial X} \right)^T \mathrm{d}X\right)
\end{matrix} \right)_{p \times q}
\end{aligned}
$$
Alternatively, vectorizing both matrices gives
$$
\mathrm{vec}(\mathrm{d}Y) = \left( \frac{\partial \mathrm{vec}(Y)}{\partial \mathrm{vec}(X)} \right)^T \mathrm{vec}(\mathrm{d}X)
$$
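For a concrete instance of the vectorized form, take the linear map $Y = AXB$. The standard identity $\mathrm{vec}(AXB) = (B^T \otimes A)\,\mathrm{vec}(X)$ is not stated in these notes and is used here as an outside fact. It implies that, in the layout used here, $\frac{\partial \mathrm{vec}(Y)}{\partial \mathrm{vec}(X)} = (B^T \otimes A)^T = B \otimes A^T$. The sketch below assumes NumPy, and all names and shapes are illustrative.

```python
import numpy as np

def vec(M):
    """Column-major vectorization: stack the columns of M."""
    return M.reshape(-1, order="F")

rng = np.random.default_rng(2)
p, r, s, q = 2, 3, 4, 5
A = rng.normal(size=(p, r))
B = rng.normal(size=(s, q))
dX = rng.normal(size=(r, s))

J = np.kron(B, A.T)        # d vec(Y) / d vec(X), shape (r*s, p*q)
lhs = vec(A @ dX @ B)      # vec(dY); exact because Y = A X B is linear in X
rhs = J.T @ vec(dX)
max_err = np.max(np.abs(lhs - rhs))
```

Because the map is linear, the two sides agree to rounding error for any `dX`, not only for small ones.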

Matrix differential rules [2]

  • D1 (linearity): $\mathrm{d} \left( X \pm Y \right) = \mathrm{d}X \pm \mathrm{d}Y$.
  • D2 (matrix product): $\mathrm{d} \left( X Y \right) = (\mathrm{d}X)Y + X(\mathrm{d}Y)$.
  • D3 (transpose): $\mathrm{d}(X^T) = (\mathrm{d}X)^T$.
  • D4 (trace): $\mathrm{d}\left( \mathrm{tr}(X) \right) = \mathrm{tr}(\mathrm{d}X)$.
  • D5 (inverse): if $X$ is invertible, $\mathrm{d}(X^{-1}) = -X^{-1} (\mathrm{d}X) X^{-1}$. This follows from applying D2 to $X X^{-1} = I$, which gives $(\mathrm{d}X) X^{-1} + X \, \mathrm{d}(X^{-1}) = 0$.
  • D6 (determinant): $\mathrm{d} |X| = \mathrm{tr}\left( \mathrm{adj}(X) \, \mathrm{d}X \right)$, where $\mathrm{adj}(X)$ is the adjugate of $X$; if $X$ is invertible, then $\mathrm{d} |X| = |X| \, \mathrm{tr}\left( X^{-1} \mathrm{d}X \right)$.
  • D7 (elementwise product): $\mathrm{d}\left( X \odot Y \right) = (\mathrm{d}X) \odot Y + X \odot (\mathrm{d}Y)$.
  • D8 (elementwise function): $\mathrm{d} f(X) = f^{'}(X) \odot ( \mathrm{d} X )$, where $f$ and $f^{'}$ are applied entry by entry.
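Differential rules like these are easy to spot-check by comparing each rule with an actual small perturbation. A minimal sketch, assuming NumPy, covering the product rule D2 and the invertible case of the determinant rule D6. The matrices are random and the diagonal shift is only there to keep `X` comfortably invertible.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
X = 0.5 * rng.normal(size=(n, n)) + n * np.eye(n)   # shifted to stay well conditioned
Y = rng.normal(size=(n, n))
dX = 1e-6 * rng.normal(size=(n, n))
dY = 1e-6 * rng.normal(size=(n, n))

# D2: d(XY) = (dX) Y + X (dY); the leftover term is (dX)(dY), of second order.
d_prod_actual = (X + dX) @ (Y + dY) - X @ Y
d_prod_rule = dX @ Y + X @ dY
err_prod = np.max(np.abs(d_prod_actual - d_prod_rule))

# D6, invertible case: d|X| = |X| tr(X^{-1} dX).
det_X = np.linalg.det(X)
d_det_actual = np.linalg.det(X + dX) - det_X
d_det_rule = det_X * np.trace(np.linalg.solve(X, dX))   # solve(X, dX) = X^{-1} dX
err_det = abs(d_det_actual - d_det_rule) / abs(det_X)
```

Both errors are of second order in the perturbation, so they shrink quadratically as `dX` and `dY` are scaled down.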

4. Derivative of a determinant with respect to a matrix

The derivative of a determinant with respect to a matrix is again a scalar-by-matrix derivative.

  • DM1: for every invertible $X \in \mathbb{F}^{n \times n}$ and all constant $A, B \in \mathbb{F}^{n \times n}$, $\frac{\partial |AXB|}{\partial X} = |AXB|(X^{-1})^T$. (The matrices must be square so that $|AXB| = |A||X||B|$.)
  • DM2: for every invertible $X \in \mathbb{F}^{n \times n}$ with $|X| > 0$, $\frac{\partial \ln(|X|)}{\partial X} = (X^{-1})^T$.
  • DM3: for every invertible $X(z) \in \mathbb{F}^{n \times n}$ depending on a scalar $z \in \mathbb{F}$, $\frac{\partial \ln(|X(z)|)}{\partial z} = \mathrm{tr}\left( X^{-1} \frac{\partial X}{\partial z} \right)$.
  • DM4: for all $X \in \mathbb{F}^{n \times m}$ and $A \in \mathbb{F}^{n \times n}$ such that $X^T A X$ is invertible, $\frac{\partial |X^T A X|}{\partial X} = |X^T A X| \left( AX \left(X^TAX\right)^{-1} + A^TX \left(X^TA^TX\right)^{-1} \right)$.
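DM4 is the least obvious of these formulas, so a finite-difference check is reassuring. The sketch below assumes NumPy. The helper `num_grad` and the random test matrices are hypothetical, and the diagonal shift on `A` is only there to keep $X^T A X$ away from singular.

```python
import numpy as np

def num_grad(f, X, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at the matrix X."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        E = np.zeros_like(X)
        E[idx] = eps
        G[idx] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

rng = np.random.default_rng(4)
n, m = 4, 2
X = rng.normal(size=(n, m))
A = rng.normal(size=(n, n)) + n * np.eye(n)

f = lambda M: np.linalg.det(M.T @ A @ M)
inv = np.linalg.inv
G_formula = f(X) * (A @ X @ inv(X.T @ A @ X) + A.T @ X @ inv(X.T @ A.T @ X))
G_numeric = num_grad(f, X)
rel_err = np.max(np.abs(G_formula - G_numeric)) / np.max(np.abs(G_formula))
```

The gradient has the same shape as `X` ($n \times m$), as the scalar-by-matrix layout requires.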

5. Derivative of a trace with respect to a matrix

The derivative of a trace with respect to a matrix is, in essence, also a scalar-by-matrix derivative.
Properties of the trace

  • TR1 (scalar): for all $a \in \mathbb{F}^1$, $a = \mathrm{tr}(a)$.
  • TR2 (transpose): for all $A \in \mathbb{F}^{n \times n}$, $\mathrm{tr}(A) = \mathrm{tr}(A^T)$.
  • TR3 (linearity): for all $A,B \in \mathbb{F}^{n \times n}$, $\mathrm{tr}(A \pm B) = \mathrm{tr}(A) \pm \mathrm{tr}(B)$.
  • TR4 (commutativity under matrix multiplication): for all $A \in \mathbb{F}^{m \times n}$ and $B \in \mathbb{F}^{n \times m}$, $\mathrm{tr}(AB) = \mathrm{tr}(BA)$.
  • TR5 (exchange of matrix and elementwise multiplication): for all $A,B,C \in \mathbb{F}^{m \times n}$, $\mathrm{tr}\left( A^T(B \odot C) \right) = \mathrm{tr} \left( (A \odot B)^T C \right)$.
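TR4 and TR5 do most of the work in the examples later on, so it may help to see them hold on random matrices. A short sketch assuming NumPy, with hypothetical matrix names; `*` is NumPy's elementwise product.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 3, 5
A = rng.normal(size=(m, n))
B = rng.normal(size=(n, m))

# TR4: AB is m x m and BA is n x n, yet their traces agree.
tr4_gap = abs(np.trace(A @ B) - np.trace(B @ A))

# TR5: all three matrices share the shape m x n.
P, Q, R = (rng.normal(size=(m, n)) for _ in range(3))
lhs = np.trace(P.T @ (Q * R))
rhs = np.trace((P * Q).T @ R)
both = np.sum(P * Q * R)   # each side equals the sum of p_ij * q_ij * r_ij
tr5_gap = max(abs(lhs - rhs), abs(lhs - both))
```

The last line also shows why TR5 holds: both traces reduce to the same triple product summed over all entries.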

Commonly used functions [2]

| $f$ | $\mathrm{d} f$ | $\frac{\partial f}{\partial X}$ |
|---|---|---|
| $\mathrm{tr}(X)$ | $\mathrm{tr}(I \mathrm{d}X)$ | $I$ |
| $\mathrm{tr}(X^T X)$ | $2\mathrm{tr}(X^T \mathrm{d}X)$ | $2X$ |
| $\mathrm{tr}(X^2)$ | $2\mathrm{tr}(X \mathrm{d}X)$ | $2 X^T$ |
| $\mathrm{tr}(A X)$ | $\mathrm{tr}(A \mathrm{d}X)$ | $A^T$ |
| $\mathrm{tr}(X^T A X)$ | $\mathrm{tr}(X^T(A + A^T) \mathrm{d}X)$ | $(A + A^T)X$ |
| $\mathrm{tr}(X A X^T)$ | $\mathrm{tr}((A + A^T)X^T \mathrm{d}X)$ | $X(A + A^T)$ |
| $\mathrm{tr}(X A X)$ | $\mathrm{tr}((AX + XA) \mathrm{d}X)$ | $X^T A^T + A^T X^T$ |
| $\mathrm{tr}(A X^{-1})$ | $-\mathrm{tr}(X^{-1}AX^{-1} \mathrm{d}X)$ | $-\left( X^{-1} A X^{-1} \right)^T$ |
| $\mathrm{tr}(X A X B)$ | $\mathrm{tr}((AXB + BXA) \mathrm{d}X)$ | $\left( A X B + B X A \right)^T$ |
| $\mathrm{tr}(X A X^T B)$ | $\mathrm{tr}((A X^T B + A^T X^T B^T) \mathrm{d}X)$ | $B^T X A^T + B X A$ |
  • TM1: for all $X \in \mathbb{F}^{n \times n}$, $\frac{\partial \mathrm{tr}(X)}{\partial X} = I$.
  • TM2: for all $X \in \mathbb{F}^{n \times m}$ and $A \in \mathbb{F}^{m \times n}$, $\frac{\partial \mathrm{tr}(XA)}{\partial X} = \frac{\partial \mathrm{tr}(AX)}{\partial X} = A^T$.
  • TM3: for all $X \in \mathbb{F}^{n \times m}$ and $A \in \mathbb{F}^{n \times n}$, $\frac{\partial \mathrm{tr}(X^T A X)}{\partial X} = (A+A^T)X$.
  • TM4: for all $X, A \in \mathbb{F}^{n \times n}$ with $X$ invertible, $\frac{\partial \mathrm{tr}(X^{-1} A)}{\partial X} = - \left( X^{-1} A X^{-1} \right)^T$.
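Entries of the table of common functions can be spot-checked against finite differences. The sketch below assumes NumPy and checks two of the less obvious rows, $\mathrm{tr}(X A X^T B)$ and $\mathrm{tr}(X A X)$. The helper `num_grad` and the random square matrices are hypothetical choices made for this check.

```python
import numpy as np

def num_grad(f, X, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at the matrix X."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        E = np.zeros_like(X)
        E[idx] = eps
        G[idx] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

rng = np.random.default_rng(6)
n = 3
X, A, B = (rng.normal(size=(n, n)) for _ in range(3))

# f = tr(X A X^T B)  ->  gradient B^T X A^T + B X A
g1_formula = B.T @ X @ A.T + B @ X @ A
g1_numeric = num_grad(lambda M: np.trace(M @ A @ M.T @ B), X)
err1 = np.max(np.abs(g1_formula - g1_numeric))

# f = tr(X A X)  ->  gradient X^T A^T + A^T X^T
g2_formula = X.T @ A.T + A.T @ X.T
g2_numeric = num_grad(lambda M: np.trace(M @ A @ M), X)
err2 = np.max(np.abs(g2_formula - g2_numeric))
```

Both functions are quadratic in `X`, so the central difference is exact up to rounding and the errors are tiny.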

For convenience, $x_{i_X,j_X}$ denotes the entry of the matrix $X$ in row $i_X$ and column $j_X$.

Proof of TM1.
$$
\begin{aligned}
\frac{\partial \mathrm{tr}(X)}{\partial X}
= & \left( \frac{\partial \mathrm{tr}(X)}{\partial x_{i_X,j_X}} \right)_{n \times n} \\
= & \left( \frac{\partial }{\partial x_{i_X,j_X}} \left( \sum_{i=1}^{n}x_{i,i} \right) \right)_{n \times n} \\
= & \left( \frac{\partial x_{i_X,i_X}}{\partial x_{i_X,j_X}}\right)_{n \times n} \\
= & I.
\end{aligned}
$$
The last step holds because the entry is $1$ when $i_X = j_X$ and $0$ otherwise.

Proof of TM2. By TR4, $\mathrm{tr}(AX) = \mathrm{tr}(XA)$, so it suffices to consider $\mathrm{tr}(XA)$:
$$
\begin{aligned}
\frac{\partial \mathrm{tr}(XA)}{\partial X}
= & \left( \frac{\partial \mathrm{tr}(XA)}{\partial x_{i_X,j_X}} \right)_{n \times m} \\
= & \left( \frac{\partial }{\partial x_{i_X,j_X}} \left( \sum_{i=1}^{n} \sum_{k=1}^{m} x_{i,k}a_{k,i} \right) \right)_{n \times m} \\
= & \left( \frac{\partial (x_{i_X,j_X}a_{j_X,i_X})}{\partial x_{i_X,j_X}}\right)_{n \times m} \\
= & \left( a_{j_X,i_X} \right)_{n \times m} \\
= & A^T.
\end{aligned}
$$

Proof of TM3. Write $X=\left( x_{i,k} \right)_{n \times m}$ and $A=\left( a_{i,j} \right)_{n \times n}$, so that the $(k,i)$ entry of $X^T$ is $x_{i,k}$. Then
$$
(X^T A)_{k,j} = \sum_{i=1}^{n} x_{i,k} a_{i,j}, \quad (X^T A X)_{k,l} = \sum_{j=1}^{n} \sum_{i=1}^{n} x_{i,k} a_{i,j} x_{j,l},
$$
$$
(A X)_{i_X,j_X} = \sum_{j=1}^{n} a_{i_X,j} x_{j,j_X}, \quad (A^T X)_{i_X,j_X} = \sum_{i=1}^{n} a_{i,i_X} x_{i,j_X}.
$$
Hence $\mathrm{tr}(X^T A X) = \sum_{k=1}^{m} \sum_{j=1}^{n} \sum_{i=1}^{n} x_{i,k} a_{i,j} x_{j,k}$. The variable $x_{i_X,j_X}$ occurs in this sum as the first factor (when $i = i_X$, $k = j_X$) and as the last factor (when $j = i_X$, $k = j_X$), so by the product rule
$$
\begin{aligned}
\frac{\partial \mathrm{tr}(X^T A X)}{\partial X}
= & \left( \frac{\partial \mathrm{tr}(X^T A X)}{\partial x_{i_X, j_X}} \right)_{n \times m} \\
= & \left( \frac{\partial }{\partial x_{i_X, j_X}} \left( \sum_{k=1}^{m} \sum_{j=1}^{n} \sum_{i=1}^{n} x_{i,k} a_{i,j} x_{j,k} \right) \right)_{n \times m} \\
= & \left( \left( \sum_{j=1}^{n} a_{i_X,j} x_{j,j_X} \right) + \left( \sum_{i=1}^{n} a_{i,i_X} x_{i,j_X} \right) \right)_{n \times m} \\
= & A X + A^T X \\
= & (A + A^T) X.
\end{aligned}
$$
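The same result can be reached without indices by working with differentials. The following is a sketch of that alternative route, using only the rules D2, D3, D4 and the trace properties TR2, TR3 listed earlier, with $A$ treated as constant:
$$
\begin{aligned}
\mathrm{d}\, \mathrm{tr}(X^T A X)
\overset{D4}{=} & \mathrm{tr}\left( \mathrm{d}(X^T A X) \right) \\
\overset{D2, D3}{=} & \mathrm{tr}\left( (\mathrm{d}X)^T A X \right) + \mathrm{tr}\left( X^T A \, \mathrm{d}X \right) \\
\overset{TR2}{=} & \mathrm{tr}\left( X^T A^T \mathrm{d}X \right) + \mathrm{tr}\left( X^T A \, \mathrm{d}X \right) \\
\overset{TR3}{=} & \mathrm{tr}\left( \left( (A + A^T) X \right)^T \mathrm{d}X \right).
\end{aligned}
$$
Comparing with $\mathrm{d}y = \mathrm{tr}\left( \left( \frac{\partial y}{\partial X} \right)^T \mathrm{d}X \right)$ from Section 3.1 gives $\frac{\partial \mathrm{tr}(X^T A X)}{\partial X} = (A + A^T) X$ again.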

6. Worked examples

All examples in this section are taken from [3].

Example 1. Let $f = \vec{a}^T X \vec{b}$ with $\vec{a} \in \mathbb{F}^{m \times 1}, X \in \mathbb{F}^{m \times n}, \vec{b} \in \mathbb{F}^{n \times 1}$. Find $\frac{\partial f}{\partial X}$.

Solution. Since $f$ is a scalar, $f = \mathrm{tr}(f)$ by TR1, and therefore $\frac{\partial f}{\partial X} = \frac{\partial \mathrm{tr}\left( \vec{a}^T X \vec{b} \right)}{\partial X} \overset{TR4}{=} \frac{\partial \mathrm{tr}\left( \vec{b} \vec{a}^T X \right)}{\partial X} \overset{TM2}{=} \left( \vec{b} \vec{a}^T \right)^T = \vec{a} \vec{b}^T$.

Example 2. Let $f = \vec{a}^T \exp(X \vec{b})$ with $\vec{a} \in \mathbb{F}^{m \times 1}, X \in \mathbb{F}^{m \times n}, \vec{b} \in \mathbb{F}^{n \times 1}$, where $\exp$ is applied entry by entry. Find $\frac{\partial f}{\partial X}$.

Solution. First apply the differential rules: $\mathrm{d}f = \vec{a}^T \left( \exp(X \vec{b}) \odot (\mathrm{d}X \vec{b}) \right)$.

Take the trace of both sides, then rearrange into the form $\mathrm{tr}(A \, \mathrm{d}X)$ used in TM2.
$$
\begin{aligned}
\mathrm{d}f = & \mathrm{tr} \left( \vec{a}^T \left( \exp(X \vec{b}) \odot (\mathrm{d}X \vec{b}) \right) \right) \\
\overset{TR5}{=} & \mathrm{tr} \left( \left( \vec{a} \odot \exp(X \vec{b}) \right)^T (\mathrm{d}X \vec{b}) \right) \\
\overset{TR4}{=} & \mathrm{tr} \left( \vec{b} \left( \vec{a} \odot \exp(X \vec{b}) \right)^T \mathrm{d}X \right)
\end{aligned}
$$
This gives $\frac{\partial f}{\partial X} = \left( \vec{b} \left( \vec{a} \odot \exp(X \vec{b}) \right)^T \right)^T = \left( \vec{a} \odot \exp(X \vec{b}) \right) \vec{b}^T$.
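The result of Example 2 can be compared with a finite-difference gradient. A minimal sketch assuming NumPy; the helper `num_grad`, the sizes, and the random data are hypothetical.

```python
import numpy as np

def num_grad(f, X, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at the matrix X."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        E = np.zeros_like(X)
        E[idx] = eps
        G[idx] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

rng = np.random.default_rng(7)
m, n = 3, 4
a = rng.normal(size=m)
b = rng.normal(size=n)
X = 0.5 * rng.normal(size=(m, n))

f = lambda M: a @ np.exp(M @ b)               # f = a^T exp(X b)
G_formula = np.outer(a * np.exp(X @ b), b)    # (a ⊙ exp(X b)) b^T
G_numeric = num_grad(f, X)
rel_err = np.max(np.abs(G_formula - G_numeric)) / np.max(np.abs(G_formula))
```

`np.outer(v, b)` builds the rank-one matrix $\vec{v} \vec{b}^T$, which matches the outer-product structure of the closed-form gradient.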

Example 3. Let $f = \mathrm{tr}\left( Y^T M Y \right)$ with $Y = \sigma \left( WX \right)$. Find $\frac{\partial f}{\partial X}$. Here $W \in \mathbb{F}^{l \times m}, X \in \mathbb{F}^{m \times n}, Y \in \mathbb{F}^{l \times n}, M \in \mathbb{F}^{l \times l}$, $\sigma$ is an elementwise function, and $f$ is a scalar.

Solution. First compute $\frac{\partial f}{\partial Y}$. By TM3,
$$
\frac{\partial f}{\partial Y} = \left( M + M^T \right)Y.
$$
This gives the relation between $\mathrm{d}f$ and $\mathrm{d}Y$: $\mathrm{d}f = \mathrm{tr}\left( \left( \frac{\partial f}{\partial Y} \right)^T \mathrm{d}Y \right) = \mathrm{tr}\left( Y^T \left( M + M^T \right) \mathrm{d}Y \right)$.

Next compute $\mathrm{d}Y$, using the differential rule for elementwise functions and then D2 with $W$ constant:
$$
\begin{aligned}
\mathrm{d}Y = & \sigma^{'}(W X) \odot \mathrm{d}(WX) \\
= & \sigma^{'}(W X) \odot \left( W \mathrm{d}X \right).
\end{aligned}
$$
Combining the two,
$$
\begin{aligned}
\mathrm{d}f = & \mathrm{tr} \left( Y^T \left( M + M^T \right) \left( \sigma^{'}(W X) \odot \left( W \mathrm{d}X \right) \right) \right) \\
= & \mathrm{tr} \left( \left( (M + M^T)Y \right)^T \left( \sigma^{'}(W X) \odot \left( W \mathrm{d}X \right) \right) \right) \\
\overset{TR5}{=} & \mathrm{tr} \left( \left( \left( (M + M^T)Y \right) \odot \sigma^{'}(W X) \right)^T W \mathrm{d}X \right).
\end{aligned}
$$
Hence $\frac{\partial f}{\partial X}= W^T \left( \left( (M + M^T)Y \right) \odot \sigma^{'}(W X) \right)$.
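Example 3 has the shape of a one-layer network, so a numerical check is a small gradient test. The sketch below assumes NumPy and picks $\sigma = \tanh$, for which $\sigma^{'} = 1 - \tanh^2$. That choice, the helper `num_grad`, and the sizes are assumptions made here; the original example leaves $\sigma$ generic.

```python
import numpy as np

def num_grad(f, X, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at the matrix X."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        E = np.zeros_like(X)
        E[idx] = eps
        G[idx] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

rng = np.random.default_rng(8)
l, m, n = 3, 4, 2
W = rng.normal(size=(l, m))
M = rng.normal(size=(l, l))
X = 0.5 * rng.normal(size=(m, n))

def f(Z):
    Y = np.tanh(W @ Z)                 # Y = sigma(W X) with sigma = tanh
    return np.trace(Y.T @ M @ Y)

Y = np.tanh(W @ X)
sigma_prime = 1.0 - Y ** 2             # tanh'(W X), computed entry by entry
G_formula = W.T @ (((M + M.T) @ Y) * sigma_prime)
G_numeric = num_grad(f, X)
max_err = np.max(np.abs(G_formula - G_numeric))
```

Note the order of operations in `G_formula`: the elementwise product is taken first, and only then is the result multiplied by $W^T$ from the left.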

[Example 4] Let $l = \| X \vec{w} - \vec{y} \|^2$ with $\vec{y} \in \mathbb{F}^{m \times 1}$, $X \in \mathbb{F}^{m \times n}$, $\vec{w} \in \mathbb{F}^{n \times 1}$. Find the least-squares estimate of $\vec{w}$.

[Solution] Expand the squared norm:
$$
\begin{aligned}
l &= \| X \vec{w} - \vec{y} \|^2 \\
&= \left( X \vec{w} - \vec{y} \right)^T \left( X \vec{w} - \vec{y} \right) \\
&= \left( \vec{w}^T X^T - \vec{y}^T \right) \left( X \vec{w} - \vec{y} \right) \\
&= \vec{w}^T X^T X \vec{w} - \vec{w}^T X^T \vec{y} - \vec{y}^T X \vec{w} + \vec{y}^T \vec{y}.
\end{aligned}
$$

Since $l$ is a scalar, $l = \mathrm{tr}(l)$, and the two cross terms are equal. Setting the derivative to zero:
$$
\begin{aligned}
\frac{\partial l}{\partial \vec{w}} &= \frac{\partial\, \mathrm{tr}(l)}{\partial \vec{w}} \\
&= \frac{\partial}{\partial \vec{w}} \mathrm{tr}\left( \vec{w}^T X^T X \vec{w} - \vec{w}^T X^T \vec{y} - \vec{y}^T X \vec{w} + \vec{y}^T \vec{y} \right) \\
&= \frac{\partial}{\partial \vec{w}} \mathrm{tr}\left( \vec{w}^T X^T X \vec{w} \right) - \frac{\partial}{\partial \vec{w}} \mathrm{tr}\left( 2 \vec{y}^T X \vec{w} \right) \\
&= 2 (X^T X) \vec{w} - 2 X^T \vec{y} \\
&= 0.
\end{aligned}
$$
Hence, provided $X^T X$ is invertible, $\vec{w} = (X^T X)^{-1} X^T \vec{y}$.
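
The normal-equation solution can be illustrated on synthetic data. The sketch below uses made-up data (`w_true`, the noise level, and the sizes are assumptions) and compares the closed form with NumPy's `np.linalg.lstsq`; it solves the linear system rather than forming the inverse explicitly, which is the usual numerical practice.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 3
X = rng.normal(size=(m, n))
w_true = np.array([1.0, -2.0, 0.5])           # hypothetical ground truth
y = X @ w_true + 0.01 * rng.normal(size=m)    # noisy observations

# Closed form from the derivation: solve (X^T X) w = X^T y
w_hat = np.linalg.solve(X.T @ X, X.T @ y)

# The gradient 2 X^T X w - 2 X^T y should vanish at the estimate
grad_at_w_hat = 2 * X.T @ X @ w_hat - 2 * X.T @ y

# Reference solution from NumPy's least-squares routine
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(w_hat)
print(np.max(np.abs(grad_at_w_hat)))
```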

[Example 5] Given samples $\vec{x}_1, \cdots, \vec{x}_N \sim \mathcal{N}\left( \vec{\mu}, \Sigma \right)$,
find the maximum likelihood estimate of the covariance matrix $\Sigma$.

[Solution] Up to an additive constant and a positive scale factor, the negative log-likelihood (with $\vec{\mu}$ replaced by the sample mean $\bar{\vec{x}}$) is
$$
l = \ln|\Sigma| + \frac{1}{N} \sum_{i=1}^{N} \left( \vec{x}_i - \bar{\vec{x}} \right)^T \Sigma^{-1} \left( \vec{x}_i - \bar{\vec{x}} \right),
$$
so maximizing the likelihood is equivalent to minimizing $l$.

Therefore
$$
\begin{aligned}
\frac{\partial l}{\partial \Sigma}
&= \frac{\partial}{\partial \Sigma} \left( \ln|\Sigma| + \frac{1}{N} \sum_{i=1}^{N} \left( \vec{x}_i - \bar{\vec{x}} \right)^T \Sigma^{-1} \left( \vec{x}_i - \bar{\vec{x}} \right) \right) \\
&\overset{DM2}{=} \left( \Sigma^{-1} \right)^T + \frac{\partial}{\partial \Sigma} \mathrm{tr}\left( \frac{1}{N} \sum_{i=1}^{N} \left( \vec{x}_i - \bar{\vec{x}} \right)^T \Sigma^{-1} \left( \vec{x}_i - \bar{\vec{x}} \right) \right) \\
&= \left( \Sigma^{-1} \right)^T + \frac{1}{N} \sum_{i=1}^{N} \frac{\partial}{\partial \Sigma} \mathrm{tr}\left( \left( \vec{x}_i - \bar{\vec{x}} \right)^T \left( \Sigma^{-1} \right)^T \left( \vec{x}_i - \bar{\vec{x}} \right) \right) \\
&= \left( \Sigma^{-1} \right)^T + \frac{1}{N} \sum_{i=1}^{N} \frac{\partial}{\partial \Sigma} \mathrm{tr}\left( \left( \Sigma^{-1} \right)^T \left( \vec{x}_i - \bar{\vec{x}} \right) \left( \vec{x}_i - \bar{\vec{x}} \right)^T \right) \\
&= \left( \Sigma^{-1} \right)^T - \frac{1}{N} \sum_{i=1}^{N} \left( \left( \Sigma^{-1} \right)^T \left( \vec{x}_i - \bar{\vec{x}} \right) \left( \vec{x}_i - \bar{\vec{x}} \right)^T \left( \Sigma^{-1} \right)^T \right) \\
&= \left( \Sigma^{-1} \right)^T - \left( \Sigma^{-1} \right)^T \left( \frac{1}{N} \sum_{i=1}^{N} \left( \vec{x}_i - \bar{\vec{x}} \right) \left( \vec{x}_i - \bar{\vec{x}} \right)^T \right) \left( \Sigma^{-1} \right)^T \\
&= \left( \Sigma^{-1} \right)^T - \left( \Sigma^{-1} \right)^T S^2 \left( \Sigma^{-1} \right)^T \\
&= \left( \Sigma^{-1} - \Sigma^{-1} S^2 \Sigma^{-1} \right)^T \\
&= 0,
\end{aligned}
$$
where $S^2 = \frac{1}{N} \sum_{i=1}^{N} \left( \vec{x}_i - \bar{\vec{x}} \right) \left( \vec{x}_i - \bar{\vec{x}} \right)^T$ is the (symmetric) sample covariance matrix. The third line transposes the scalar inside the trace, and the fourth uses the cyclic property of the trace.
Multiplying $\Sigma^{-1} - \Sigma^{-1} S^2 \Sigma^{-1} = 0$ by $\Sigma$ on both sides gives the covariance estimate $\Sigma = S^2$.
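
A short numerical illustration of this estimate, under assumed synthetic data (`true_cov` and the sample size are made up): it forms $S^2$ from centered samples, compares it with `np.cov(..., bias=True)` (which also divides by $N$), and confirms that the gradient expression vanishes at $\Sigma = S^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 500, 3
true_cov = np.array([[2.0, 0.3, 0.0],
                     [0.3, 1.0, 0.2],
                     [0.0, 0.2, 0.5]])   # hypothetical covariance
samples = rng.multivariate_normal(mean=np.zeros(n), cov=true_cov, size=N)

# S^2 = (1/N) sum_i (x_i - x_bar)(x_i - x_bar)^T, with samples stored as rows
centered = samples - samples.mean(axis=0)
S2 = centered.T @ centered / N

def objective_grad(Sigma):
    # Gradient from the derivation: (Sigma^{-1} - Sigma^{-1} S^2 Sigma^{-1})^T
    Sinv = np.linalg.inv(Sigma)
    return (Sinv - Sinv @ S2 @ Sinv).T

print(S2)
print(np.max(np.abs(objective_grad(S2))))  # should be close to zero
```

Note that this estimator divides by $N$, not $N - 1$, so it differs from the unbiased sample covariance.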

[Example 6] Let $l = - \vec{y}^T \log \mathrm{softmax}(W \vec{x})$ with $\vec{y} \in \mathbb{F}^{m \times 1}$, $W \in \mathbb{F}^{m \times n}$, $\vec{x} \in \mathbb{F}^{n \times 1}$. Find $\frac{\partial l}{\partial W}$, where exactly one element of $\vec{y}$ is $1$ and all others are $0$.

[Solution] First, for a vector $\vec{u} \in \mathbb{F}^{n \times 1}$ and a scalar $c \in \mathbb{F}^{1}$,
we have $\log\left( \frac{\vec{u}}{c} \right) = \log(\vec{u}) - \vec{1} \log(c)$, where $\log$ acts element-wise and $\vec{1}$ is the all-ones vector.
Therefore, using $\vec{y}^T \vec{1} = 1$ in the last step,
$$
\begin{aligned}
l &= - \vec{y}^T \log \mathrm{softmax}(W \vec{x}) \\
&= - \vec{y}^T \log \left( \frac{\exp(W \vec{x})}{\vec{1}^T \exp(W \vec{x})} \right) \\
&= - \vec{y}^T \left( W \vec{x} - \vec{1} \log \left( \vec{1}^T \exp(W \vec{x}) \right) \right) \\
&= - \vec{y}^T W \vec{x} + \log \left( \vec{1}^T \exp(W \vec{x}) \right).
\end{aligned}
$$
For the first term, $\frac{\partial}{\partial W} \left( - \vec{y}^T W \vec{x} \right) = \frac{\partial}{\partial W} \mathrm{tr}\left( - \vec{x} \vec{y}^T W \right) = - \vec{y} \vec{x}^T$.
For the second term,
$$
\begin{aligned}
\mathrm{d} \left( \log \left( \vec{1}^T \exp(W \vec{x}) \right) \right)
&= \mathrm{d}\, \mathrm{tr}\left( \log \left( \vec{1}^T \exp(W \vec{x}) \right) \right) \\
&= \mathrm{tr}\left( \frac{\vec{1}^T \left( \exp(W \vec{x}) \odot \left( \mathrm{d}W\, \vec{x} \right) \right)}{\vec{1}^T \exp(W \vec{x})} \right) \\
&= \mathrm{tr}\left( \frac{\left( \vec{1} \odot \exp(W \vec{x}) \right)^T \left( \mathrm{d}W\, \vec{x} \right)}{\vec{1}^T \exp(W \vec{x})} \right) \\
&= \mathrm{tr}\left( \frac{\vec{x} \exp(W \vec{x})^T\, \mathrm{d}W}{\vec{1}^T \exp(W \vec{x})} \right),
\end{aligned}
$$
so this term contributes $\frac{\exp(W \vec{x})}{\vec{1}^T \exp(W \vec{x})} \vec{x}^T = \mathrm{softmax}(W \vec{x}) \vec{x}^T$. Hence
$$
\frac{\partial l}{\partial W} = - \vec{y} \vec{x}^T + \mathrm{softmax}(W \vec{x}) \vec{x}^T = \left( \mathrm{softmax}(W \vec{x}) - \vec{y} \right) \vec{x}^T.
$$
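
The softmax cross-entropy gradient is easy to verify numerically. In this sketch the shapes, the one-hot index, and the helper `numerical_grad` are arbitrary illustrative choices; the softmax subtracts the maximum before exponentiating, which leaves its value unchanged but avoids overflow.

```python
import numpy as np

def softmax(z):
    # Shift by the max for numerical stability; the result is unchanged
    e = np.exp(z - z.max())
    return e / e.sum()

def loss(W, x, y):
    # l = -y^T log softmax(W x)
    return -y @ np.log(softmax(W @ x))

def grad_loss(W, x, y):
    # Closed form from the derivation: (softmax(W x) - y) x^T
    return np.outer(softmax(W @ x) - y, x)

def numerical_grad(func, W, eps=1e-6):
    # Central finite differences, one entry at a time
    G = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        E = np.zeros_like(W)
        E[idx] = eps
        G[idx] = (func(W + E) - func(W - E)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
m, n = 3, 4
W = rng.normal(size=(m, n))
x = rng.normal(size=n)
y = np.eye(m)[1]  # one-hot label: class index 1

analytic = grad_loss(W, x, y)
numeric = numerical_grad(lambda V: loss(V, x, y), W)
print(np.max(np.abs(analytic - numeric)))
```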

[Example 7] Given samples $(\vec{x}_1, \vec{y}_1), (\vec{x}_2, \vec{y}_2), \cdots, (\vec{x}_N, \vec{y}_N)$, where $\vec{y}_i \in \mathbb{F}^{m \times 1}$ has exactly one element equal to $1$ and all others $0$, and $\vec{x}_i \in \mathbb{F}^{n \times 1}$.
Let $W_1 \in \mathbb{F}^{p \times n}$, $W_2 \in \mathbb{F}^{m \times p}$, $\vec{b}_1 \in \mathbb{F}^{p \times 1}$,
$\vec{b}_2 \in \mathbb{F}^{m \times 1}$, $\vec{a}_{1,i} = W_1 \vec{x}_i + \vec{b}_1$, $\vec{h}_{1,i} = \sigma(\vec{a}_{1,i})$,
$\vec{a}_{2,i} = W_2 \vec{h}_{1,i} + \vec{b}_2$, and define the loss function $l = - \sum_{i=1}^{N} \vec{y}_i^T \log \mathrm{softmax}(\vec{a}_{2,i})$. Find the derivatives of $l$ with respect to $W_2$, $\vec{b}_2$, $W_1$ and $\vec{b}_1$.

[Solution] First compute the derivative of the loss with respect to the layer-2 output. By the same computation as in Example 6, $\frac{\partial l}{\partial \vec{a}_{2,i}} = \mathrm{softmax}(\vec{a}_{2,i}) - \vec{y}_i$.
Next compute the derivatives with respect to the layer-1 output and the weights connecting layers 1 and 2. Because no chain rule for matrix derivatives has been defined, we use the relation between derivatives and differentials.
$$
\begin{aligned}
\mathrm{d}l &= \mathrm{tr}\left( \sum_{i=1}^{N} \left( \frac{\partial l}{\partial \vec{a}_{2,i}} \right)^T \mathrm{d}\vec{a}_{2,i} \right) \\
&= \sum_{i=1}^{N} \mathrm{tr}\left( \left( \frac{\partial l}{\partial \vec{a}_{2,i}} \right)^T \mathrm{d}\left( W_2 \vec{h}_{1,i} + \vec{b}_2 \right) \right) \\
&= \sum_{i=1}^{N} \mathrm{tr}\left( \left( \frac{\partial l}{\partial \vec{a}_{2,i}} \right)^T (\mathrm{d}W_2)\, \vec{h}_{1,i} \right) + \sum_{i=1}^{N} \mathrm{tr}\left( \left( \frac{\partial l}{\partial \vec{a}_{2,i}} \right)^T W_2\, \mathrm{d}\vec{h}_{1,i} \right) + \sum_{i=1}^{N} \mathrm{tr}\left( \left( \frac{\partial l}{\partial \vec{a}_{2,i}} \right)^T \mathrm{d}\vec{b}_2 \right) \\
&= \sum_{i=1}^{N} \mathrm{tr}\left( \vec{h}_{1,i} \left( \frac{\partial l}{\partial \vec{a}_{2,i}} \right)^T \mathrm{d}W_2 \right) + \sum_{i=1}^{N} \mathrm{tr}\left( \left( \frac{\partial l}{\partial \vec{a}_{2,i}} \right)^T W_2\, \mathrm{d}\vec{h}_{1,i} \right) + \sum_{i=1}^{N} \mathrm{tr}\left( \left( \frac{\partial l}{\partial \vec{a}_{2,i}} \right)^T \mathrm{d}\vec{b}_2 \right).
\end{aligned}
$$
This gives
$$
\frac{\partial l}{\partial W_2} = \sum_{i=1}^{N} \frac{\partial l}{\partial \vec{a}_{2,i}} \vec{h}_{1,i}^T, \qquad
\frac{\partial l}{\partial \vec{b}_2} = \sum_{i=1}^{N} \frac{\partial l}{\partial \vec{a}_{2,i}}, \qquad
\frac{\partial l}{\partial \vec{h}_{1,i}} = W_2^T \frac{\partial l}{\partial \vec{a}_{2,i}}.
$$
Then compute the derivative of the loss with respect to the layer-1 input $\vec{a}_{1,i}$:
$$
\frac{\partial l}{\partial \vec{a}_{1,i}} = \frac{\partial l}{\partial \vec{h}_{1,i}} \odot \sigma'(\vec{a}_{1,i}).
$$
Finally, compute the derivatives with respect to the weights connecting the input layer to layer 1.
$$
\begin{aligned}
\mathrm{d}l &= \mathrm{tr}\left( \sum_{i=1}^{N} \left( \frac{\partial l}{\partial \vec{a}_{1,i}} \right)^T \mathrm{d}\vec{a}_{1,i} \right) \\
&= \mathrm{tr}\left( \sum_{i=1}^{N} \left( \frac{\partial l}{\partial \vec{a}_{1,i}} \right)^T \mathrm{d}\left( W_1 \vec{x}_i + \vec{b}_1 \right) \right) \\
&= \mathrm{tr}\left( \sum_{i=1}^{N} \left( \frac{\partial l}{\partial \vec{a}_{1,i}} \right)^T (\mathrm{d}W_1)\, \vec{x}_i \right) + \mathrm{tr}\left( \sum_{i=1}^{N} \left( \frac{\partial l}{\partial \vec{a}_{1,i}} \right)^T \mathrm{d}\vec{b}_1 \right) \\
&= \mathrm{tr}\left( \sum_{i=1}^{N} \vec{x}_i \left( \frac{\partial l}{\partial \vec{a}_{1,i}} \right)^T \mathrm{d}W_1 \right) + \mathrm{tr}\left( \sum_{i=1}^{N} \left( \frac{\partial l}{\partial \vec{a}_{1,i}} \right)^T \mathrm{d}\vec{b}_1 \right).
\end{aligned}
$$
Hence
$$
\frac{\partial l}{\partial W_1} = \sum_{i=1}^{N} \frac{\partial l}{\partial \vec{a}_{1,i}} \vec{x}_i^T, \qquad
\frac{\partial l}{\partial \vec{b}_1} = \sum_{i=1}^{N} \frac{\partial l}{\partial \vec{a}_{1,i}}.
$$
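
The per-sample formulas above translate directly into a loop that accumulates gradients over the samples. The following is a minimal sketch, assuming $\sigma = \tanh$ (the example does not fix $\sigma$) and small random data; `backprop` and `numerical_grad` are hypothetical helper names.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shifted for numerical stability
    return e / e.sum()

def loss(W1, b1, W2, b2, xs, ys):
    # l = -sum_i y_i^T log softmax(a_{2,i})
    total = 0.0
    for x, y in zip(xs, ys):
        h1 = np.tanh(W1 @ x + b1)
        total -= y @ np.log(softmax(W2 @ h1 + b2))
    return total

def backprop(W1, b1, W2, b2, xs, ys):
    gW1, gb1 = np.zeros_like(W1), np.zeros_like(b1)
    gW2, gb2 = np.zeros_like(W2), np.zeros_like(b2)
    for x, y in zip(xs, ys):
        h1 = np.tanh(W1 @ x + b1)
        a2 = W2 @ h1 + b2
        d_a2 = softmax(a2) - y           # dl/da_{2,i}
        gW2 += np.outer(d_a2, h1)        # dl/da_{2,i} h_{1,i}^T
        gb2 += d_a2
        d_h1 = W2.T @ d_a2               # dl/dh_{1,i}
        d_a1 = d_h1 * (1.0 - h1 ** 2)    # tanh'(a) = 1 - tanh(a)^2
        gW1 += np.outer(d_a1, x)         # dl/da_{1,i} x_i^T
        gb1 += d_a1
    return gW1, gb1, gW2, gb2

def numerical_grad(func, P, eps=1e-6):
    # Central finite differences, one entry at a time
    G = np.zeros_like(P)
    for idx in np.ndindex(*P.shape):
        E = np.zeros_like(P)
        E[idx] = eps
        G[idx] = (func(P + E) - func(P - E)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
n, p, m, N = 4, 5, 3, 6
xs = rng.normal(size=(N, n))                   # one sample per row
ys = np.eye(m)[rng.integers(0, m, size=N)]     # one-hot labels per row
W1, b1 = rng.normal(size=(p, n)), rng.normal(size=p)
W2, b2 = rng.normal(size=(m, p)), rng.normal(size=m)

gW1, gb1, gW2, gb2 = backprop(W1, b1, W2, b2, xs, ys)
num_gW1 = numerical_grad(lambda P: loss(P, b1, W2, b2, xs, ys), W1)
num_gb2 = numerical_grad(lambda P: loss(W1, b1, W2, P, xs, ys), b2)
print(np.max(np.abs(gW1 - num_gW1)), np.max(np.abs(gb2 - num_gb2)))
```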

[Example 8] Rewrite the previous example in matrix form, with $X = [\vec{x}_1, \cdots, \vec{x}_N]$,
$A_1 = [\vec{a}_{1,1}, \cdots, \vec{a}_{1,N}] = W_1 X + \vec{b}_1 \vec{1}^T$,
$H_1 = [\vec{h}_{1,1}, \cdots, \vec{h}_{1,N}] = \sigma(A_1)$,
$A_2 = [\vec{a}_{2,1}, \cdots, \vec{a}_{2,N}] = W_2 H_1 + \vec{b}_2 \vec{1}^T$.

[Solution] First compute the derivative of the loss with respect to the layer-2 output: $\frac{\partial l}{\partial A_2} = [\mathrm{softmax}(\vec{a}_{2,1}) - \vec{y}_1, \cdots, \mathrm{softmax}(\vec{a}_{2,N}) - \vec{y}_N]$.
Next compute the derivatives with respect to the layer-1 output and the weights connecting layers 1 and 2. Because no chain rule for matrix derivatives has been defined, we again use the relation between derivatives and differentials.
$$
\begin{aligned}
\mathrm{d}l &= \mathrm{tr}\left( \left( \frac{\partial l}{\partial A_2} \right)^T \mathrm{d}A_2 \right) \\
&= \mathrm{tr}\left( \left( \frac{\partial l}{\partial A_2} \right)^T \mathrm{d}\left( W_2 H_1 + \vec{b}_2 \vec{1}^T \right) \right) \\
&= \mathrm{tr}\left( H_1 \left( \frac{\partial l}{\partial A_2} \right)^T \mathrm{d}W_2 \right) + \mathrm{tr}\left( \left( \frac{\partial l}{\partial A_2} \right)^T W_2\, \mathrm{d}H_1 \right) + \mathrm{tr}\left( \left( \frac{\partial l}{\partial A_2} \vec{1} \right)^T \mathrm{d}\vec{b}_2 \right).
\end{aligned}
$$
Hence
$$
\frac{\partial l}{\partial W_2} = \frac{\partial l}{\partial A_2} H_1^T, \qquad
\frac{\partial l}{\partial H_1} = W_2^T \frac{\partial l}{\partial A_2}, \qquad
\frac{\partial l}{\partial \vec{b}_2} = \frac{\partial l}{\partial A_2} \vec{1}.
$$
Then compute the derivative of the loss with respect to the layer-1 input $A_1$:
$$
\frac{\partial l}{\partial A_1} = \frac{\partial l}{\partial H_1} \odot \sigma'(A_1).
$$
Finally, compute the derivatives with respect to the weights connecting the input layer to layer 1.
$$
\begin{aligned}
\mathrm{d}l &= \mathrm{tr}\left( \left( \frac{\partial l}{\partial A_1} \right)^T \mathrm{d}A_1 \right) \\
&= \mathrm{tr}\left( \left( \frac{\partial l}{\partial A_1} \right)^T \mathrm{d}\left( W_1 X + \vec{b}_1 \vec{1}^T \right) \right) \\
&= \mathrm{tr}\left( \left( \frac{\partial l}{\partial A_1} \right)^T (\mathrm{d}W_1)\, X \right) + \mathrm{tr}\left( \left( \frac{\partial l}{\partial A_1} \right)^T W_1\, (\mathrm{d}X) \right) + \mathrm{tr}\left( \left( \frac{\partial l}{\partial A_1} \right)^T (\mathrm{d}\vec{b}_1)\, \vec{1}^T \right) \\
&= \mathrm{tr}\left( X \left( \frac{\partial l}{\partial A_1} \right)^T (\mathrm{d}W_1) \right) + \mathrm{tr}\left( \left( \frac{\partial l}{\partial A_1} \right)^T W_1\, (\mathrm{d}X) \right) + \mathrm{tr}\left( \vec{1}^T \left( \frac{\partial l}{\partial A_1} \right)^T (\mathrm{d}\vec{b}_1) \right).
\end{aligned}
$$
Hence
$$
\frac{\partial l}{\partial W_1} = \frac{\partial l}{\partial A_1} X^T, \qquad
\frac{\partial l}{\partial \vec{b}_1} = \frac{\partial l}{\partial A_1} \vec{1}.
$$
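
The matrix form maps almost line by line onto array code. The sketch below again assumes $\sigma = \tanh$ and random data, stores samples as columns to match $X = [\vec{x}_1, \cdots, \vec{x}_N]$, and realizes $\vec{b}\,\vec{1}^T$ through broadcasting; all function names are illustrative.

```python
import numpy as np

def softmax_cols(A):
    # Column-wise softmax, shifted per column for numerical stability
    E = np.exp(A - A.max(axis=0, keepdims=True))
    return E / E.sum(axis=0, keepdims=True)

def forward(W1, b1, W2, b2, X):
    A1 = W1 @ X + b1[:, None]    # W1 X + b1 1^T via broadcasting
    H1 = np.tanh(A1)
    A2 = W2 @ H1 + b2[:, None]   # W2 H1 + b2 1^T
    return A1, H1, A2

def loss(W1, b1, W2, b2, X, Y):
    _, _, A2 = forward(W1, b1, W2, b2, X)
    return -np.sum(Y * np.log(softmax_cols(A2)))

def backprop(W1, b1, W2, b2, X, Y):
    A1, H1, A2 = forward(W1, b1, W2, b2, X)
    dA2 = softmax_cols(A2) - Y       # dl/dA2, one column per sample
    gW2 = dA2 @ H1.T                 # dl/dW2 = dl/dA2 H1^T
    gb2 = dA2.sum(axis=1)            # dl/db2 = dl/dA2 1
    dH1 = W2.T @ dA2                 # dl/dH1 = W2^T dl/dA2
    dA1 = dH1 * (1.0 - H1 ** 2)      # element-wise product with tanh'(A1)
    gW1 = dA1 @ X.T                  # dl/dW1 = dl/dA1 X^T
    gb1 = dA1.sum(axis=1)            # dl/db1 = dl/dA1 1
    return gW1, gb1, gW2, gb2

def numerical_grad(func, P, eps=1e-6):
    # Central finite differences, one entry at a time
    G = np.zeros_like(P)
    for idx in np.ndindex(*P.shape):
        E = np.zeros_like(P)
        E[idx] = eps
        G[idx] = (func(P + E) - func(P - E)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
n, p, m, N = 4, 5, 3, 6
X = rng.normal(size=(n, N))
Y = np.eye(m)[:, rng.integers(0, m, size=N)]   # one-hot columns
W1, b1 = rng.normal(size=(p, n)), rng.normal(size=p)
W2, b2 = rng.normal(size=(m, p)), rng.normal(size=m)

gW1, gb1, gW2, gb2 = backprop(W1, b1, W2, b2, X, Y)
num_gW2 = numerical_grad(lambda P: loss(W1, b1, P, b2, X, Y), W2)
num_gb1 = numerical_grad(lambda P: loss(W1, P, W2, b2, X, Y), b1)
print(np.max(np.abs(gW2 - num_gW2)), np.max(np.abs(gb1 - num_gb1)))
```

Compared with a per-sample loop, the sums over $i$ are absorbed into the matrix products $\frac{\partial l}{\partial A_2} H_1^T$ and $\frac{\partial l}{\partial A_1} X^T$.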

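References

[1] KHENG L W. Matrix differentiation, CS5240 Theoretical Foundations in Multimedia.
[2] Zhang Xianda (张贤达). Matrix Analysis and Applications (矩阵分析与应用) [M]. Beijing: Tsinghua University Press, 2004: 255-285.
[3] 长躯鬼侠. 矩阵求导术(上) (Matrix Differentiation Techniques, Part 1).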

Published: 2024-02-07 05:35:39
Article link: https://www.elefans.com/category/jswz/34/1753718.html