Partial Derivative of the Multivariate Real Gaussian Density with Respect to the Covariance

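Suppose the $N$-dimensional real random vector $\mathbf{x}$ follows a Gaussian distribution with mean $\mathbf{a}$ and covariance $\mathbf{A}$, written $\mathbf{x}\sim \mathcal{N}(\mathbf{x}|\mathbf{a},\mathbf{A})$, with density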
\begin{align}
\mathcal{N}\left({\mathbf{x}|\mathbf{a},\mathbf{A} }\right)
=(2\pi)^{-\frac{N}{2} }|\mathbf{A}|^{-\frac{1}{2} }\exp \left({-\frac{1}{2}(\mathbf{x}-\mathbf{a})^T\mathbf{A}^{-1}(\mathbf{x}-\mathbf{a})}\right)
\end{align}
Its partial derivative with respect to the covariance matrix is
\begin{align}
\frac{\partial \mathcal{N}(\mathbf{x}|\mathbf{a},\mathbf{A})}{\partial \mathbf{A} }&=(2\pi)^{-\frac{N}{2} }\frac{\partial |\mathbf{A}|^{-\frac{1}{2} } }{\partial \mathbf{A} }\exp \left(-\frac{1}{2}(\mathbf{x}-\mathbf{a})^T\mathbf{A}^{-1}(\mathbf{x}-\mathbf{a})\right)\\
&\quad +(2\pi)^{-\frac{N}{2} }|\mathbf{A}|^{-\frac{1}{2} }\frac{\partial }{\partial \mathbf{A} }\exp \left[-\frac{1}{2}(\mathbf{x}-\mathbf{a})^T\mathbf{A}^{-1}(\mathbf{x}-\mathbf{a})\right]\\
&\overset{(a)}{=}-\frac{1}{2}\mathbf{A}^{-1}\mathcal{N}(\mathbf{x}|\mathbf{a},\mathbf{A})+\frac{1}{2}\mathbf{A}^{-1}(\mathbf{x}-\mathbf{a})(\mathbf{x}-\mathbf{a})^T\mathbf{A}^{-1}\mathcal{N}(\mathbf{x}|\mathbf{a},\mathbf{A})
\end{align}
The key to step $(a)$ is computing the two partial derivatives $\frac{\partial |\mathbf{A}|^{-\frac{1}{2} } }{\partial \mathbf{A} }$ and $\frac{\partial }{\partial \mathbf{A} }\exp \left({-\frac{1}{2}(\mathbf{x}-\mathbf{a})^T\mathbf{A}^{-1}(\mathbf{x}-\mathbf{a})}\right)$. The detailed computation is as follows. First,
\begin{align}
\frac{\partial |\mathbf{A}|^{-\frac{1}{2} } }{\partial \mathbf{A} }=-\frac{1}{2}|\mathbf{A}|^{-\frac{3}{2} } (|\mathbf{A}|\mathbf{A}^{-1})=-\frac{1}{2}|\mathbf{A}|^{-\frac{1}{2} }\mathbf{A}^{-1}
\end{align}
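This uses the derivative formula $\frac{\partial |\mathbf{A}|}{\partial \mathbf{A} }=|\mathbf{A}|(\mathbf{A}^{-1})^T$, which equals $|\mathbf{A}|\mathbf{A}^{-1}$ for a symmetric $\mathbf{A}$.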
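The determinant identity above can be sanity-checked numerically. The following is a minimal sketch, not part of the original derivation: it assumes NumPy, uses a hypothetical helper `numerical_grad` that perturbs each entry of $\mathbf{A}$ independently with central differences, and evaluates at a randomly generated symmetric positive definite test matrix.

```python
import numpy as np

def numerical_grad(f, A, eps=1e-6):
    """Central finite-difference gradient of a scalar function f
    with respect to each entry of the matrix A (entries perturbed independently)."""
    G = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            E = np.zeros_like(A)
            E[i, j] = eps
            G[i, j] = (f(A + E) - f(A - E)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
N = 3
B = rng.standard_normal((N, N))
A = B @ B.T + N * np.eye(N)  # symmetric positive definite test matrix (assumption)

# f(A) = |A|^{-1/2}
f = lambda M: np.linalg.det(M) ** (-0.5)

# Closed form from the text: -1/2 |A|^{-1/2} A^{-1}
analytic = -0.5 * np.linalg.det(A) ** (-0.5) * np.linalg.inv(A)
numeric = numerical_grad(f, A)

max_err = np.max(np.abs(analytic - numeric))
print("max abs difference:", max_err)
```

The two matrices should agree up to finite-difference error; the step size `eps` is an illustrative choice.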

In addition,
\begin{align}
&\quad \frac{\partial }{\partial \mathbf{A} }\exp \left[-\frac{1}{2}(\mathbf{x}-\mathbf{a})^T\mathbf{A}^{-1}(\mathbf{x}-\mathbf{a})\right]\\
&=\exp\left(-\frac{1}{2}(\mathbf{x}-\mathbf{a})^T\mathbf{A}^{-1}(\mathbf{x}-\mathbf{a})\right)\left(-\frac{1}{2}\frac{\partial }{\partial \mathbf{A} }(\mathbf{x}-\mathbf{a})^T\mathbf{A}^{-1}(\mathbf{x}-\mathbf{a})\right)
\end{align}
where
\begin{align}
\frac{\partial (\mathbf{x}-\mathbf{a})^T\mathbf{A}^{-1}(\mathbf{x}-\mathbf{a})}{\partial \mathbf{A} }=\left({
\begin{matrix}
\frac{\partial (\mathbf{x}-\mathbf{a})^T\mathbf{A}^{-1}(\mathbf{x}-\mathbf{a})}{\partial A_{11} }& \cdots& \frac{\partial (\mathbf{x}-\mathbf{a})^T\mathbf{A}^{-1}(\mathbf{x}-\mathbf{a})}{\partial A_{1N} }\\
\vdots& \ddots& \vdots\\
\frac{\partial (\mathbf{x}-\mathbf{a})^T\mathbf{A}^{-1}(\mathbf{x}-\mathbf{a})}{\partial A_{N1} }& \cdots& \frac{\partial (\mathbf{x}-\mathbf{a})^T\mathbf{A}^{-1}(\mathbf{x}-\mathbf{a})}{\partial A_{NN} }
\end{matrix}
}\right)
\end{align}
The entries of this matrix are computed as follows:
\begin{align}
\frac{\partial (\mathbf{x}-\mathbf{a})^T\mathbf{A}^{-1}(\mathbf{x}-\mathbf{a})}{\partial A_{ij} }
&=\text{tr}\left\{ {\frac{\partial (\mathbf{x}-\mathbf{a})^T\mathbf{A}^{-1}(\mathbf{x}-\mathbf{a})}{\partial \mathbf{A}^{-1} }\frac{\partial \mathbf{A}^{-1} }{\partial A_{ij} } }\right\}\\
&\overset{(b)}{=}\text{tr}\left\{ {-(\mathbf{x}-\mathbf{a})(\mathbf{x}-\mathbf{a})^T\mathbf{A}^{-1}\frac{\partial \mathbf{A} }{\partial A_{ij} }\mathbf{A}^{-1} }\right\}\\
&=\text{tr}\left\{ {-(\mathbf{x}-\mathbf{a})(\mathbf{x}-\mathbf{a})^T\mathbf{A}^{-1}\boldsymbol{e}_i\boldsymbol{e}_j^T\mathbf{A}^{-1} }\right\}\\
&=-\boldsymbol{e}_j^T\mathbf{A}^{-1}(\mathbf{x}-\mathbf{a})(\mathbf{x}-\mathbf{a})^T\mathbf{A}^{-1}\boldsymbol{e}_i
\end{align}
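Here step $(b)$ uses the derivative formulas $\frac{\partial g(\mathbf{U})}{\partial x}=\text{tr}\left\{ {\frac{\partial g(\mathbf{U})}{\partial \mathbf{U} }\frac{\partial \mathbf{U} }{\partial x} }\right\}$ and $\frac{\partial \mathbf{U}^{-1} }{\partial x}=-\mathbf{U}^{-1}\frac{\partial \mathbf{U} }{\partial x}\mathbf{U}^{-1}$, with $\mathbf{U}=\mathbf{A}^{-1}$ in the first formula; since $\frac{\partial g(\mathbf{U})}{\partial \mathbf{U} }=(\mathbf{x}-\mathbf{a})(\mathbf{x}-\mathbf{a})^T$ is symmetric, no transpose is needed inside the trace. The final expression is the $(j,i)$ entry of $-\mathbf{A}^{-1}(\mathbf{x}-\mathbf{a})(\mathbf{x}-\mathbf{a})^T\mathbf{A}^{-1}$, so collecting the entries over all $(i,j)$ gives the transpose of that matrix: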
\begin{align}
\frac{\partial (\mathbf{x}-\mathbf{a})^T\mathbf{A}^{-1}(\mathbf{x}-\mathbf{a})}{\partial \mathbf{A} }=-(\mathbf{A}^{-1})^T(\mathbf{x}-\mathbf{a})(\mathbf{x}-\mathbf{a})^T(\mathbf{A}^{-1})^T
\end{align}
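As a rough check of the general (not necessarily symmetric) result above, the sketch below compares it against central finite differences. It assumes NumPy; the helper `numerical_grad`, the test matrix, and the vectors are all hypothetical choices made for illustration.

```python
import numpy as np

def numerical_grad(f, A, eps=1e-6):
    """Central finite-difference gradient of a scalar function f
    with respect to each entry of the matrix A (entries perturbed independently)."""
    G = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            E = np.zeros_like(A)
            E[i, j] = eps
            G[i, j] = (f(A + E) - f(A - E)) / (2 * eps)
    return G

rng = np.random.default_rng(1)
N = 3
# Generic, deliberately non-symmetric but well-conditioned test matrix (assumption)
A = N * np.eye(N) + 0.3 * rng.standard_normal((N, N))
x = rng.standard_normal(N)
a = rng.standard_normal(N)
d = x - a

# Quadratic form (x-a)^T A^{-1} (x-a)
quad = lambda M: d @ np.linalg.solve(M, d)

Ainv = np.linalg.inv(A)
# Closed form with transposes: -(A^{-1})^T (x-a)(x-a)^T (A^{-1})^T
analytic = -Ainv.T @ np.outer(d, d) @ Ainv.T
numeric = numerical_grad(quad, A)

max_err = np.max(np.abs(analytic - numeric))
print("max abs difference:", max_err)
```

Because the test matrix is not symmetric, dropping the transposes in `analytic` would generally no longer match the numerical gradient, which is why the general formula keeps them.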
If $\mathbf{A}$ is assumed to be symmetric, this becomes
\begin{align}
\frac{\partial (\mathbf{x}-\mathbf{a})^T\mathbf{A}^{-1}(\mathbf{x}-\mathbf{a})}{\partial \mathbf{A} }=-\mathbf{A}^{-1}(\mathbf{x}-\mathbf{a})(\mathbf{x}-\mathbf{a})^T\mathbf{A}^{-1}
\end{align}
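Putting the pieces together, the full gradient of the density with respect to $\mathbf{A}$ can be checked numerically at a symmetric covariance. This is a minimal sketch under stated assumptions: it uses NumPy, the function names `gaussian_pdf`, `analytic_grad`, and `numerical_grad` are hypothetical, and, as in the derivation, the entries of $\mathbf{A}$ are perturbed independently rather than under a symmetry constraint.

```python
import numpy as np

def gaussian_pdf(x, a, A):
    """Density N(x | a, A) as defined at the start of the text."""
    n = x.shape[0]
    d = x - a
    quad = d @ np.linalg.solve(A, d)  # (x-a)^T A^{-1} (x-a)
    return (2 * np.pi) ** (-n / 2) * np.linalg.det(A) ** (-0.5) * np.exp(-0.5 * quad)

def analytic_grad(x, a, A):
    """Closed form for symmetric A:
    -1/2 A^{-1} N + 1/2 A^{-1} (x-a)(x-a)^T A^{-1} N."""
    Ainv = np.linalg.inv(A)
    d = x - a
    p = gaussian_pdf(x, a, A)
    return 0.5 * p * (Ainv @ np.outer(d, d) @ Ainv - Ainv)

def numerical_grad(f, A, eps=1e-6):
    """Central finite differences, each entry of A perturbed independently."""
    G = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            E = np.zeros_like(A)
            E[i, j] = eps
            G[i, j] = (f(A + E) - f(A - E)) / (2 * eps)
    return G

rng = np.random.default_rng(2)
N = 3
B = rng.standard_normal((N, N))
A = B @ B.T + N * np.eye(N)              # symmetric positive definite covariance (assumption)
a = rng.standard_normal(N)
x = a + 0.5 * rng.standard_normal(N)     # evaluation point near the mean

analytic = analytic_grad(x, a, A)
numeric = numerical_grad(lambda M: gaussian_pdf(x, a, M), A)
matches = np.allclose(numeric, analytic, rtol=1e-5, atol=1e-10)
print("gradients match:", matches)
```

The tolerances in `np.allclose` are illustrative; they only need to absorb the finite-difference error for this test case.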