Learning Machine Learning 3 - Some matrix calculus, least squares and Lagrange multipliers

From vector calculus we're familiar with the notion of the gradient, \(\nabla f\), of a function \(f:\RR^n\to\RR\). In Cartesian coordinates it is the vector of partial derivatives, \((\partial f/\partial x_1,\partial f/\partial x_2,\dots,\partial f/\partial x_n)^\top\), and it points in the direction in which \(f\) increases most rapidly. If we instead think of \(f\) as a real-valued function of real \(m\times n\) matrices, \(f:\text{Mat}_{m,n}(\RR)\to\RR\), then \(\nabla f\) is the \(m\times n\) matrix whose \((i,j)\) entry is the partial derivative of \(f\) with respect to the matrix entry \(x_{ij}\),\begin{equation*}\nabla f=\begin{pmatrix}\frac{\partial f}{\partial x_{11}}&\frac{\partial f}{\partial x_{12}}&\cdots&\frac{\partial f}{\partial x_{1n}}\\\frac{\partial f}{\partial x_{21}}&\frac{\partial f}{\partial x_{22}}&\cdots&\frac{\partial f}{\partial x_{2n}}\\\vdots&\vdots&\ddots&\vdots\\\frac{\partial f}{\partial x_{m1}}&\frac{\partial f}{\partial x_{m2}}&\cdots&\frac{\partial f}{\partial x_{mn}}\end{pmatrix}.\end{equation*}
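To make the matrix-valued gradient concrete, here is a minimal sketch that approximates each partial derivative by a central difference and assembles the results into an \(m\times n\) matrix. The example function (the sum of squared entries, whose gradient is \(2X\)) and the helper names `frobenius_sq` and `numerical_gradient` are assumptions chosen for illustration; they do not come from the original post.

```python
def frobenius_sq(X):
    """Illustrative f: Mat_{m,n}(R) -> R, the sum of the squared entries of X."""
    return sum(x * x for row in X for x in row)


def numerical_gradient(f, X, h=1e-6):
    """Approximate the m x n matrix of partials df/dx_ij by central differences.

    X is a list of m rows, each a list of n floats (hypothetical representation).
    """
    m, n = len(X), len(X[0])
    grad = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            # Perturb only the (i, j) entry, leaving every other entry fixed.
            plus = [row[:] for row in X]
            minus = [row[:] for row in X]
            plus[i][j] += h
            minus[i][j] -= h
            grad[i][j] = (f(plus) - f(minus)) / (2 * h)
    return grad


X = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
G = numerical_gradient(frobenius_sq, X)  # same 2 x 3 shape as X
```

For this particular choice of \(f\), each entry of `G` should agree with \(2x_{ij}\) up to floating-point error, and `G` has the same shape as `X`, matching the layout of \(\nabla f\) above.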
