Learning Machine Learning 3 - Some matrix calculus, least squares and Lagrange multipliers

From vector calculus we're familiar with the notion of the gradient $$\nabla f$$ of a function $$f:\RR^n\to\RR$$. In Cartesian coordinates it is the vector of partial derivatives, $$(\partial f/\partial x_1,\partial f/\partial x_2,\dots,\partial f/\partial x_n)^\top$$, and it points in the direction of the greatest rate of increase of $$f$$. If we instead think of $$f$$ as a real-valued function of real $$m\times n$$ matrices, $$f:\text{Mat}_{m,n}(\RR)\to\RR$$, then $$\nabla f$$ is itself an $$m\times n$$ matrix,\begin{equation*}\nabla f=\begin{pmatrix}\frac{\partial f}{\partial x_{11}}&\frac{\partial f}{\partial x_{12}}&\cdots&\frac{\partial f}{\partial x_{1n}}\\\frac{\partial f}{\partial x_{21}}&\frac{\partial f}{\partial x_{22}}&\cdots&\frac{\partial f}{\partial x_{2n}}\\\vdots&\vdots&\ddots&\vdots\\\frac{\partial f}{\partial x_{m1}}&\frac{\partial f}{\partial x_{m2}}&\cdots&\frac{\partial f}{\partial x_{mn}}\end{pmatrix}.\end{equation*}
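To make the matrix gradient concrete, here is a small sketch (not from the post; the example function is my own choice) that checks a closed-form matrix gradient against central finite differences. For the linear functional $$f(X)=a^\top X b$$, the entrywise definition above gives $$\partial f/\partial x_{ij}=a_i b_j$$, i.e. $$\nabla f=ab^\top$$:

```python
import numpy as np

# Hypothetical example: f(X) = a^T X b maps Mat_{m,n}(R) -> R,
# and its gradient with respect to X is the m x n matrix a b^T.
rng = np.random.default_rng(0)
m, n = 3, 4
a = rng.standard_normal(m)
b = rng.standard_normal(n)
X = rng.standard_normal((m, n))

f = lambda X: a @ X @ b       # scalar-valued function of a matrix
analytic = np.outer(a, b)     # closed-form gradient a b^T

# Build the numerical gradient entry by entry, perturbing x_ij.
eps = 1e-6
numeric = np.zeros((m, n))
for i in range(m):
    for j in range(n):
        E = np.zeros((m, n))
        E[i, j] = eps
        numeric[i, j] = (f(X + E) - f(X - E)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-8))  # True
```

Since $$f$$ is linear in $$X$$, the central difference recovers each partial derivative essentially exactly, so the two matrices agree to rounding error.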