Posts in Machine Learning
Learning Machine Learning 3 - Some matrix calculus, least squares and Lagrange multipliers

From vector calculus we're familiar with the notion of the gradient of a function \(f:\RR^n\to\RR\), \(\nabla f\). Assuming Cartesian coordinates, it is the vector of partial derivatives, \((\partial f/\partial x_1,\partial f/\partial x_2,\dots,\partial f/\partial x_n)^\top\), and it points in the direction of the greatest rate of change of \(f\). If we instead think of \(f\) as a real-valued function of real \(m\times n\) matrices, \(f:\text{Mat}_{m,n}(\RR)\to\RR\), then \(\nabla f\) is an \(m\times n\) matrix, \begin{equation*}\nabla f=\begin{pmatrix}\frac{\partial f}{\partial x_{11}}&\frac{\partial f}{\partial x_{12}}&\cdots&\frac{\partial f}{\partial x_{1n}}\\\frac{\partial f}{\partial x_{21}}&\frac{\partial f}{\partial x_{22}}&\cdots&\frac{\partial f}{\partial x_{2n}}\\\vdots&\vdots&\ddots&\vdots\\\frac{\partial f}{\partial x_{m1}}&\frac{\partial f}{\partial x_{m2}}&\cdots&\frac{\partial f}{\partial x_{mn}}\end{pmatrix}.\end{equation*}
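
As a quick sanity check on the matrix-gradient idea, here is a minimal sketch in NumPy. The example function \(f(X)=\operatorname{tr}(X^\top X)\), whose gradient works out to \(2X\), is an illustrative choice and not taken from the post; we compare the closed form against central finite differences.

```python
import numpy as np

def f(X):
    # Example scalar function of a matrix: f(X) = tr(X^T X) = sum of x_ij^2.
    return np.trace(X.T @ X)

def grad_f(X):
    # Closed-form gradient of tr(X^T X): an m x n matrix, entry-wise 2*x_ij.
    return 2.0 * X

def numerical_grad(f, X, h=1e-6):
    # Central finite differences, one matrix entry at a time.
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = h
            G[i, j] = (f(X + E) - f(X - E)) / (2 * h)
    return G

X = np.random.default_rng(0).normal(size=(3, 2))
print(np.allclose(grad_f(X), numerical_grad(f, X)))  # True
```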

Read More
Learning Machine Learning 2 - The multivariate Gaussian

If an \(N\)-dimensional random vector \(\mathbf{X}=(X_1,X_2,\dots,X_N)^\top\) is distributed according to a multivariate Gaussian distribution with mean vector \begin{equation*}\boldsymbol{\mu}=(\Exp[X_1],\Exp[X_2],\dots,\Exp[X_N])^\top\end{equation*} and covariance matrix \(\mathbf{\Sigma}\) with entries \(\Sigma_{ij}=\cov(X_i,X_j)\), we write \(\mathbf{X}\sim\mathcal{N}(\boldsymbol{\mu},\mathbf{\Sigma})\), and the probability density function is given by \begin{equation*}p(\mathbf{x}|\boldsymbol{\mu},\mathbf{\Sigma})=\frac{1}{(2\pi)^{N/2}\sqrt{\det\mathbf{\Sigma}}}\exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^\top\mathbf{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right).\end{equation*}
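
To check the density formula numerically, here is a minimal sketch with made-up values for \(\boldsymbol{\mu}\) and \(\mathbf{\Sigma}\): we evaluate the expression above directly in NumPy and compare against SciPy's multivariate_normal.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_pdf(x, mu, Sigma):
    # Direct transcription of the density formula above.
    N = len(mu)
    diff = x - mu
    norm_const = 1.0 / ((2 * np.pi) ** (N / 2) * np.sqrt(np.linalg.det(Sigma)))
    # solve(Sigma, diff) computes Sigma^{-1} diff without forming the inverse.
    return norm_const * np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff))

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
x = np.array([0.5, 0.5])

print(gaussian_pdf(x, mu, Sigma))
print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))  # should agree
```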

Read More
Learning Machine Learning 1 - Some probability and stats

By way of "warming up", in this and the next two posts we'll review some of the foundational material (mostly from probability and stats) we'll need. This review is in no way comprehensive; on the contrary, it is highly selective and brief.

Bayes' theorem

Bayes' theorem is absolutely critical, so let's kick off by recalling it: \begin{equation*}P(A|B)=\frac{P(B|A)P(A)}{P(B)}.\end{equation*}
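
To see the theorem doing useful work, here is the classic diagnostic-test calculation as a minimal sketch (the numbers are made up for illustration, not figures from the post), with \(A\) = "has the condition" and \(B\) = "tests positive":

```python
p_A = 0.01             # prior P(A): 1% of the population has the condition
p_B_given_A = 0.95     # sensitivity, P(B|A)
p_B_given_notA = 0.05  # false-positive rate, P(B|not A)

# Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

# Bayes' theorem: P(A|B) = P(B|A)P(A) / P(B)
p_A_given_B = p_B_given_A * p_A / p_B
print(p_A_given_B)  # about 0.16: a positive test is far from conclusive
```

Despite the seemingly accurate test, the low prior drags the posterior down to roughly 16%, which is exactly the kind of reweighting of belief by evidence that Bayes' theorem encodes.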

Read More