Saving Australian Maths

A guest blog post by Philip O’Carroll, co-founder of Fitzroy Community School and an Institute for Enquiring Minds volunteer mentor.

Australia is a top spender on school education, yet UNICEF ranks it 39th out of 41 high‐ to middle‐income countries educationally[1]. I know how to solve this problem. At no extra cost.

Andrew Jacobs
Learning Machine Learning 4 - Linear regression, gradient descent and feature normalization

The CS229 course kicks off with Andrew Ng introducing some data which will be used to illustrate different algorithms. If you sign up for his Coursera course, this data is provided as part of one of the early programming assignments. I'd never used MATLAB before, but it proved really straightforward to load the data, convert the table to an array, extract the vectors of data and produce a plot of prices against living area.
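For readers without MATLAB, here is a rough equivalent sketch in Python/NumPy (my choice, not part of the course). The rows below are made up for illustration in the assignment's format (living area, bedrooms, price); the sketch normalizes the features and runs the batch gradient descent this post's title refers to:

```python
import numpy as np

# Made-up rows in the assignment's format: living area (sq ft),
# bedrooms, price ($). The real Coursera data is not reproduced here.
data = np.array([[2104, 3, 399900],
                 [1600, 3, 329900],
                 [2400, 3, 369000],
                 [1416, 2, 232000],
                 [3000, 4, 539900]], dtype=float)

X, y = data[:, :2], data[:, 2]

# Feature normalization: zero mean, unit variance per column, so a
# single learning rate works for features on very different scales.
mu, sigma = X.mean(axis=0), X.std(axis=0)
Xn = (X - mu) / sigma
Xn = np.hstack([np.ones((len(y), 1)), Xn])  # prepend intercept column

# Batch gradient descent on the least-squares cost
theta = np.zeros(3)
alpha = 0.1
for _ in range(5000):
    grad = Xn.T @ (Xn @ theta - y) / len(y)
    theta -= alpha * grad
```

With the features normalized, the fitted `theta` should agree with the closed-form least-squares solution; a scatter of `X[:, 0]` against `y` (e.g. with matplotlib) reproduces the price-versus-living-area plot.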

Andrew Jacobs
Learning Machine Learning 3 - Some matrix calculus, least squares and Lagrange multipliers

From vector calculus we're familiar with the notion of the gradient of a function $$f:\RR^n\to\RR$$, $$\nabla f$$. Assuming Cartesian coordinates, it is the vector of partial derivatives, $$(\partial f/\partial x_1,\partial f/\partial x_2,\dots,\partial f/\partial x_n)^\top$$, and points in the direction of the greatest rate of change of $$f$$. If we think of $$f$$ as a real-valued function of real $$m\times n$$ matrices, $$f:\text{Mat}_{m,n}(\RR)\to\RR$$, then $$\nabla f$$ is an $$m\times n$$ matrix,\begin{equation*}\nabla f=\begin{pmatrix}\frac{\partial f}{\partial x_{11}}&\frac{\partial f}{\partial x_{12}}&\cdots&\frac{\partial f}{\partial x_{1n}}\\\frac{\partial f}{\partial x_{21}}&\frac{\partial f}{\partial x_{22}}&\cdots&\frac{\partial f}{\partial x_{2n}}\\\vdots&\vdots&\ddots&\vdots\\\frac{\partial f}{\partial x_{m1}}&\frac{\partial f}{\partial x_{m2}}&\cdots&\frac{\partial f}{\partial x_{mn}}\end{pmatrix}.\end{equation*}
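The entrywise definition is easy to sanity-check numerically. As an illustrative example of my own choosing, take $$f(X)=\text{tr}(A^\top X)=\sum_{ij}A_{ij}X_{ij}$$, whose matrix gradient is exactly $$A$$; a central finite difference over each entry should recover it:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
X = rng.standard_normal((2, 3))

# f(X) = tr(A^T X) = sum_ij A_ij X_ij, so grad f = A.
f = lambda X: np.trace(A.T @ X)

# Central finite-difference approximation of the matrix gradient,
# one (i, j) entry at a time.
eps = 1e-6
num_grad = np.zeros_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        E = np.zeros_like(X)
        E[i, j] = eps
        num_grad[i, j] = (f(X + E) - f(X - E)) / (2 * eps)
```

Since $$f$$ is linear here, `num_grad` matches `A` to floating-point accuracy; the same loop works as a gradient check for any scalar-valued matrix function.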

Andrew Jacobs
Learning Machine Learning 2 - The multivariate Gaussian

If an $$N$$-dimensional random vector $$\mathbf{X}=(X_1,X_2,\dots,X_N)^\top$$ is distributed according to a multivariate Gaussian distribution with mean vector \begin{equation*}\boldsymbol{\mu}=(\Exp[X_1],\Exp[X_2],\dots,\Exp[X_N])^\top\end{equation*} and covariance matrix $$\mathbf{\Sigma}$$ such that $$\Sigma_{ij}=\cov(X_i,X_j)$$, we write $$\mathbf{X}\sim\mathcal{N}(\boldsymbol{\mu},\mathbf{\Sigma})$$ and the probability density function is given by \begin{equation*}p(\mathbf{x}|\boldsymbol{\mu},\mathbf{\Sigma})=\frac{1}{(2\pi)^{N/2}\sqrt{\det\mathbf{\Sigma}}}\exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^\top\mathbf{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right).\end{equation*}
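The density translates almost line for line into code. A minimal Python/NumPy sketch (the function name and the example $$\boldsymbol{\mu}$$, $$\mathbf{\Sigma}$$ values are mine, chosen for illustration):

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    # Density of N(mu, Sigma) at x, straight from the formula above.
    N = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (N / 2) * np.sqrt(np.linalg.det(Sigma))
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu), via solve rather
    # than an explicit matrix inverse.
    quad = diff @ np.linalg.solve(Sigma, diff)
    return np.exp(-0.5 * quad) / norm

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
p_at_mean = gaussian_pdf(mu, mu, Sigma)
```

At $$\mathbf{x}=\boldsymbol{\mu}$$ the exponential is $$1$$, so the density collapses to the normalizing constant $$1/\big((2\pi)^{N/2}\sqrt{\det\mathbf{\Sigma}}\big)$$, which makes a handy spot check.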

Andrew Jacobs
Learning Machine Learning 1 - Some probability and stats

By way of "warming up", in this and the next two posts we'll review some of the foundational material (mostly from probability and stats) we'll need. This review is in no way comprehensive. On the contrary, it is highly selective and brief.

Bayes' theorem

Bayes' theorem is absolutely critical so let's kick off by recalling it, \begin{equation*}P(A|B)=\frac{P(B|A)P(A)}{P(B)}.\end{equation*}
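A quick worked example with hypothetical numbers of my own (the classic diagnostic-test setup) shows the theorem in action, with $$A$$ = "has the disease" and $$B$$ = "tests positive":

```python
# Hypothetical numbers: 1% prevalence, 99% sensitivity,
# 5% false-positive rate.
p_d = 0.01          # P(A): has disease
p_pos_d = 0.99      # P(B|A): positive test given disease
p_pos_nod = 0.05    # P(B|not A): false positive

# P(B) by the law of total probability
p_pos = p_pos_d * p_d + p_pos_nod * (1 - p_d)

# Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)
p_d_pos = p_pos_d * p_d / p_pos  # = 0.0099 / 0.0594 ≈ 0.167
```

Despite the accurate test, a positive result implies only about a 17% chance of disease, because the 1% prior dominates; this kind of prior-weighted update is exactly what the theorem formalizes.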