The Theory of the Singular Value Decomposition and its Relation to Principal Component Analysis

This whitepaper explicates the geometry of the singular value decomposition and its deep connection to principal component analysis. Geometrically, the singular value decomposition shows that every matrix can be decomposed into a rotation, followed by a stretching and/or change of dimension, followed by another rotation. At a deeper level, the singular value decomposition provides eigenvectors that define principal axes onto which the original variables and observations can be projected to obtain, respectively, loadings and principal component scores. Rotations can then be applied to transform the loadings and principal component scores into more interpretable values.
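As a minimal illustration of these relations (my own sketch, not code from the whitepaper; the simulated data and variable names are assumptions), the following NumPy snippet recovers principal axes, loadings, and component scores from the singular value decomposition of a centred data matrix:

```python
import numpy as np

# Simulated, mean-centred data matrix (100 observations, 3 variables).
rng = np.random.default_rng(seed=1)
X = rng.normal(size=(100, 3))
X = X - X.mean(axis=0)

# The SVD factors X into a rotation (U), a stretch (singular values in S),
# and another rotation (Vt).
U, S, Vt = np.linalg.svd(X, full_matrices=False)

# The right singular vectors (rows of Vt) define the principal axes; the
# squared singular values give the variances along those axes.
eigvals = S**2 / (X.shape[0] - 1)      # variance explained by each principal axis
loadings = Vt.T * np.sqrt(eigvals)     # projections of the original variables
scores = X @ Vt.T                      # principal component scores (equivalently U * S)

# Sanity check: these variances match the eigenvalues of the covariance matrix.
cov_eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
assert np.allclose(eigvals, cov_eigvals)
```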

The Game of Supervised Machine Learning: Understanding the Setup, Players, and Rules

Supervised machine learning can be conceptualized as a game whose goal is to maximize predictive power. In this paper, I use the excess risk decomposition framework to provide a setup for the game of supervised machine learning. I then decompose mean squared error loss into bias (squared), variance, and noise and introduce these components as the 'players' of the game. The bias-variance tradeoff is then used to explain the behaviours of bias (squared), variance, and noise and to synthesize two rules for the game of supervised machine learning. The paper ends with a discussion of boundary conditions on these rules. Given the considerable amount of Python code in the paper, I created the smltheory package to help readers navigate the code and facilitate their learning of the concepts.
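To make the decomposition concrete, here is a rough simulation sketch of how bias (squared), variance, and noise combine into expected squared error (my own example; it does not use the smltheory package, and the data-generating function, seed, and polynomial degree are assumptions):

```python
import numpy as np

# True function and noisy training data.
rng = np.random.default_rng(seed=42)
def true_fn(x):
    return np.sin(2 * np.pi * x)

noise_sd = 0.3
x_train = np.linspace(0, 1, 30)
x_test = 0.5        # evaluate the decomposition at a single test point
degree = 3          # model complexity; raising it trades bias for variance

# Fit the model on many resampled training sets and record predictions at x_test.
preds = []
for _ in range(2000):
    y_train = true_fn(x_train) + rng.normal(scale=noise_sd, size=x_train.size)
    coefs = np.polyfit(x_train, y_train, deg=degree)
    preds.append(np.polyval(coefs, x_test))
preds = np.array(preds)

bias_sq = (preds.mean() - true_fn(x_test)) ** 2   # squared bias of the average prediction
variance = preds.var()                            # variability across training sets
noise = noise_sd ** 2                             # irreducible error
print(f"bias^2={bias_sq:.4f}, variance={variance:.4f}, "
      f"noise={noise:.4f}, sum={bias_sq + variance + noise:.4f}")
```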

Coding and Visualizing the Expectation-Maximization Algorithm

Two demonstrations are provided in this post. The first demonstration uses Python and R code to reproduce a coin-flipping example in a popular publication on the expectation-maximization (EM) algorithm. The second demonstration uses Python and R code to reproduce a popular depiction of the EM algorithm whereby a likelihood function is approximated and optimized by a lower-bounding function.
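The second demonstration can be sketched as follows (a minimal Python illustration of my own, not the post's code; the mixture, data, and parameter values are assumptions): the lower bound obtained in the E step touches the log-likelihood at the current parameter value and lies below it elsewhere.

```python
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

# Two-component Gaussian mixture with equal weights; only the second mean (mu) is unknown.
rng = np.random.default_rng(seed=0)
x = np.concatenate([rng.normal(0.0, 1.0, 50), rng.normal(3.0, 1.0, 50)])

def log_lik(mu):
    # Incomplete-data log-likelihood of the mixture evaluated at mu.
    return np.sum(np.log(0.5 * norm.pdf(x, 0.0, 1.0) + 0.5 * norm.pdf(x, mu, 1.0)))

def lower_bound(mu, mu_old):
    # E step at mu_old fixes the responsibilities; by Jensen's inequality the
    # resulting bound equals the log-likelihood at mu = mu_old and is below it elsewhere.
    p0 = 0.5 * norm.pdf(x, 0.0, 1.0)
    p1 = 0.5 * norm.pdf(x, mu_old, 1.0)
    r1 = p1 / (p0 + p1)
    r0 = 1.0 - r1
    joint0 = 0.5 * norm.pdf(x, 0.0, 1.0)
    joint1 = 0.5 * norm.pdf(x, mu, 1.0)
    return np.sum(r0 * (np.log(joint0) - np.log(r0)) + r1 * (np.log(joint1) - np.log(r1)))

mu_grid = np.linspace(1.0, 5.0, 200)
mu_old = 2.0
plt.plot(mu_grid, [log_lik(m) for m in mu_grid], label="log-likelihood")
plt.plot(mu_grid, [lower_bound(m, mu_old) for m in mu_grid], label=f"lower bound at mu = {mu_old}")
plt.xlabel("mu"); plt.ylabel("value"); plt.legend(); plt.show()
```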

The Expectation-Maximization Algorithm: A Method for Modelling Mixtures of Distributions

The expectation-maximization (EM) algorithm provides a method for modelling mixtures of distributions. I explain the EM algorithm in the context of a coin-flipping example where, for each flip, one of two coins is used, and so a mixture of binomial distributions underlies the data. Although maximum likelihood estimation cannot directly estimate mixture models, the EM algorithm can because it optimizes the likelihood function indirectly. In the expectation (E) step, a lower-bounding function is used to obtain responsibilities. In the maximization (M) step, the lower-bounding function is optimized, with the responsibilities being used to obtain new parameter estimates. Because the lower-bounding function is optimized, the likelihood function increases by at least as much as the lower-bounding function, thus necessitating another E step. The E and M steps repeat until the parameter values stop updating.
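A compact sketch of these E and M steps for the two-coin example is shown below (my own implementation, not the post's code; the head counts, initial values, and variable names are assumptions):

```python
import numpy as np
from scipy.stats import binom

# Example data: number of heads observed in each of five sets of ten flips,
# where each set uses one of two coins (unknown which).
heads = np.array([5, 9, 8, 4, 7])
n_flips = 10

theta_a, theta_b = 0.6, 0.5   # initial guesses for each coin's heads probability
for _ in range(100):
    # E step: responsibilities, i.e., the probability that coin A produced each
    # set, computed from the binomial likelihoods under the current estimates.
    lik_a = binom.pmf(heads, n_flips, theta_a)
    lik_b = binom.pmf(heads, n_flips, theta_b)
    resp_a = lik_a / (lik_a + lik_b)
    resp_b = 1.0 - resp_a

    # M step: optimize the lower bound by re-estimating each theta as a
    # responsibility-weighted proportion of heads.
    new_a = np.sum(resp_a * heads) / np.sum(resp_a * n_flips)
    new_b = np.sum(resp_b * heads) / np.sum(resp_b * n_flips)

    # Stop once the parameter values stop updating (to numerical tolerance).
    if np.isclose(new_a, theta_a) and np.isclose(new_b, theta_b):
        break
    theta_a, theta_b = new_a, new_b

print(f"theta_A = {theta_a:.3f}, theta_B = {theta_b:.3f}")
```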

Probability, Likelihood, and Maximum Likelihood Estimation

Probability and likelihood are discussed in the context of a coin-flipping scenario, and it is shown that, of the two, only probabilities sum to one. Although likelihoods cannot be interpreted as probabilities, they can be used to determine the set of parameter values that most likely produced a data set (the maximum likelihood estimates). Maximum likelihood estimation provides an efficient method for obtaining these estimates and is applied in the binomial and Gaussian cases.
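A minimal sketch of maximum likelihood estimation in the binomial and Gaussian cases follows (my own example, not the post's code; the data values and seed are assumptions):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Binomial case: 7 heads in 10 flips. The MLE of the heads probability is heads/n,
# which the numerical optimizer recovers from the (negative) log-likelihood.
heads, n = 7, 10
neg_log_lik = lambda p: -(heads * np.log(p) + (n - heads) * np.log(1 - p))
result = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(result.x, heads / n)   # numerical optimum matches the closed form 0.7

# Gaussian case: the MLEs of the mean and variance are the sample mean and the
# (biased, divide-by-n) sample variance.
rng = np.random.default_rng(seed=3)
x = rng.normal(loc=5.0, scale=2.0, size=200)
mu_hat = x.mean()
sigma2_hat = np.mean((x - mu_hat) ** 2)
print(mu_hat, sigma2_hat)
```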