The Theory of the Singular Value Decomposition and its Relation to Principal Component Analysis

This whitepaper explicates the geometry of the singular value decomposition and its deep connection to principal component analysis. Geometrically, the singular value decomposition shows that every matrix can be decomposed into a rotation, followed by a stretching and/or change of dimension, followed by another rotation. At a deeper level, the singular value decomposition provides eigenvectors that define principal axes onto which the original variables and observations can be projected to obtain, respectively, loadings and principal component scores. Rotations can then be applied to transform the loadings and principal component scores into more interpretable values.
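As a minimal illustration of these relations (my own sketch, not code from the whitepaper; the simulated data and variable names are assumptions), the following NumPy snippet recovers principal axes, loadings, and component scores from the singular value decomposition of a centred data matrix:

```python
import numpy as np

# Simulated, mean-centred data matrix (100 observations, 3 variables).
rng = np.random.default_rng(seed=1)
X = rng.normal(size=(100, 3))
X = X - X.mean(axis=0)

# The SVD factors X into a rotation (U), a stretch (singular values in S),
# and another rotation (Vt).
U, S, Vt = np.linalg.svd(X, full_matrices=False)

# The right singular vectors (rows of Vt) define the principal axes; the
# squared singular values give the variances along those axes.
eigvals = S**2 / (X.shape[0] - 1)      # variance explained by each principal axis
loadings = Vt.T * np.sqrt(eigvals)     # projections of the original variables
scores = X @ Vt.T                      # principal component scores (equivalently U * S)

# Sanity check: these variances match the eigenvalues of the covariance matrix.
cov_eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
assert np.allclose(eigvals, cov_eigvals)
```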

The Game of Supervised Machine Learning: Understanding the Setup, Players, and Rules

Supervised machine learning can be conceptualized as a game whose goal is to maximize predictive power. In this paper, I use the excess risk decomposition framework to provide a setup for the game of supervised machine learning. I then decompose mean squared error loss into bias (squared), variance, and noise and introduce these components as the 'players' of the game. The bias-variance tradeoff is then used to explain the behaviours of bias (squared), variance, and noise and to synthesize two rules for the game of supervised machine learning. The paper ends with a discussion of boundary conditions on these rules. Given the considerable amount of Python code in the paper, I created the smltheory package to help readers navigate the code and facilitate their learning of the concepts.
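To make the decomposition concrete, here is a rough simulation sketch of how bias (squared), variance, and noise combine into expected squared error (my own example; it does not use the smltheory package, and the data-generating function, seed, and polynomial degree are assumptions):

```python
import numpy as np

# True function and noisy training data.
rng = np.random.default_rng(seed=42)
def true_fn(x):
    return np.sin(2 * np.pi * x)

noise_sd = 0.3
x_train = np.linspace(0, 1, 30)
x_test = 0.5        # evaluate the decomposition at a single test point
degree = 3          # model complexity; raising it trades bias for variance

# Fit the model on many resampled training sets and record predictions at x_test.
preds = []
for _ in range(2000):
    y_train = true_fn(x_train) + rng.normal(scale=noise_sd, size=x_train.size)
    coefs = np.polyfit(x_train, y_train, deg=degree)
    preds.append(np.polyval(coefs, x_test))
preds = np.array(preds)

bias_sq = (preds.mean() - true_fn(x_test)) ** 2   # squared bias of the average prediction
variance = preds.var()                            # variability across training sets
noise = noise_sd ** 2                             # irreducible error
print(f"bias^2={bias_sq:.4f}, variance={variance:.4f}, "
      f"noise={noise:.4f}, sum={bias_sq + variance + noise:.4f}")
```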

Coding and Visualizing the Expectation-Maximization Algorithm

Two demonstrations are provided in this post. The first demonstration uses Python and R code to reproduce a coin-flipping example in a popular publication on the expectation-maximization (EM) algorithm. The second demonstration uses Python and R code to reproduce a popular depiction of the EM algorithm whereby a likelihood function is approximated and optimized by a lower-bounding function.
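The second demonstration can be sketched as follows (a minimal Python illustration of my own, not the post's code; the mixture, data, and parameter values are assumptions): the lower bound obtained in the E step touches the log-likelihood at the current parameter value and lies below it elsewhere.

```python
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

# Two-component Gaussian mixture with equal weights; only the second mean (mu) is unknown.
rng = np.random.default_rng(seed=0)
x = np.concatenate([rng.normal(0.0, 1.0, 50), rng.normal(3.0, 1.0, 50)])

def log_lik(mu):
    # Incomplete-data log-likelihood of the mixture evaluated at mu.
    return np.sum(np.log(0.5 * norm.pdf(x, 0.0, 1.0) + 0.5 * norm.pdf(x, mu, 1.0)))

def lower_bound(mu, mu_old):
    # E step at mu_old fixes the responsibilities; by Jensen's inequality the
    # resulting bound equals the log-likelihood at mu = mu_old and is below it elsewhere.
    p0 = 0.5 * norm.pdf(x, 0.0, 1.0)
    p1 = 0.5 * norm.pdf(x, mu_old, 1.0)
    r1 = p1 / (p0 + p1)
    r0 = 1.0 - r1
    joint0 = 0.5 * norm.pdf(x, 0.0, 1.0)
    joint1 = 0.5 * norm.pdf(x, mu, 1.0)
    return np.sum(r0 * (np.log(joint0) - np.log(r0)) + r1 * (np.log(joint1) - np.log(r1)))

mu_grid = np.linspace(1.0, 5.0, 200)
mu_old = 2.0
plt.plot(mu_grid, [log_lik(m) for m in mu_grid], label="log-likelihood")
plt.plot(mu_grid, [lower_bound(m, mu_old) for m in mu_grid], label=f"lower bound at mu = {mu_old}")
plt.xlabel("mu"); plt.ylabel("value"); plt.legend(); plt.show()
```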

The Expectation-Maximization Algorithm: A Method for Modelling Mixtures of Distributions

The expectation-maximization (EM) algorithm provides a method for modelling mixtures of distributions. I explain the EM algorithm in the context of a coin-flipping example where, for each flip, one of two coins is used, and so a mixture of binomial distributions underlies the data. Although maximum likelihood estimation cannot directly estimate mixture models, the EM algorithm can because it optimizes the likelihood function indirectly. In the expectation (E) step, a lower-bounding function is used to obtain responsibilities. In the maximization (M) step, the lower-bounding function is optimized, with the responsibilities being used to obtain new parameter estimates. Because the lower-bounding function is optimized, the likelihood function increases by at least as much as the lower-bounding function, thus necessitating another E step. The E and M steps repeat until the parameter values stop updating.
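A compact sketch of these E and M steps for the two-coin example is shown below (my own implementation, not the post's code; the head counts, initial values, and variable names are assumptions):

```python
import numpy as np
from scipy.stats import binom

# Example data: number of heads observed in each of five sets of ten flips,
# where each set uses one of two coins (unknown which).
heads = np.array([5, 9, 8, 4, 7])
n_flips = 10

theta_a, theta_b = 0.6, 0.5   # initial guesses for each coin's heads probability
for _ in range(100):
    # E step: responsibilities, i.e., the probability that coin A produced each
    # set, computed from the binomial likelihoods under the current estimates.
    lik_a = binom.pmf(heads, n_flips, theta_a)
    lik_b = binom.pmf(heads, n_flips, theta_b)
    resp_a = lik_a / (lik_a + lik_b)
    resp_b = 1.0 - resp_a

    # M step: optimize the lower bound by re-estimating each theta as a
    # responsibility-weighted proportion of heads.
    new_a = np.sum(resp_a * heads) / np.sum(resp_a * n_flips)
    new_b = np.sum(resp_b * heads) / np.sum(resp_b * n_flips)

    # Stop once the parameter values stop updating (to numerical tolerance).
    if np.isclose(new_a, theta_a) and np.isclose(new_b, theta_b):
        break
    theta_a, theta_b = new_a, new_b

print(f"theta_A = {theta_a:.3f}, theta_B = {theta_b:.3f}")
```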

Probability, Likelihood, and Maximum Likelihood Estimation

Probability and likelihood are discussed in the context of a coin-flipping scenario, and it is shown that, of the two, only probabilities sum to one. Although likelihoods cannot be interpreted as probabilities, they can be used to determine the set of parameter values that most likely produced a data set (the maximum likelihood estimates). Maximum likelihood estimation provides an efficient method for obtaining these estimates and is applied in the binomial and Gaussian cases.
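A minimal sketch of maximum likelihood estimation in the binomial and Gaussian cases follows (my own example, not the post's code; the data values and seed are assumptions):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Binomial case: 7 heads in 10 flips. The MLE of the heads probability is heads/n,
# which the numerical optimizer recovers from the (negative) log-likelihood.
heads, n = 7, 10
neg_log_lik = lambda p: -(heads * np.log(p) + (n - heads) * np.log(1 - p))
result = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(result.x, heads / n)   # numerical optimum matches the closed form 0.7

# Gaussian case: the MLEs of the mean and variance are the sample mean and the
# (biased, divide-by-n) sample variance.
rng = np.random.default_rng(seed=3)
x = rng.normal(loc=5.0, scale=2.0, size=200)
mu_hat = x.mean()
sigma2_hat = np.mean((x - mu_hat) ** 2)
print(mu_hat, sigma2_hat)
```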