K-means clustering

Suppose we have a dataset {\textbf{x}_1, \ldots, \textbf{x}_n} consisting of N observations of a random D-dimensional space. The goal is to partition the dataset into some number K of clusters. Formally, let { \pmb{\mu} _1, \ldots, \pmb{\mu}_k } be a set of D-dimensional vectors in which μμk is associated with the kth cluster (μμk can be thought of as the centers of the clusters). The goal is to find an assignment of data points so that the distance of each data point to its closest vector μμk is a minimum.

Let rnk0,1, where k=1,,K, describe the assignment of each data point to a cluster (1 if it’s assigned to a cluster and 0 if not). We define a function called the distortion measure, given by:

(1)J=n=1Nk=1Krnkxnμμk2

This represents the sum of the squares of the distances of each data point to its assigned vector μμk. Our goal is to find values for rnk and the μμk so as to minimize J. The algorithm is as follows:

Algorithm:

  • Pick initial values for the μμ.
  • Minimize J with respect to rnk, keeping the μμk fixed (Expectation).
  • Minimize J with respect to the μμk, keeping rnk fixed (Maximization).

Multivariate gaussian distribution

For a random variable X with a finite number of outcomes x1,x2,,xn occurring with probabilities p1,p2,,pn, the expectation of X is defined as:

E[X]=i=1Nxipi

The covariance between two variables, X and Y, is defined as the expected value (or mean) of the product of their deviations from their individual expected values:

cov(X,Y)=E[(XE[X])(YE[Y])]

When working with multiple variables X1,X2,,Xn, the covariance matrix, denoted as Σ, is the n×n matrix whose (i,j)-th entry is cov(Xi,Xj).

The density function of a univariate gaussian distribution is given by:

p(x;μ,σ)=12πσ2exp(12σ2(xμ)2)
  • (xμ)2 is always positive.
  • The value k(x,μ)=12σ2(xμ)2 is always negative. It’s a parabola pointing downward.
  • The exp(k(x,μ)) part makes sure that the quantity is always 0.
  • The normalization factor 12πσ2 multiplies exp(k(x,μ)) so that this sum equals 1.
12πσ2normalization factorexp(12σ2(xμ)2)=1

A vector random variable X=[X1,,Xn]T is said to have a multivariate Gaussian distribution with mean μRn and covariance matrix Σ if its probability density function is given by:

p(x;μ,Σ)=1(2π)n/2|Σ|1/2exp(12(xμ)TΣ1(xμ))

Like in the univariate case, the argument of the exponential function is a downward-opening bowl. The coefficient in front is a normalization factor used to ensure that:

1(2π)n/2|Σ|1/2normalization factorexp(12(xμ)TΣ1(xμ))dx1dx2dxn=1

Gaussian mixture models and EM

https://www.youtube.com/watch?v=qMTuMa86NzU