K-means clustering

Suppose we have a data set $\{\mathbf{x}_1, \dots, \mathbf{x}_N\}$ consisting of $N$ observations of a random $D$-dimensional variable. The goal is to partition the data set into some number $K$ of clusters. Formally, let $\{\boldsymbol{\mu}_1, \dots, \boldsymbol{\mu}_K\}$ be a set of $D$-dimensional vectors in which $\boldsymbol{\mu}_k$ is associated with the $k$-th cluster ($\boldsymbol{\mu}_k$ can be thought of as the centre of that cluster). The goal is to find an assignment of data points to clusters so that the distance of each data point to its closest vector $\boldsymbol{\mu}_k$ is as small as possible.

Let $r_{nk} \in \{0, 1\}$, with $k = 1, \dots, K$, be binary indicator variables describing the assignment of each data point to a cluster ($r_{nk} = 1$ if $\mathbf{x}_n$ is assigned to cluster $k$ and $0$ otherwise). We define an objective function, called the distortion measure, given by

$$J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \|\mathbf{x}_n - \boldsymbol{\mu}_k\|^2 \tag{1}$$

This represents the sum of the squares of the distances of each data point to its assigned vector $\boldsymbol{\mu}_k$. Our goal is to find values for the $r_{nk}$ and the $\boldsymbol{\mu}_k$ so as to minimize $J$. The algorithm is as follows:

Algorithm:

  • pick initial values for the $\boldsymbol{\mu}_k$
  • minimize $J$ with respect to $r_{nk}$ keeping the $\boldsymbol{\mu}_k$ fixed (Expectation)
  • minimize $J$ with respect to the $\boldsymbol{\mu}_k$ keeping $r_{nk}$ fixed (Maximization)
  • repeat the two previous steps until the assignments (or $J$) stop changing; a minimal sketch of these steps is given right after this list
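
Below is a minimal NumPy sketch of the two alternating steps above, assuming the data is an $(N, D)$ array. The function name, the random initialization, and the convergence check are my own illustrative choices, not prescribed by the algorithm as stated:

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Minimal K-means on an (N, D) data array X with K clusters."""
    rng = np.random.default_rng(seed)
    N = len(X)
    # pick initial values for the mu_k: here, K distinct data points chosen at random
    mu = X[rng.choice(N, size=K, replace=False)]

    for _ in range(n_iters):
        # "Expectation" step: minimize J w.r.t. r_nk keeping mu_k fixed,
        # i.e. assign each point to its nearest centre
        dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # (N, K) squared distances
        r = dists.argmin(axis=1)                                     # cluster index of each point

        # "Maximization" step: minimize J w.r.t. mu_k keeping r_nk fixed,
        # i.e. move each centre to the mean of the points assigned to it
        new_mu = np.array([X[r == k].mean(axis=0) if np.any(r == k) else mu[k]
                           for k in range(K)])
        if np.allclose(new_mu, mu):
            break
        mu = new_mu

    # distortion measure J from equation (1), evaluated at the final mu and assignments
    dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    r = dists.argmin(axis=1)
    J = dists[np.arange(N), r].sum()
    return mu, r, J

# example: two well-separated 2-D blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, size=(50, 2)), rng.normal(5, 1, size=(50, 2))])
centres, labels, J = kmeans(X, K=2)
```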

Multivariate gaussian distribution

For a random variable $X$ with a finite number of outcomes $x_1, x_2, \dots, x_n$ occurring with probabilities $p_1, p_2, \dots, p_n$, the expectation of $X$ is defined as:

$$E[X] = \sum_{i=1}^{n} x_i p_i$$

The covariance between two variables $X, Y$ is defined as the expected value (or mean) of the product of their deviations from their individual expected values:

$$\operatorname{cov}(X, Y) = E\big[(X - E[X])(Y - E[Y])\big]$$

When working with multiple variables $X_1, X_2, \dots, X_n$, the covariance matrix, denoted $\Sigma$, is the $n \times n$ matrix whose $(i, j)$-th entry is $\operatorname{cov}(X_i, X_j)$.
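
As a quick numerical check of these definitions, the sketch below (the sample size, mean, and covariance values are arbitrary choices of mine) estimates a covariance matrix from samples both directly from the formula and via `np.cov`:

```python
import numpy as np

rng = np.random.default_rng(0)
# 10,000 samples of three correlated variables; one row per observation
true_cov = np.array([[2.0, 0.5, 0.0],
                     [0.5, 1.0, 0.3],
                     [0.0, 0.3, 1.5]])
samples = rng.multivariate_normal(mean=[0.0, 1.0, -1.0], cov=true_cov, size=10_000)

# cov(X_i, X_j) = E[(X_i - E[X_i])(X_j - E[X_j])], estimated from the samples
deviations = samples - samples.mean(axis=0)
sigma_manual = deviations.T @ deviations / (len(samples) - 1)

sigma_numpy = np.cov(samples, rowvar=False)    # same estimate computed by NumPy
print(np.allclose(sigma_manual, sigma_numpy))  # True
```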

The density function of a univariate gaussian distribution is given by:

$$p(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(x - \mu)^2\right)$$
  • $(x - \mu)^2$ is always non-negative
  • the value $k(x, \mu) = -\frac{1}{2\sigma^2}(x - \mu)^2$ is always non-positive; as a function of $x$ it is a parabola pointing downward
  • the $\exp(k(x, \mu))$ part makes sure that the quantity is always $> 0$
  • the normalization factor $\frac{1}{\sqrt{2\pi\sigma^2}}$ multiplies $\exp(k(x, \mu))$ so that the total area under the curve (the integral below) equals 1
$$\int_{-\infty}^{\infty} \underbrace{\frac{1}{\sqrt{2\pi\sigma^2}}}_{\text{normalization factor}} \exp\left(-\frac{1}{2\sigma^2}(x - \mu)^2\right) dx = 1$$
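
A small numerical sanity check of this normalization; the values of $\mu$ and $\sigma$, the grid resolution, and the integration range below are arbitrary assumptions chosen so the Riemann sum is accurate:

```python
import numpy as np

mu, sigma = 1.0, 2.0
x = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 200_001)

# univariate gaussian density: normalization factor times exp(-(x - mu)^2 / (2 sigma^2))
p = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

# Riemann-sum approximation of the integral over x; should be (numerically) 1
dx = x[1] - x[0]
print(p.sum() * dx)  # ~ 1.0
```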

A vector random variable $X = [X_1, \dots, X_n]^T$ is said to have a multivariate gaussian distribution with mean $\mu \in \mathbb{R}^n$ and covariance matrix $\Sigma$ if its probability density function is given by

$$p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$$

Like in the univariate case, the argument of the exponential function is a downward-opening bowl (a concave quadratic in $x$), and the coefficient in front is a normalization factor used to ensure that

$$\int \cdots \int \underbrace{\frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}}}_{\text{normalization factor}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right) dx_1 \, dx_2 \cdots dx_n = 1$$
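
The sketch below, with an arbitrarily chosen two-dimensional $\mu$ and $\Sigma$ of my own, evaluates the density formula above directly and compares it against `scipy.stats.multivariate_normal`, which implements the same density:

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
x = np.array([0.5, 0.5])

# density from the formula: normalization factor times exp of the quadratic form
n = len(mu)
diff = x - mu
norm_factor = 1.0 / ((2 * np.pi) ** (n / 2) * np.linalg.det(Sigma) ** 0.5)
p_manual = norm_factor * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)

# the same density evaluated by SciPy
p_scipy = multivariate_normal(mean=mu, cov=Sigma).pdf(x)
print(np.isclose(p_manual, p_scipy))  # True
```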

Gaussian mixture models and EM

https://www.youtube.com/watch?v=qMTuMa86NzU