gmm

Syntax

gmm(X, k, [maxIter=300], [tolerance=1e-4], [randomSeed], [mean], [sigma])

Arguments

X is the training data set. For univariate data, X is a vector; for multivariate data, X is a matrix or table in which each column is a sample.

k is an integer indicating the number of independent Gaussians in a mixture model.

maxIter is a positive integer indicating the maximum number of EM iterations to perform. The default value is 300.

tolerance is a floating-point number indicating the convergence tolerance. EM iterations stop when the average gain of the lower bound falls below this threshold. The default value is 1e-4.

randomSeed is an optional parameter specifying the random seed used by the method.

mean is an optional parameter. It is a vector or matrix indicating the initial means.

  • For univariate data, it is a vector of length k;

  • For multivariate data, it is a matrix with k columns and as many rows as there are variables in X;

  • If mean is unspecified, k values are randomly selected from X as the initial means.

sigma is an optional parameter indicating the initial variances or covariance matrices.

  • For univariate data, it is a vector indicating the initial variance of each submodel;

  • For multivariate data, it is a tuple of length k indicating the initial covariance matrix of each submodel;

  • If sigma is unspecified, it defaults to a vector of ones (univariate data) or an identity matrix (multivariate data).

Details

Train a Gaussian Mixture Model (GMM) on the given data set. Return a dictionary with the following keys:

  • modelName: a string “Gaussian Mixture Model”

  • prior: the prior probability of each submodel

  • mean: the expectation of each submodel

  • sigma: the variance of each submodel if X is univariate data, or the covariance matrix of each submodel if X is multivariate data.
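To illustrate how the prior, mean and sigma keys fit together, the following plain-Python sketch evaluates the density of a univariate mixture as the prior-weighted sum of the submodel densities. It is an illustration only, not DolphinDB code: the function names `gaussian_pdf` and `mixture_pdf` are hypothetical, and the model values are made up for the example rather than taken from a real gmm call.

```python
import math

def gaussian_pdf(x, mean, variance):
    """Density of a univariate normal distribution at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

def mixture_pdf(x, model):
    """Prior-weighted sum of the submodel densities at x."""
    return sum(
        p * gaussian_pdf(x, m, s)
        for p, m, s in zip(model["prior"], model["mean"], model["sigma"])
    )

# Hypothetical univariate result with two submodels (illustrative values only).
model = {
    "modelName": "Gaussian Mixture Model",
    "prior": [0.3, 0.7],   # weights of the submodels, summing to 1
    "mean": [0.0, 5.0],    # expectation of each submodel
    "sigma": [1.0, 4.0],   # variances, not standard deviations
}

density_at_zero = mixture_pdf(0.0, model)
```

Note that for univariate data sigma holds variances, so a standard deviation would be its square root.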

Examples

$ dataT = 6.8 7.2 5.3 9.4 6.5 11.2 25.6 0.6 8.9 4.3 2.2 1.9 8.7 0.2 1.5
$ mean = [2, 2]
$ re = gmm(dataT, 2, 300, 1e-4, 42, mean)
$ re

sigma->[36.759822,36.759822]
modelName->Gaussian Mixture Model
prior->[0.5,0.5]
mean->[6.686667,6.686667]
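Both submodels in the output above are identical, which is what a textbook EM update produces when the two initial means are equal: every sample is then shared equally between the submodels, so each one ends up with the overall mean and variance of the data. The plain-Python sketch below illustrates this under stated assumptions (equal initial priors, unit initial variances as described for an unspecified sigma, and stopping on the change in average log-likelihood). It is not DolphinDB's implementation, and `em_gmm_1d` is a hypothetical name.

```python
import math

def normal_pdf(x, mean, variance):
    """Density of a univariate normal distribution at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

def em_gmm_1d(data, means, variances, max_iter=300, tolerance=1e-4):
    """Textbook EM for a univariate Gaussian mixture (illustrative sketch)."""
    k = len(means)
    n = len(data)
    priors = [1.0 / k] * k      # assumption: equal initial priors
    means = list(means)
    variances = list(variances)
    previous_bound = None
    for _ in range(max_iter):
        # E-step: responsibility of each submodel for each sample.
        resp = []
        log_likelihood = 0.0
        for x in data:
            weighted = [priors[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(weighted)
            log_likelihood += math.log(total)
            resp.append([w / total for w in weighted])
        # M-step: re-estimate priors, means and variances from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            priors[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
        # Assumed stopping rule: change in average log-likelihood below tolerance.
        bound = log_likelihood / n
        if previous_bound is not None and abs(bound - previous_bound) < tolerance:
            break
        previous_bound = bound
    return {"prior": priors, "mean": means, "sigma": variances}

data = [6.8, 7.2, 5.3, 9.4, 6.5, 11.2, 25.6, 0.6, 8.9, 4.3, 2.2, 1.9, 8.7, 0.2, 1.5]
result = em_gmm_1d(data, means=[2.0, 2.0], variances=[1.0, 1.0])
```

Under these assumptions the sketch returns priors of 0.5, means of about 6.686667 and variances of about 36.759822 for both submodels. Choosing distinct initial means would be one way to let the submodels separate.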

$ dataT = transpose(matrix(3.2 1.5 2.6 7.8 6.3 4.2 5.1 8.9 11.2 25.8, 25.6 4.6 8.9 4.3 2.2 1.9 8.7 0.2 1.5 9.3))
$ mean = transpose(matrix([1, 0], [0, 1]))
$ re = gmm(dataT, 2, 300, 1e-4, 42, mean)
$ re

sigma->(#0        #1
51.001369 18.273032
18.273032 9.34789
,#0       #1
1.718475 0.629584
0.629584 67.713701
)
modelName->Gaussian Mixture Model
prior->[0.558683,0.441317]
mean->
#0        #1
11.152841 3.238262
3.341493  10.996997