Bayesian K-Means

Bayesian k-means is a clustering algorithm that searches for the optimal number of clusters.

Bayesian k-means belongs to a class of “maximization-expectation” (ME) algorithms, which maximize over hidden variables but marginalize over parameters. The ME formulation allows Bayesian k-means to use efficient data structures and to select the optimal model structure.

How to Use

* This program requires Matlab.

Run Matlab.

cd to the directory where you un-archived the source.

Run bkm_sm or bkm_bu.

bkm_sm and bkm_bu are top-down and bottom-up Bayesian k-means algorithms respectively.


>> k = bkm_sm(DATA)
>> [z,k] = bkm_bu(DATA) 

where DATA is a DxN array, D is the dimension, and N is the number of data points.
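
As a minimal sketch of preparing input in this layout, the following builds a synthetic 2xN array from three Gaussian blobs and passes it to bkm_sm. The blob centres, the sample counts, and all variable names other than DATA and k are illustrative assumptions, not part of the distribution. The call is guarded so the script still runs when the source directory is not on the Matlab path.

```matlab
% Synthetic data: three 2-D Gaussian blobs (centres and sizes are arbitrary).
centres  = [0 5 0; 0 0 5];            % D x 3, one column per blob centre
nPerBlob = 100;                       % points drawn around each centre
D = size(centres, 1);
DATA = zeros(D, 0);
for c = 1:size(centres, 2)
    blob = repmat(centres(:, c), 1, nPerBlob) + 0.5 * randn(D, nPerBlob);
    DATA = [DATA, blob];              % append columns: one column per data point
end

% Data stored one point per row (N x D) would need a transpose first:
% DATA = rowData';

if exist('bkm_sm', 'file')            % only call if the source is on the path
    k = bkm_sm(DATA);
    disp(k);
end
```

Here DATA ends up as a 2x300 array, that is, D = 2 and N = 300.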

k is the expected number of clusters.

z is a matrix that can be passed as an argument to "dendrogram".
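
Plotting the bottom-up hierarchy might look like the sketch below. It assumes that the dendrogram function (from the Statistics Toolbox) is installed and that z can be handed to it unchanged. The small DATA array is an illustrative placeholder.

```matlab
DATA = [randn(2, 20), randn(2, 20) + 4];   % placeholder 2 x 40 input, two blobs

if exist('bkm_bu', 'file') && exist('dendrogram', 'file')
    [z, k] = bkm_bu(DATA);
    dendrogram(z, 0);                      % second argument 0 shows every leaf
    title('Bottom-up Bayesian k-means hierarchy');
    disp(k);
end
```

Without the second argument, dendrogram collapses large trees to a limited number of leaf nodes, which can hide individual data points.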

You can also run them with 0 as a second argument:

>> bkm_sm(DATA,0)
>> bkm_bu(DATA,0) 

In this case, bkm_sm and bkm_bu use neither kd-trees nor conga-lines. Currently, bkm_bu(DATA,0) does not return a valid z. Use bkm_bu(DATA,1) instead.
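
One way to see the effect of the option is to time both modes on the same input. This is only a sketch: the placeholder DATA array and the variable names are assumptions, and no particular speed-up or agreement between the two results is implied.

```matlab
DATA = [randn(2, 200), randn(2, 200) + 4];   % placeholder 2 x 400 input

if exist('bkm_sm', 'file')
    tic; kPlain = bkm_sm(DATA, 0); tPlain = toc;   % option 0: no kd-trees or conga-lines
    tic; kFast  = bkm_sm(DATA);    tFast  = toc;   % default call
    fprintf('option 0: k = %g (%.2f s)\n', kPlain, tPlain);
    fprintf('default:  k = %g (%.2f s)\n', kFast, tFast);
end
```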



This software is distributed under the BSD license.

Copyright (C) 2005 Kenichi Kurihara


Bayesian k-means : 10/25 2005


Max Welling and Kenichi Kurihara, Bayesian K-Means as a “Maximization-Expectation” Algorithm, short version accepted at the SIAM Conference on Data Mining (SDM06), 2005.


This material is based upon work supported by the National Science Foundation under Grant No. 0447903. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).