The minimum sum-of-squares clustering problem (MSSC) consists of partitioning
n observations into k clusters in order to minimize the sum of squared distances
from the points to the centroid of their cluster. In this paper, we propose an exact
algorithm for the MSSC problem based on the branch-and-bound technique. The lower
bound is computed by a cutting-plane procedure in which valid inequalities are
iteratively added to the Peng–Wei semidefinite programming (SDP) relaxation. The
upper bound is computed with a constrained version of k-means in which the initial
centroids are extracted from the solution of the SDP relaxation. In the branch-and-bound
procedure, we incorporate instance-level must-link and cannot-link constraints
to express knowledge about which data points should or should not be grouped
together. At each level of the tree, we reduce the size of the problem while preserving the
structure of the SDP itself. The results show that the approach solves, for the first time to
the best of our knowledge, real-world instances of up to 4,000 data points.
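To make the objective concrete, the following minimal Python sketch (assuming NumPy; the names mssc_cost, clustering_matrix and peng_wei_cost are hypothetical and not part of SOS-SDP) evaluates the MSSC cost of a fixed assignment in two ways. The first is direct: the sum of squared distances to the cluster centroids. The second is the matrix form underlying the Peng–Wei model, tr(W(I − Z)), where W = XXᵀ and Z is the normalized clustering matrix with Z_ij = 1/|C| if points i and j share cluster C and 0 otherwise. Both agree because tr(WZ) = Σ_C |C|·||m_C||², with m_C the centroid of C.

```python
import numpy as np

def mssc_cost(X, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    cost = 0.0
    for c in np.unique(labels):
        members = X[labels == c]
        centroid = members.mean(axis=0)
        cost += float(((members - centroid) ** 2).sum())
    return cost

def clustering_matrix(labels):
    """Normalized clustering matrix: Z[i, j] = 1/|C| if i and j share cluster C, else 0."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    sizes = np.array([np.sum(labels == lab) for lab in labels])
    return same / sizes[:, None]

def peng_wei_cost(X, labels):
    """The same objective in matrix form: tr(W (I - Z)) with Gram matrix W = X X^T."""
    W = X @ X.T
    Z = clustering_matrix(labels)
    return float(np.trace(W @ (np.eye(len(X)) - Z)))

# Toy instance: five points in the plane, k = 2 clusters.
X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]])
labels = np.array([0, 0, 1, 1, 1])
print(mssc_cost(X, labels), peng_wei_cost(X, labels))
```

The two printed values coincide up to floating-point rounding. The matrix Z built here also satisfies Ze = e and tr(Z) = k, the linear constraints kept in the relaxation.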
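For reference, a standard statement of the Peng–Wei SDP relaxation is given below (the notation is ours, not taken from this abstract). With data matrix X, W = XXᵀ, e the all-ones vector and k the number of clusters, MSSC is equivalent to minimizing tr(W(I − Z)) over symmetric, entrywise nonnegative matrices with Ze = e, tr(Z) = k and Z² = Z; replacing Z² = Z with Z ⪰ 0 gives the SDP. The pair and triangle inequalities listed after it are examples of inequalities that hold for every normalized clustering matrix and can therefore be added as cuts; which families SOS-SDP actually separates, and how, is not specified here. Likewise, one natural way to encode a cannot-link decision on (i, j) in this model is Z_ij = 0, since Z_ij > 0 exactly when i and j share a cluster.

```latex
\begin{aligned}
\min_{Z \in \mathcal{S}^n} \quad & \operatorname{tr}\bigl(W(I - Z)\bigr), \qquad W = XX^\top \\
\text{s.t.} \quad & Ze = e, \qquad \operatorname{tr}(Z) = k, \\
& Z \ge 0 \ \text{(entrywise)}, \qquad Z \succeq 0, \\[4pt]
\text{pair:} \quad & Z_{ij} \le Z_{ii} \quad \text{for all } i \ne j, \\
\text{triangle:} \quad & Z_{ij} + Z_{ik} \le Z_{ii} + Z_{jk} \quad \text{for distinct } i, j, k.
\end{aligned}
```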
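One way to see how a must-link decision can shrink the problem is the weighted parallel-axis identity. If points i and j (weights w_i, w_j) must share a cluster, replace them by their weighted mean m with weight w_i + w_j. For every centroid c, w_i||x_i − c||² + w_j||x_j − c||² = (w_i + w_j)||m − c||² + w_i||x_i − m||² + w_j||x_j − m||², so the reduced weighted problem on n − 1 points has the same cluster centroids and a cost that differs only by a constant. The Python sketch below (NumPy assumed; weighted_mssc_cost and merge_must_link are hypothetical helpers) checks this numerically. It illustrates the principle and is not claimed to be the exact transformation used in SOS-SDP.

```python
import numpy as np

def weighted_mssc_cost(X, w, labels):
    """Weighted MSSC: sum_i w_i * ||x_i - c(label_i)||^2 using weighted centroids."""
    cost = 0.0
    for c in np.unique(labels):
        mask = labels == c
        centroid = np.average(X[mask], axis=0, weights=w[mask])
        cost += float((w[mask] * ((X[mask] - centroid) ** 2).sum(axis=1)).sum())
    return cost

def merge_must_link(X, w, labels, i, j):
    """Collapse must-linked points i and j into one weighted point (hypothetical helper)."""
    if labels[i] != labels[j]:
        raise ValueError("points i and j must be in the same cluster")
    wi, wj = w[i], w[j]
    merged = (wi * X[i] + wj * X[j]) / (wi + wj)
    # Constant removed by the merge: weighted scatter of the pair around its mean.
    offset = wi * np.sum((X[i] - merged) ** 2) + wj * np.sum((X[j] - merged) ** 2)
    keep = [t for t in range(len(X)) if t not in (i, j)]
    X_new = np.vstack([X[keep], merged])
    w_new = np.append(w[keep], wi + wj)
    labels_new = np.append(labels[keep], labels[i])
    return X_new, w_new, labels_new, float(offset)

X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]])
w = np.ones(len(X))
labels = np.array([0, 0, 1, 1, 1])
Xr, wr, lr, offset = merge_must_link(X, w, labels, 2, 3)
print(weighted_mssc_cost(X, w, labels), weighted_mssc_cost(Xr, wr, lr) + offset)
```

Because the helper works with arbitrary weights, it can be applied repeatedly as more must-link decisions accumulate along a branch.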
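The abstract does not detail how the initial centroids are extracted or how the constrained k-means enforces the constraints, so the Python sketch below rests on two labeled assumptions. (i) Seeding: each row of ZX equals the centroid of that point's cluster when Z is an exact clustering matrix, so rows of ZX computed from the SDP solution are treated as candidate centroids, and k of them are picked by farthest-first traversal. (ii) Assignment: the greedy COP-KMeans rule of Wagstaff et al. is swapped in, where each point goes to the nearest centroid that does not break a must-link or cannot-link constraint with points already assigned in the current pass; SOS-SDP may use a different constrained assignment. All function names are hypothetical.

```python
import numpy as np

def seed_from_sdp(Z, X, k):
    """Assumed rounding: rows of Z @ X approximate centroids; pick k far-apart rows."""
    P = Z @ X
    chosen = [int(np.argmax(np.linalg.norm(P - P.mean(axis=0), axis=1)))]
    while len(chosen) < k:
        # Farthest-first: next seed maximizes distance to its nearest chosen row.
        d = np.min(np.linalg.norm(P[:, None, :] - P[chosen][None, :, :], axis=2), axis=1)
        chosen.append(int(np.argmax(d)))
    return P[chosen].copy()

def violates(i, c, labels, must_link, cannot_link):
    """COP-KMeans check: would putting point i in cluster c break a constraint?"""
    for a, b in must_link:
        other = b if a == i else a if b == i else None
        if other is not None and labels[other] not in (-1, c):
            return True
    for a, b in cannot_link:
        other = b if a == i else a if b == i else None
        if other is not None and labels[other] == c:
            return True
    return False

def cop_kmeans(X, centroids, must_link, cannot_link, max_iter=100):
    """Greedy constrained k-means; returns labels, or None if a point has no feasible cluster."""
    k = len(centroids)
    labels = None
    for _ in range(max_iter):
        new = np.full(len(X), -1)
        for i in range(len(X)):
            # Try clusters from nearest to farthest until one is feasible.
            for c in np.argsort(np.linalg.norm(centroids - X[i], axis=1)):
                if not violates(i, c, new, must_link, cannot_link):
                    new[i] = c
                    break
            if new[i] == -1:
                return None
        if labels is not None and np.array_equal(new, labels):
            break
        labels = new
        # Recompute centroids; an empty cluster keeps its previous centroid.
        centroids = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                              else centroids[c] for c in range(k)])
    return labels

X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]])
truth = np.array([0, 0, 1, 1, 1])
same = truth[:, None] == truth[None, :]
Z = same / same.sum(axis=1, keepdims=True)  # stands in for an SDP solution
seeds = seed_from_sdp(Z, X, 2)
labels = cop_kmeans(X, seeds, must_link=[(0, 1)], cannot_link=[(1, 2)])
print(labels)
```

The cost of any feasible labeling returned this way is a valid upper bound, since it is attained by an actual partition satisfying the constraints; the greedy rule can fail on feasible instances, which is why the sketch returns None instead of guessing.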
Publication details
2022, INFORMS Journal on Computing
SOS-SDP: An Exact Solver for Minimum Sum-of-Squares Clustering (journal article)
Piccialli Veronica, Sudoso Antonio M., Wiegele Angelika