Pseudo-Centroid Clustering

July 12, 2016 · Declared Dead · 🏛 Soft Computing - A Fusion of Foundations, Methodologies and Applications

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Fred Glover
arXiv ID: 1607.03467
Category: cs.DS (Data Structures & Algorithms)
Citations: 10
Venue: Soft Computing - A Fusion of Foundations, Methodologies and Applications
Last Checked: 4 months ago

Abstract

Pseudo-Centroid Clustering replaces the traditional concept of a centroid, expressed as a center of gravity, with the notion of a pseudo-centroid (or coordinate-free centroid), which has the advantage of applying to clustering problems where points do not have numerical coordinates (or categorical coordinates that are translated into numerical form). Such problems, for which classical centroids do not exist, are particularly important in the social sciences, marketing, psychology and economics. In these fields distances are not computed from vector coordinates but are instead expressed in terms of characteristics such as affinity relationships, psychological preferences, advertising responses, polling data and market interactions, so that distances, broadly conceived, measure the similarity (or dissimilarity) of characteristics, functions or structures. We formulate a K-PC algorithm analogous to the K-Means algorithm and identify two key types of pseudo-centroids, MinMax centroids and (weighted) MinSum centroids, which respectively give rise to a K-MinMax algorithm and a K-MinSum algorithm. The K-PC algorithms can take advantage of problem structure to identify special diversity-based and intensity-based starting methods that generate initial pseudo-centroids and associated clusters. Theorems for the intensity-based methods establish their ability to obtain the best clusters of a selected size from the points available at each stage of construction. We also introduce a Regret-Threshold PC algorithm that modifies the K-PC algorithm, together with an associated diversification method and a new criterion for evaluating the quality of a collection of clusters.
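
The abstract does not spell out the steps of the K-PC algorithm, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes that a pseudo-centroid is the cluster member that minimizes either the sum (MinSum) or the maximum (MinMax) of its distances to the members of its cluster, and that the only input is a pairwise distance matrix with no coordinates. The names `pseudo_centroid` and `k_pc`, the unweighted scoring, and the hand-picked seeds are all hypothetical; the diversity-based and intensity-based starting methods, the weighted MinSum variant, and the Regret-Threshold algorithm are not reproduced here.

```python
from typing import Callable, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def pseudo_centroid(cluster: List[int], dist: Matrix,
                    score: Callable[[List[float]], float]) -> int:
    """Return the member of `cluster` with the smallest score over its
    distances to all members (assumption: score=sum for MinSum,
    score=max for MinMax)."""
    return min(cluster, key=lambda c: score([dist[c][p] for p in cluster]))


def k_pc(dist: Matrix, seeds: Sequence[int],
         score: Callable[[List[float]], float] = sum,
         max_iter: int = 100) -> Tuple[List[int], List[List[int]]]:
    """K-Means-style loop that uses only pairwise distances.

    Hypothetical sketch: alternate between assigning each point to its
    nearest pseudo-centroid and recomputing each cluster's pseudo-centroid,
    stopping when the pseudo-centroids no longer change.
    """
    centers = list(seeds)
    clusters: List[List[int]] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest pseudo-centroid.
        clusters = [[] for _ in centers]
        for p in range(len(dist)):
            nearest = min(range(len(centers)),
                          key=lambda j: dist[p][centers[j]])
            clusters[nearest].append(p)
        # Update step: keep the old center if a cluster came up empty.
        new_centers = [pseudo_centroid(c, dist, score) if c else centers[j]
                       for j, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


# Illustrative dissimilarities (made up, not from the paper): two tight groups.
D = [
    [0, 1, 2, 9, 9, 9],
    [1, 0, 1, 9, 9, 9],
    [2, 1, 0, 9, 9, 9],
    [9, 9, 9, 0, 1, 2],
    [9, 9, 9, 1, 0, 1],
    [9, 9, 9, 2, 1, 0],
]

min_sum_result = k_pc(D, seeds=[0, 3], score=sum)  # K-MinSum flavour
min_max_result = k_pc(D, seeds=[0, 3], score=max)  # K-MinMax flavour
```

On this toy matrix both scoring rules settle on points 1 and 4 as pseudo-centroids, since each sits in the middle of its group. Passing the scoring rule as a parameter keeps the loop identical for the two variants, which mirrors how the abstract presents K-MinMax and K-MinSum as instances of one K-PC scheme.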