eLearning Archive: A Primer on Clustering: II. Tendency Assessment and Cluster Validity

  • Online

This course is part of our eLearning Archive, which includes older courses that may not be current or as user-friendly as courses designed more recently.

This course, the second in a series of three, discusses several approaches to the first and third problems of clustering identified in module I: pre-clustering tendency assessment and post-clustering cluster validation. The target audience comprises advanced undergraduate and graduate students majoring in engineering and science, and practicing engineers and scientists interested in either research about or applications of clustering to real-world problems such as data mining, image analysis and bioinformatics. Some of the subject matter in this course is available in textbooks (most notably some of the material about cluster validity functionals), and some of it is the object of my current research. The references contain pointers to some excellent papers on these topics, and on a number of related or competing methods that have been proposed and studied by others.

I begin with a simple numerical example that establishes the necessity for both assessment and validity. Then I discuss the visual assessment of tendency (VAT) family of algorithms: VAT, sVAT and coVAT. These algorithms produce images that enable a user to make useful guesses about the number of clusters to seek in relational data before proceeding with a partitioning method for finding the clusters. Since object data can always be converted to relational form by computing pairwise distances, these methods are well defined for all types of unlabeled numerical data. The coVAT algorithm provides a means for estimating the number of clusters in each of the four problems associated with rectangular relational data: row clusters, column clusters, joint (pure) clusters, and mixed co-clusters.

The second half of this course presents some examples of cluster validation using scalar measures, or indices, of cluster validity. Several examples from each of the three major categories of indices (crisp, fuzzy and probabilistic) are presented. The course concludes with a numerical example that compares 23 indices of all three types on clusters in 12 sets of data drawn from mixtures of Gaussian distributions having either 3 or 6 components. Some indices of each type do pretty well in this example, while others do very badly. I don't think this problem has a general "solution", but since we use clustering in many, many applications, we keep trying to find good indices to validate algorithmic outputs.
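
To make the tendency-assessment idea concrete, here is a minimal sketch of a VAT-style reordering in plain Python (3.8 or later, for `math.dist`). It is not the course's own code. The function names `pairwise_distances`, `vat_order` and `reorder`, and the six sample points, are assumptions made up for this illustration. The sketch converts object data to relational form, then reorders the dissimilarity matrix so that mutually close objects become adjacent. Displayed as a grayscale image, the reordered matrix would show one dark diagonal block per apparent cluster.

```python
import math


def pairwise_distances(points):
    """Convert object data (coordinate tuples) to a square dissimilarity matrix."""
    return [[math.dist(p, q) for q in points] for p in points]


def vat_order(D):
    """Return a VAT-style ordering of the indices of a square dissimilarity matrix.

    Start at one endpoint of the largest dissimilarity, then repeatedly append
    the unordered object that is closest to any object already ordered.
    """
    n = len(D)
    start = max(range(n), key=lambda i: max(D[i]))
    order = [start]
    remaining = set(range(n)) - {start}
    # nearest[j]: smallest dissimilarity from j to the objects ordered so far
    nearest = {j: D[start][j] for j in remaining}
    while remaining:
        j = min(remaining, key=lambda k: (nearest[k], k))
        order.append(j)
        remaining.remove(j)
        for k in remaining:
            nearest[k] = min(nearest[k], D[j][k])
    return order


def reorder(D, order):
    """Apply the same permutation to the rows and columns of D."""
    return [[D[i][j] for j in order] for i in order]


# Hypothetical sample: two tight groups, interleaved in the input order.
points = [(0.0, 0.0), (5.0, 5.0), (0.1, 0.2), (5.2, 4.9), (0.2, 0.0), (4.9, 5.1)]
D = pairwise_distances(points)
order = vat_order(D)
D_star = reorder(D, order)

for row in D_star:
    print(" ".join(f"{value:5.2f}" for value in row))
```

In the printed matrix, the small values gather into two 3 x 3 blocks along the diagonal, which is the visual cue that suggests looking for two clusters in this sample.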

What you will learn:

  • Review scalar measures of cluster validity
  • Examine Visual Assessment of Tendency (VAT)
  • Discuss VAT for small, square data sets

Who should attend: Electrical engineer, Systems engineer, Hardware engineer, Design engineer, Product engineer, Communication engineer

Instructor

James Bezdek

Jim received the BS in Civil Engineering from the University of Nevada and the PhD in Applied Mathematics from Cornell University. Jim is past president of NAFIPS, IFSA and the IEEE NNC (aka CIS), is the founding editor of the International Journal of Approximate Reasoning and the IEEE Transactions on Fuzzy Systems, is a fellow of the IEEE and IFSA, and is a recipient of the IEEE 3rd Millennium, IEEE Fuzzy Systems Pioneer and IEEE Frank Rosenblatt medals.

Publication Year: 2008

ISBN: 1-4244-1441-5


  • Course Provider: Educational Activities
  • Course Number: EDP074
  • Duration (Hours): 1
  • Credits: 0.1 CEU/ 1 PDH