Problem: given a set of observations, group the observations into populations.
Parallel to discriminant analysis but: no training data.
Here: just show some example analyses:
Cluster iris data: three species, so presumably three populations.
Often: number of populations not known.
Many possible SPlus functions including: agnes, clara, pam, hclust
Example: Cluster the iris data.
Put all 150 observations into a 150 × 4 matrix x. (Remove species
labels.)
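For readers working in R rather than S-Plus, the matrix can be built from the built-in iris data frame. This is only a sketch under that assumption; the name species is introduced here for illustration and does not appear in the original session.

```r
# Assumption: R's built-in iris data frame (four measurements plus a Species column).
x <- as.matrix(iris[, 1:4])   # 150 x 4 numeric matrix, species labels removed
species <- iris$Species       # kept aside only for checking the clusters afterwards
dim(x)                        # dimensions of the matrix passed to the clustering functions
```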
Cluster into 2, 3, 4 groups using pam:
pamiris2 <- pam(x,2)
pamiris3 <- pam(x,3)
pamiris4 <- pam(x,4)

Output for two clusters:
> pam(x,2)
Call:
pam(x = x, k = 2)
Medoids:
     Sepal L. Sepal W. Petal L. Petal W.
[1,]      5.0      3.4      1.5      0.2
[2,]      6.2      2.8      4.8      1.8
Clustering vector:
  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [75] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2
[112] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[149] 2 2

Notice that the algorithm correctly groups together the first 50 observations. The other two species are then lumped together, with one exception: a single observation in the row labelled [75] is assigned to cluster 1.

Output for three clusters:
> pam(x,3)
Call:
pam(x = x, k = 3)
Medoids:
     Sepal L. Sepal W. Petal L. Petal W.
[1,]      5.0      3.4      1.5      0.2
[2,]      6.0      2.9      4.5      1.5
[3,]      6.8      3.0      5.5      2.1
Clustering vector:
  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [75] 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 2 3 3 3 3 2 3 3 3 3
[112] 3 3 2 2 3 3 3 3 2 3 2 3 2 3 3 2 2 3 3 3 3 3 2 3 3 3 3 2 3 3 3 2 3 3 3 2 3
[149] 3 2

Notice the difficulty separating species 2 from species 3. A total of two observations from species 2 (observations 51 to 100) are placed in cluster 3; a total of 14 from species 3 (observations 101 to 150) are placed in cluster 2.
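The counts of misplaced observations can be read off a cross-tabulation of species against cluster label instead of being counted by eye. The following is a hedged sketch in R, assuming the cluster package and the built-in iris data; the name tab is illustrative, and the cluster numbering in another run or version need not match the species order.

```r
library(cluster)                         # provides pam()

x <- as.matrix(iris[, 1:4])              # 150 x 4 matrix, species labels removed
pamiris3 <- pam(x, 3)                    # partition around 3 medoids

# Rows: true species; columns: cluster labels assigned by pam.
tab <- table(iris$Species, pamiris3$clustering)
print(tab)
```

In each row, the entries outside that species' dominant cluster are the observations that the clustering separates from the rest of their species.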
Now hierarchical clustering: a method which does not require the number of classes to be specified in advance, but which does not estimate the number of classes either.
> agnesiris <- agnes(x)
> cutree(agnesiris,k=2)
  [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [38] 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [75] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[112] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[149] 1 1
attr(, "height"):
[1] 0.8964852 0.2645751
> plot(cutree(agnesiris,k=2))
> plot(cutree(agnesiris,k=3))
> plot(cutree(agnesiris,k=4))
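Because the tree is built once and only cut afterwards, the same fit can be compared with the species labels at any number of groups. This is a sketch in R under stated assumptions: the cluster package is available, and the tree is converted with as.hclust before cutting, since R's cutree is documented for hclust-style trees. The name groups3 is illustrative.

```r
library(cluster)                          # provides agnes() and an as.hclust method

x <- as.matrix(iris[, 1:4])               # 150 x 4 matrix, species labels removed
agnesiris <- agnes(x)                     # agglomerative nesting with default settings

# Assumption: convert to an hclust tree before cutting it into 3 groups.
groups3 <- cutree(as.hclust(agnesiris), k = 3)

# Rows: true species; columns: groups obtained by cutting the tree.
print(table(iris$Species, groups3))
```

Changing k in the cutree call gives the 2-group or 4-group solution from the same tree without refitting.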