The k-means method was used to perform the analysis. The algorithm groups the points into clusters in such a way that the cumulative distance between the points and the centre of the cluster to which they belong is minimal, while the distance between clusters is maximal. The squared Euclidean distance was used as the measure of distance. The choice of the number of clusters is a difficult problem. The most convenient situation is when there are environmental indications of the number of features investigated, as this will then be equal to the number of clusters formed. If such information
is unavailable, one can employ automated methods. Of the 30 methods for choosing the number of clusters analysed by Milligan & Cooper (1985), that of Caliński & Harabasz (1974) was identified as one of the most reliable. It involves finding the maximum of the Caliński-Harabasz index $CH_{\mathrm{index}}$, defined as

$$CH_{\mathrm{index}} = \frac{B}{K-1} \times \frac{N-K}{W}, \tag{20}$$

where $N$ is the number of all points, $K$ the number of clusters, $B$ the distance between clusters and $W$ the distance within clusters. The magnitudes of $B$ and $W$ are obtained as follows:

$$B = \sum_{k=1}^{K} n_k \left\| z_k - z \right\|^2, \qquad W = \sum_{k=1}^{K} \sum_{i=1}^{n_k} \left\| x_{i \in k} - z_k \right\|^2, \tag{21}$$
where $n_k$ is the number of points in cluster $k$, $z_k$ the position of the centre of cluster $k$, $z$ the position of the centre of all points, $x_{i \in k}$ the $i$-th point located in cluster $k$, and $\|\cdot\|$ the distance norm (Maulik & Bandyopadhyay 2002). Ray & Turi (1999) derived another method of determining the number of clusters. Their index makes direct use of the cluster assumption and is defined as follows:

$$II_{\mathrm{index}} = \frac{\mathrm{intra}}{\mathrm{inter}} = \frac{N^{-1} \sum_{k=1}^{K} \sum_{i=1}^{n_k} \left\| x_{i \in k} - z_k \right\|^2}{\min_{i \neq j} \left\| z_i - z_j \right\|^2}, \tag{22}$$
where ‘intra’ is the mean distance between the points and the centre of the cluster containing them, while ‘inter’ is the minimum distance between the cluster centres. In these cases, choosing the number of clusters involves finding the maximum of $CH_{\mathrm{index}}$ or the minimum of $II_{\mathrm{index}}$. Both indices were determined for numbers of clusters from 2 to 20 in all the cases analysed (Figure 9). In general, $CH_{\mathrm{index}}$ decreases and $II_{\mathrm{index}}$ increases with increasing numbers of clusters. Despite the many deviations from this trend, it was difficult to determine the number of clusters from either index; a small number of clusters was found to be the most appropriate. To identify the maximum number of clusters, the total distance between the points and the centre of the cluster containing them was defined:

$$W_K = \sum_{k=1}^{K} \sum_{i=1}^{n_k} \left\| x_{i \in k} - z_k \right\|^2. \tag{23}$$

By analysing the dependence of $W_K - W_{K-1}$ on $K$ (Figure 9), on the assumption that the value must not be too high, 6 was chosen as the most appropriate value. Cluster analysis was performed for two to six clusters, for the deviation types MV, LT and ST separately and for all the types together. In order to assign a specific cluster to a seabed morphological type, the results for the example profile were analysed first.
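The indices of equations (20)-(23) can be illustrated with a minimal Python sketch. It is not the implementation used in this study: the function names (`kmeans`, `validity_indices`), the plain Lloyd iteration with random initial centres, and the toy data are all assumptions introduced for illustration only. The sketch assumes at least two clusters and a non-zero within-cluster distance.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean_point(points):
    """Component-wise mean (centre) of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd iteration (illustrative stand-in); returns (labels, centres)."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # initial centres: k distinct data points
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centres[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster becomes empty
                centres[c] = mean_point(members)
    return labels, centres


def validity_indices(points, labels, centres):
    """Return (CH_index, II_index, W_K) for a partition with K >= 2 and W > 0."""
    n, k = len(points), len(centres)
    z = mean_point(points)  # centre of all points
    sizes = [labels.count(c) for c in range(k)]
    # Eq. (21): between-cluster distance B and within-cluster distance W
    b = sum(sizes[c] * sq_dist(centres[c], z) for c in range(k))
    w = sum(sq_dist(p, centres[lab]) for p, lab in zip(points, labels))
    ch = (b / (k - 1)) * ((n - k) / w)  # eq. (20)
    # Eq. (22): 'inter' is the smallest squared distance between two centres
    inter = min(sq_dist(centres[i], centres[j])
                for i in range(k) for j in range(i + 1, k))
    ii = (w / n) / inter
    return ch, ii, w  # w is also W_K of eq. (23)


if __name__ == "__main__":
    # Hypothetical toy data: three loose groups of points in the plane
    data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
            (10.0, 0.0), (11.0, 0.0), (10.0, 1.0),
            (5.0, 8.0), (6.0, 8.0), (5.0, 9.0)]
    previous_w = None
    for k in range(2, 5):
        labels, centres = kmeans(data, k)
        ch, ii, w = validity_indices(data, labels, centres)
        step = None if previous_w is None else w - previous_w
        print(k, ch, ii, w, step)  # step corresponds to W_K - W_{K-1}
        previous_w = w
```

Scanning the printed values for the largest $CH_{\mathrm{index}}$, the smallest $II_{\mathrm{index}}$ and the flattening of $W_K - W_{K-1}$ mirrors the selection procedure described above. Because k-means depends on the initial centres, a practical analysis would normally repeat the clustering with several seeds.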