🗫 ML - Clustering
Introduction
Quoting from http://baoqiang.org/?p=579
k-Nearest-Neighbour and K-Means clustering
These are arguably the two most commonly used methods of this kind, partly because they are easy to use and fairly straightforward. So how do they work?
k-Nearest-Neighbour: Given N n-dimensional entries, each with a known associated class, classify a new entry as follows: calculate the distance between the new entry and each of the N known entries, take the k entries closest to it (its k nearest neighbours), and assign the new entry the class held by the majority of those neighbours.
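As a concrete sketch (not from the quoted post), the classifier below implements this in Python with NumPy; the Euclidean distance metric, the toy training data, and k=3 are all illustrative assumptions.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Assign x_new the majority class among its k nearest training entries."""
    # Distance between the new entry and each of the N known entries.
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k closest entries (the k nearest neighbours).
    nearest = np.argsort(dists)[:k]
    # Majority vote over the known classes of those neighbours.
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy data (illustrative): two classes of 2-D entries.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.95, 0.9])))  # -> 1
```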
K-means: Given N n-dimensional entries, partition them into k classes. At first, we randomly choose k entries and assign one to each of the k clusters; these are the seed classes. Then we calculate the distance between each entry and each class centroid, and assign each entry to its closest class. After the assignment is complete, we recalculate the centroid of each class based on its new members. After the centroid calculation, we go back to the distance calculation for a new round of assignment. We stop the iteration at convergence, i.e., when the centroids and the assignments no longer change.
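Here is a minimal sketch of that assignment/update loop, again in Python with NumPy; the random seeding, the np.allclose convergence test, and the toy two-blob data are illustrative choices, and the sketch does not handle clusters that end up empty.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Seed step: randomly choose k entries as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Assignment step: each entry joins its closest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: recompute each centroid from its new members
        # (a cluster left with no members would yield NaN here).
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Convergence: stop when no centroid moves any more.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Toy data (illustrative): two Gaussian blobs around (0, 0) and (1, 1).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.1, size=(20, 2)) for m in (0.0, 1.0)])
labels, centroids = kmeans(X, k=2)
print(centroids)
```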
Both methods require us to supply k before they run: for k-NN the number of neighbours to consult, and for k-means the number of clusters. Strictly speaking, though, k-NN is a supervised classifier, since it needs entries with known class labels, while k-means is unsupervised.