> Tech Talks: Nearest Prototype Methods

By nearest prototype methods, we mean methods such as _k_-means and _k_-nearest neighbour. These are also sometimes referred to as instance-based methods.

These methods work by storing individual data points from the sample, or statistical aggregates over them, and then making predictions about new, previously unseen instances by comparing them to what has been stored.

In this video lecture, we present the two basic ideas behind these methods. The first is to abstract statistically over clusters of individual instances, for example by using a measure of central tendency, which leads to methods such as _k_-means, _k_-medians, and _k_-medoids. A new instance can then be classified by assigning it the class label of the cluster whose mean, median, or medoid lies closest to the new instance.
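
As a rough illustration of this first idea, the sketch below computes one mean prototype per class and labels a new point by its nearest prototype. It is not taken from the lecture: the function names `fit_prototypes` and `predict_nearest_prototype`, the toy data, the use of Euclidean distance, and the simplification of one prototype per class (rather than several clusters per class) are all assumptions made for the example.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def fit_prototypes(points, labels):
    """Return one mean prototype per class label (hypothetical helper)."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    # Average each coordinate over the members of a class.
    return {
        label: tuple(sum(coord) / len(members) for coord in zip(*members))
        for label, members in groups.items()
    }


def predict_nearest_prototype(prototypes, x):
    """Assign x the label whose prototype is closest (hypothetical helper)."""
    return min(prototypes, key=lambda label: dist(prototypes[label], x))


# Toy data, invented for illustration only.
points = [(1.0, 1.0), (1.0, 2.0), (5.0, 5.0), (6.0, 5.0)]
labels = ["a", "a", "b", "b"]

prototypes = fit_prototypes(points, labels)
print(prototypes)
print(predict_nearest_prototype(prototypes, (2.0, 2.0)))
```

Swapping the mean for a median or a medoid changes only how each prototype is computed; the prediction step stays the same.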

The second idea is to smooth over outliers by taking a majority vote among the _k_ nearest neighbours, or among the _k_ clusters whose prototypes are nearest.
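
The effect of the vote can be seen in a minimal sketch, again with invented data and a hypothetical function name `knn_predict`, and assuming Euclidean distance. One stored point carries a label that disagrees with its surroundings; with _k_ = 1 that outlier decides the prediction, while with _k_ = 3 it is outvoted.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def knn_predict(points, labels, x, k=3):
    """Majority vote among the k stored points nearest to x (hypothetical helper)."""
    nearest = sorted(zip(points, labels), key=lambda pl: dist(pl[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data, invented for illustration; the last point is an outlier
# labelled "b" that sits in the middle of the "a" region.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (5.0, 5.0), (6.0, 5.0), (5.0, 6.0),
          (1.5, 1.5)]
labels = ["a", "a", "a", "b", "b", "b", "b"]

query = (1.6, 1.6)
print(knn_predict(points, labels, query, k=1))  # decided by the outlier alone
print(knn_predict(points, labels, query, k=3))  # outlier is outvoted
```

Choosing an odd _k_ avoids tied votes when there are two classes; with more classes, a tie-breaking rule would need to be chosen explicitly.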

The machine learning method used at PANOPTICOM as part of our media monitoring solution uses similar ideas, although it cannot be described as a straightforward application of _k_-means or the like.


(Reproduced here, courtesy of PANOPTICOM).