Richard Bergmair's Media Library



ML Lecture #5: Machine Learning Evaluation & Methodology

Evaluation methodology greatly influences the outcome of any effort to solve a machine learning problem, as it provides the theoretical framing of the problem at hand and determines how to tell whether one solution is better or worse than another.

Topics of interest include the correct framing of a problem as one of several types of machine learning problem (supervised classification, semi-supervised classification, unsupervised clustering, structure mining, and regression), the correct choice of an evaluation measure to quantify the quality of a machine learner’s output, and methodological best practice for avoiding effects such as data snooping bias.

A supervised classification task is a machine learning task in which the machine is given a sample consisting of data points, characterized by certain observable features, together with class labels; the machine is then called upon to predict class labels for previously unseen data points, based on their observable features.
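
By way of illustration, here is a minimal sketch of such a task in Python; scikit-learn and its bundled iris dataset are assumed here purely as stand-ins for real observable features and class labels:

    # Supervised classification: fit on labeled data points, then predict
    # class labels for previously unseen points from their features alone.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)           # observable features, class labels
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)   # hold out "unseen" data points

    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print(clf.predict(X_test[:5]))              # predicted labels for unseen points
    print(clf.score(X_test, y_test))            # accuracy on the held-out sample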

An unsupervised clustering task is a machine learning task where the machine is given a sample consisting only of the data points, once again characterized by their observable features, but where there are no class labels. In this case, the machine is called upon to use structural properties of the feature space and the distribution of data points within that space to group points into so-called clusters, such that each point is close to other points within the same cluster, but far away from points in other clusters.
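
A corresponding sketch for the clustering case, with k-means assumed as one arbitrary choice of algorithm and synthetic data standing in for real features:

    # Unsupervised clustering: no class labels are given; k-means groups
    # points purely by their position in feature space.
    from sklearn.datasets import make_blobs
    from sklearn.cluster import KMeans

    X, _ = make_blobs(n_samples=300, centers=3, random_state=0)  # true labels discarded
    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
    print(km.labels_[:10])        # cluster assignment for each data point
    print(km.cluster_centers_)    # the learned cluster centres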

The self-learning media monitoring solution by PANOPTICOM is an example of a third type of machine learning task, called a semi-supervised task, which is a hybrid between the supervised and the unsupervised type. The data points considered in media monitoring are blog articles and the like. The machine is given relevance judgments, understood to be the class labels of interest, but it is given such class labels only for some, not all, of the data points in the sample. In this case, it is the class labels on the sample that define the classes we are trying to learn, so the machine learns the distinction between what’s relevant and what isn’t based on the known relevance judgments. But we also use structural properties of the feature space, as well as knowledge of the distribution of data points within that space, to make use of the additional data points in the sample for which we have no class labels. The machine can thus develop an even more refined method for distinguishing what’s relevant from what isn’t, based on its knowledge of, say, blog articles, for example on the assumption that blog articles from the same source, or blog articles involving the same keywords, will behave similarly in terms of their likelihood of being relevant.
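
The following sketch illustrates the semi-supervised setting in general terms only; it is not PANOPTICOM’s actual method, and label spreading is assumed here merely as one textbook technique that propagates the known labels through dense regions of feature space:

    # Semi-supervised classification: only some points carry class labels;
    # unlabeled points are marked -1, and label spreading propagates the
    # known labels to them via the structure of the feature space.
    import numpy as np
    from sklearn.datasets import make_moons
    from sklearn.semi_supervised import LabelSpreading

    X, y = make_moons(n_samples=200, noise=0.1, random_state=0)
    rng = np.random.RandomState(0)
    y_partial = np.copy(y)
    y_partial[rng.rand(len(y)) < 0.9] = -1   # pretend 90% of labels are unknown

    model = LabelSpreading(kernel='knn', n_neighbors=7).fit(X, y_partial)
    mask = y_partial == -1
    print((model.transduction_[mask] == y[mask]).mean())  # accuracy on unlabeled points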

The next issue in evaluation is the choice of a proper evaluation measure, into which a number of considerations enter. Staying with the example of media monitoring, one such consideration could be: are we primarily trying to minimize errors leading to irrelevant items being misclassified as relevant, or errors leading to relevant items being misclassified as irrelevant? Are there certain weightings, i.e. do some items carry more weight than others?
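
These considerations map onto standard measures: precision penalizes irrelevant items misclassified as relevant, recall penalizes relevant items misclassified as irrelevant, and the F-beta score trades the two off. A small sketch with made-up relevance judgments:

    # Evaluation measures for the media-monitoring example (toy data):
    # 1 = relevant, 0 = irrelevant.
    from sklearn.metrics import precision_score, recall_score, fbeta_score

    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 1, 1, 0, 0, 0]   # one missed item, two false alarms

    print(precision_score(y_true, y_pred))        # 0.50:  hurt by false alarms
    print(recall_score(y_true, y_pred))           # 0.67:  hurt by missed items
    print(fbeta_score(y_true, y_pred, beta=2.0))  # 0.625: beta > 1 favours recall

Per-item weightings can be expressed in the same framework; for instance, the scikit-learn metrics above all accept a sample_weight argument.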

Last, but not least, the development cycle surrounding the optimization of a machine learning method must follow certain best practices in order for the developer to avoid fooling themselves, and others, through data snooping bias. Data snooping bias is a subtle, but very interesting and important, concept, which we cover in some detail in the video lecture.
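
As a taste of the kind of pitfall involved, here is a sketch of one common form of data snooping: letting a fitted preprocessing step (here, feature scaling) see the held-out data. The dataset and models are arbitrary stand-ins:

    # Data snooping via preprocessing: fitting the scaler on the full
    # sample before cross-validation leaks held-out statistics into training.
    from sklearn.datasets import load_breast_cancer
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import cross_val_score

    X, y = load_breast_cancer(return_X_y=True)

    # Snooped: the scaler has already seen the held-out folds.
    X_scaled = StandardScaler().fit_transform(X)
    print(cross_val_score(SVC(), X_scaled, y).mean())

    # Clean: the scaler is refit on the training folds only, inside each split.
    pipe = make_pipeline(StandardScaler(), SVC())
    print(cross_val_score(pipe, X, y).mean())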