Given an approach to a machine learning problem, how do we know it’s correct? In what sense does it need to be correct? A theoretician’s answer: It needs to be PAC correct, i.e. “probably approximately” correct.
Machine learning starts with data and ends with the machine having formed a concept of the data or, more informally speaking, having understood the data, so that the machine can make decisions based on that concept. Generally speaking, it would be too much to require that a machine learner never be wrong about such a decision. If the concept is approximately correct, so that, say, 80% of all decisions are right, that’s good enough. It would also be too much to require that a machine learner form an approximately correct concept regardless of the data it is given. So, if the machine learner is probably correct, in the sense that on, say, 80% of all possible data samples it forms a concept that’s approximately correct, then that’s good enough.
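The two "good enough" conditions above can be written as a single statement. The following is a sketch in standard PAC notation, not necessarily the notation used in the lecture: the symbols $\varepsilon$ (error tolerance), $\delta$ (failure probability), $D$ (distribution of the data), $S$ (a sample of $m$ points), $c$ (the true concept) and $h_S$ (the concept the learner forms from $S$) are all assumptions introduced here for illustration.

```latex
% "Approximately": the learned concept h_S errs on at most a fraction epsilon of new points.
% "Probably": this holds for at least a fraction 1 - delta of all samples S.
\[
  \Pr_{S \sim D^m}\bigl[\operatorname{err}_D(h_S) \le \varepsilon\bigr] \;\ge\; 1 - \delta,
  \qquad
  \operatorname{err}_D(h) \;=\; \Pr_{x \sim D}\bigl[h(x) \ne c(x)\bigr].
\]
% The 80% figures in the text correspond to epsilon = 0.2 and delta = 0.2.
```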
In this video lecture, I describe the theoretical framing of machine learning problems in general, as well as the theoretical underpinnings, following from these basic ideas, of formally rigorous correctness claims about any kind of machine learning solution.
Throughout this video lecture, we use the concept of a rectangle as a running example. We are given sample data consisting of points in the two-dimensional plane. These points come out of a process which colours all points falling within some rectangle red, and all points falling outside the rectangle blue. The goal is to figure out the rectangle, so that, given a new, previously unseen point in the plane, we can predict its colour.
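The rectangle game can be sketched in a few lines of code. What follows is a minimal sketch, not necessarily the method presented in the lecture: it assumes the rectangle is axis-aligned and uses the common "tightest fit" strategy, which guesses the smallest rectangle containing every red sample point. The names `fit_rectangle`, `predict` and `sample_size` are hypothetical.

```python
import math
from typing import Iterable, Optional, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x_min, x_max, y_min, y_max)


def fit_rectangle(sample: Iterable[Tuple[Point, str]]) -> Optional[Rect]:
    """Return the smallest axis-aligned rectangle containing all red points."""
    reds = [point for point, colour in sample if colour == "red"]
    if not reds:
        return None  # no red point seen yet: predict blue everywhere
    xs = [x for x, _ in reds]
    ys = [y for _, y in reds]
    return (min(xs), max(xs), min(ys), max(ys))


def predict(rect: Optional[Rect], point: Point) -> str:
    """Predict the colour of a new point from the learned rectangle."""
    if rect is None:
        return "blue"
    x_min, x_max, y_min, y_max = rect
    x, y = point
    inside = x_min <= x <= x_max and y_min <= y <= y_max
    return "red" if inside else "blue"


def sample_size(epsilon: float, delta: float) -> int:
    """Standard textbook bound for the tightest-fit learner (an assumption
    here, not taken from the lecture): m >= (4 / epsilon) * ln(4 / delta)."""
    return math.ceil(4.0 / epsilon * math.log(4.0 / delta))


sample = [
    ((1.0, 1.0), "red"),
    ((2.0, 3.0), "red"),
    ((0.0, 0.0), "blue"),
    ((5.0, 2.0), "blue"),
]
rect = fit_rectangle(sample)
print(rect)                         # (1.0, 2.0, 1.0, 3.0)
print(predict(rect, (1.5, 2.0)))    # red
print(predict(rect, (4.0, 4.0)))    # blue
print(sample_size(0.2, 0.2))        # 60
```

Because the guessed rectangle always lies inside the true one, this learner can only err by calling a red point blue. With an error tolerance of 0.2 and a failure probability of 0.2, matching the two 80% figures, the textbook bound asks for 60 sample points.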
This may seem like an overly simple example, but the same concepts that underlie a solution to such a simple problem also enable even highly sophisticated technologies, such as the self-learning media monitoring solution by PANOPTICOM.
The red points could be relevant tweets on Twitter, relevant articles on an RSS feed, relevant HTML web pages, etc., whereas the blue points could be irrelevant ones. In the case of a point in the two-dimensional geometric plane, we can observe certain features, such as its position along the x-axis and its position along the y-axis. In media monitoring, we have different kinds of features. In particular, what we can observe is the presence or absence, or the prominence, of certain keywords.
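To make the analogy concrete, here is one way a document could be turned into coordinates, one per keyword. This is an illustrative sketch only and says nothing about how PANOPTICOM actually computes features: the function name `keyword_features` is hypothetical, and measuring "prominence" as relative word frequency is an assumption.

```python
import re
from typing import Dict, List


def keyword_features(text: str, keywords: List[str]) -> Dict[str, float]:
    """Map a document to one number per keyword.

    0.0 means the keyword is absent; larger values mean it makes up a
    larger share of the words in the document (assumed prominence measure).
    """
    tokens = re.findall(r"\w+", text.lower())
    total = len(tokens) or 1  # avoid division by zero on empty text
    return {kw: tokens.count(kw.lower()) / total for kw in keywords}


features = keyword_features(
    "Rates rise as the bank lifts rates", ["rates", "merger"]
)
print(features)
```

With two keywords, each document becomes a point in the plane, just like the red and blue points of the rectangle game.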
In the setting of a self-learning media monitoring solution, such as the one by PANOPTICOM, we can then observe the “colour” of the points in a sample by having a human editor or coder enter relevance decisions, which the computer can then use to form a concept of what’s relevant and what isn’t.
In the video lecture, the rectangle learning game also serves another purpose, besides introducing powerful machine learning concepts through a simple example: I particularly like this example because it helps demystify the inner workings of machine learning.
When popular science communicates the ideas of machine learning to a broader audience, it is often analogies with biological systems and with human learning that take centre stage. For example, with neural networks, the idea is to build a model human brain inside the computer. This makes for interesting reading, but quickly leads to overgeneralizations when the naive reader takes too seriously the idea that the computer now possesses a human’s ability to learn.
In order to solve an engineering problem, it is often fruitful to free oneself of preconceived notions drawn from nature’s solutions to similar problems. For example, planes do not fly by flapping their wings, and birds do not have jet engines. There is an entire science of aerodynamics, which had to be arrived at by systematic observation and experimentation that is by no means limited to the observation of birds.
The goal of this lecture series is to arrive at an understanding of machine learning by approaching it from first principles. There is nothing wrong with a machine learning solution based on the concept of “rectangle”, if “rectangle” is the right way to describe the data, even though it’s devoid of all the magical learning mojo of the biological neuron, and even though it would seem somewhat silly to describe a rectangle-fitting process as “artificial intelligence”.