Machine learning is about fitting a model to a data sample. But how do you measure the amount of information that’s contained within a sample? How do you measure the amount of information required to form a concept of that data? Information theory has some very good answers to questions of how to measure information.
In this video lecture, I introduce the basics of information theory and develop the intuitions behind it using an example from codebreaking.
The immediate approach to measuring the amount of information contained in some digital object would be to just count the number of bytes required to store it. But how do ideas like compression factor into that? Does a file that consists of a thousand zeros really contain the same amount of information as a file that consists of a thousand random numbers? Information theory extends the idea of counting bytes into a statistical realm to provide some answers to those questions.
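To make the zeros-versus-random-numbers question concrete, here is a minimal Python sketch. It is an illustration only and is not taken from the lecture: the helper name `empirical_entropy_bits`, the choice of zlib as the compressor, and the fixed random seed are all assumptions made for this example. It compares the raw byte count of two 1000-byte samples with their compressed size and with the Shannon entropy of their byte frequencies.

```python
import math
import random
import zlib
from collections import Counter


def empirical_entropy_bits(data: bytes) -> float:
    """Shannon entropy of the byte-frequency distribution, in bits per byte.

    Hypothetical helper for this illustration; it only looks at how often
    each byte value occurs, not at the order of the bytes.
    """
    n = len(data)
    counts = Counter(data)
    # Sum of p * log2(1/p), with p = c / n for each distinct byte value.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


random.seed(0)  # fixed seed so that repeated runs use the same sample

zeros = bytes(1000)                                         # a thousand zero bytes
noise = bytes(random.randrange(256) for _ in range(1000))   # a thousand pseudo-random bytes

for name, data in [("zeros", zeros), ("noise", noise)]:
    raw_size = len(data)
    compressed_size = len(zlib.compress(data))
    entropy = empirical_entropy_bits(data)
    print(f"{name}: raw={raw_size} bytes, "
          f"compressed={compressed_size} bytes, "
          f"entropy={entropy:.2f} bits/byte")
```

Both samples take up the same number of bytes on disk, but the file of zeros compresses to a small fraction of its size and has an empirical entropy of zero, while the pseudo-random file barely compresses at all and has an entropy close to the maximum of 8 bits per byte. Counting bytes alone cannot tell the two apart; the statistical view can.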
The theoretical framing of a codebreaking problem like the one considered in this video lecture closely resembles that of machine learning problems, such as distinguishing relevant web content from irrelevant web content in a media monitoring system like the one by PANOPTICOM.
The problem in media monitoring is that a user’s relevance profile is implied by their relevance decisions, but not directly observable. So a nontrivial statistical induction problem needs to be solved in order to induce the criteria that distinguish relevant items from irrelevant ones.
This is the same situation you find yourself in as a cryptanalyst. You have some data, namely an encrypted message you have intercepted. You know that, once you have intercepted enough data, the data should theoretically imply the cryptographic key that was used to encrypt it, but getting from the data to the key may still be a difficult problem to solve.
A second concept appears on the scene for the first time in this lecture: data representation. Information cannot be measured purely in the abstract. Rather, certain engineering choices determine how a piece of information is represented, for example in a data file on a computer. We will see throughout the rest of this lecture series that engineering choices about data representation are of crucial importance in solving machine learning problems.