> Tech Talks: Naive Bayes & Bayesian Networks

Naive Bayes and Bayesian Networks apply Bayesian Decision Theory to data sets consisting of data points that are represented as vectors of features, each of which can be present or absent.

Within this framework, Naive Bayes is the method that results from assuming that the features are conditionally independent of one another, given the class. Bayesian Networks, on the other hand, result from making such independence assumptions only for some, but not all, pairs of features.
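As a minimal sketch of the idea (the spam/ham example and all word features below are hypothetical, not from the talk), a Naive Bayes classifier over present-or-absent features can be written as:

```python
import math
from collections import defaultdict

def train_naive_bayes(samples):
    """Estimate P(class) and P(feature present | class) from
    (feature_set, label) pairs."""
    class_counts = defaultdict(int)
    feat_counts = defaultdict(lambda: defaultdict(int))
    vocab = set()
    for feats, label in samples:
        class_counts[label] += 1
        for f in feats:
            feat_counts[label][f] += 1
            vocab.add(f)
    return class_counts, feat_counts, vocab

def predict(model, feats):
    class_counts, feat_counts, vocab = model
    total = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for label, n in class_counts.items():
        # log P(class) + sum over all known features of
        # log P(feature present or absent | class), assuming independence.
        score = math.log(n / total)
        for f in vocab:
            p = (feat_counts[label][f] + 1) / (n + 2)  # Laplace smoothing
            score += math.log(p if f in feats else 1 - p)
        if score > best_score:
            best, best_score = label, score
    return best

# Hypothetical toy data: spam/ham from word presence.
data = [({"offer", "win"}, "spam"), ({"win", "cash"}, "spam"),
        ({"meeting", "report"}, "ham"), ({"report", "win"}, "ham")]
model = train_naive_bayes(data)
print(predict(model, {"offer", "cash"}))  # prints "spam"
```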

>> read more

> Tech Talks: Bayesian Decision Theory & Gaussians

Bayesian Decision Theory establishes how prior and posterior probabilities factor into decision making in the presence of uncertainty. It can therefore be thought of as one of the most essential and versatile tools of any data scientist.

Say you have an action that may or may not bring about some desired outcome. From a Bayesian perspective, the objective of decision-making should be to take the action if, and only if, the expected utility of taking it, should the desired outcome occur, is greater than the expected disutility of taking it, should the desired outcome fail to occur. Such an expected utility is the utility of the outcome weighted by the probability of the outcome, given that the action has been taken. This probability is often unobservable, but it can be derived, through an application of Bayes' rule, from observable ingredients.
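The decision rule can be sketched in a few lines; all the numbers below are hypothetical placeholders, not values from the talk:

```python
def posterior(prior, likelihood_given_h, likelihood_given_not_h):
    """Bayes' rule: P(H | E) from P(H), P(E | H) and P(E | not H)."""
    evidence = likelihood_given_h * prior + likelihood_given_not_h * (1 - prior)
    return likelihood_given_h * prior / evidence

def should_act(p_success, utility_success, disutility_failure):
    """Act iff the expected utility outweighs the expected disutility."""
    return p_success * utility_success > (1 - p_success) * disutility_failure

# Hypothetical numbers: evidence E observed, hypothesis H of interest.
p = posterior(prior=0.1, likelihood_given_h=0.9, likelihood_given_not_h=0.2)
print(round(p, 3))                                  # prints 0.333
print(should_act(p, utility_success=100, disutility_failure=30))
```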

>> read more

> Tech Talks: Nearest Prototype Methods

By nearest prototype methods, we mean methods such as _k_-means and _k_-nearest neighbour. These are also sometimes referred to as instance-based methods.

These methods work by storing individual instances of data points from the sample, or statistical aggregates over individual instances, and then making predictions about new and previously unseen instances by comparing them to the ones in store.
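A minimal sketch of one such method, _k_-nearest neighbour with Euclidean distance on hypothetical toy data:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among the k stored
    instances nearest to it (Euclidean distance)."""
    by_dist = sorted(train, key=lambda p: math.dist(p[0], query))
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

# Stored instances: (feature vector, label) pairs.
train = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"),
         ((4.0, 4.0), "b"), ((4.2, 3.9), "b"), ((3.8, 4.1), "b")]
print(knn_predict(train, (1.1, 0.9)))  # prints "a"
```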

>> read more

> Tech Talks: Data Representation & Similarity Measures

Notions of similarity and dissimilarity, as well as closeness and distance, are at the heart of the kinds of mathematical models that enable machine learning.

Distance, in a geometric sense, would seem to be a rather rigid concept. But to a mathematician there are, in fact, surprising degrees of freedom in the choice of a distance measure, each choice giving different mathematical properties to the resulting geometric space.
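For example, three standard distance measures applied to the same pair of points, assuming simple numeric feature vectors:

```python
import math

def euclidean(p, q):
    # Straight-line distance.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    # Sum of per-coordinate differences ("city block" distance).
    return sum(abs(a - b) for a, b in zip(p, q))

def chebyshev(p, q):
    # Largest single-coordinate difference.
    return max(abs(a - b) for a, b in zip(p, q))

p, q = (0.0, 0.0), (3.0, 4.0)
print(euclidean(p, q), manhattan(p, q), chebyshev(p, q))  # 5.0 7.0 4.0
```

The three measures disagree on how far apart the very same points are, which is one way those "degrees of freedom" show up in practice.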

>> read more

> Tech Talks: Machine Learning Evaluation & Methodology

Evaluation methodology greatly influences the outcome of any effort to solve a machine learning problem: it provides the theoretical framing of the problem at hand and determines how to tell whether a given solution is better or worse than another.

Topics of interest include the correct framing of a problem as one of several types of machine learning problem (supervised classification, semi-supervised classification, unsupervised clustering, structure mining and regression), the correct choice of an evaluation measure to quantify the quality of a machine learner's output, and methodological best practices for avoiding effects such as data snooping bias.
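One such best practice, holding out a test set before any fitting takes place, can be sketched as follows; the data and the trivial majority-class "model" are hypothetical:

```python
import random

def train_test_split(data, test_fraction=0.25, seed=0):
    """Hold out a test set before any model fitting, so that no
    information from the test set can leak into training
    (guarding against data snooping bias)."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def accuracy(predict, test):
    """Fraction of held-out instances classified correctly."""
    return sum(predict(x) == y for x, y in test) / len(test)

# Hypothetical data: inputs with a boolean label.
data = [(x, x > 5) for x in range(12)]
train, test = train_test_split(data)
labels = [y for _, y in train]
majority = max(set(labels), key=labels.count)  # trivial baseline model
acc = accuracy(lambda x: majority, test)
print(len(train), len(test), acc)
```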

>> read more

> Tech Talks: Decision Trees & Issues in Representation

Decision trees are a highly generic concept class which can be used to fit a concept to almost any kind of data that might be thrown at a machine learner.

But this representational power comes at a price. Selecting a concept from a higher-dimensional concept class, i.e. from a more generic, representationally more powerful one, means that more information is required, and the output, even though it may fit the data well, may neither describe the data in a meaningful way nor generalize particularly well.
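As a minimal sketch (not the lecture's own implementation), a greedy, ID3-style decision tree over binary features might look like this; the weather example is hypothetical:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def build_tree(rows, features):
    """Greedily split on the feature with the largest information
    gain; recurse until the labels are pure or features run out."""
    labels = [y for _, y in rows]
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority label
    def gain(f):
        split = {}
        for x, y in rows:
            split.setdefault(x[f], []).append(y)
        return entropy(labels) - sum(len(s) / len(rows) * entropy(s)
                                     for s in split.values())
    best = max(features, key=gain)
    branches = {}
    for value in {x[best] for x, _ in rows}:
        subset = [(x, y) for x, y in rows if x[best] == value]
        branches[value] = build_tree(subset, features - {best})
    return (best, branches)

def classify(tree, x):
    while isinstance(tree, tuple):  # descend until a leaf is reached
        feature, branches = tree
        tree = branches[x[feature]]
    return tree

# Hypothetical toy data: weather features -> play decision.
rows = [({"sunny": 1, "windy": 0}, "yes"), ({"sunny": 1, "windy": 1}, "no"),
        ({"sunny": 0, "windy": 0}, "yes"), ({"sunny": 0, "windy": 1}, "yes")]
tree = build_tree(rows, {"sunny", "windy"})
print(classify(tree, {"sunny": 1, "windy": 1}))  # prints "no"
```

Note how easily such a tree memorizes every training row, which is exactly the overfitting risk the tradeoff above describes.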

This video lecture introduces decision trees in detail, and discusses some of the tradeoffs around using a highly generic concept class like a decision tree.

>> read more

> Tech Talks: Data Representation & Statistics

The data samples forming the point of departure for any machine learning problem come about through some kind of natural or computational process, and such processes leave recognizable patterns in the data.

For example, pick out two observable numeric features of the data points in the sample and represent them in a scatter plot. It may be the case that those features have come about through a process that is additive in nature, resulting in the well-known visual pattern of a normal distribution: data points are densely clustered around some central region and become sparser as one moves away from the centre. The process may, on the other hand, be multiplicative in nature, in which case the pattern will show visible distortions, resulting in a log-normal distribution.
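A quick simulation illustrates the contrast; the specific distributions and sample sizes below are arbitrary choices for illustration:

```python
import math
import random
import statistics

rng = random.Random(0)

# Additive process: a sum of many small independent effects is
# approximately normal (central limit theorem).
additive = [sum(rng.uniform(-1, 1) for _ in range(50)) for _ in range(2000)]

# Multiplicative process: a product of many small positive effects is
# approximately log-normal, because the *logarithm* of each product
# is a sum, and hence approximately normal.
multiplicative = [math.prod(rng.uniform(0.9, 1.1) for _ in range(50))
                  for _ in range(2000)]
logs = [math.log(x) for x in multiplicative]

# The additive sample is roughly symmetric around zero; the raw
# multiplicative sample is skewed, but its logs are symmetric again.
print(round(statistics.mean(additive), 1))
print(round(statistics.mean(logs), 1))
```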

>> read more

> Tech Talks: Data Representation & Information Theory

Machine learning is about fitting a model to a data sample. But how do you measure the amount of information that's contained within a sample? How do you measure the amount of information required to form a concept of that data? Information theory has some very good answers to such questions.

In this video lecture, I introduce the basics of information theory and develop the basic intuitions behind information theory using an example from codebreaking.

The most immediate approach to measuring the amount of information contained in some digital object would be simply to count the number of bytes required to store it. But how do ideas like compression factor into that? Does a file that consists of a thousand zeros really contain the same amount of information as a file that consists of a thousand random numbers? Information theory extends the idea of counting bytes into a statistical realm to provide answers to those questions.
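A rough sketch of that statistical extension: the Shannon entropy of a file's empirical byte distribution cleanly separates the two files from the question above.

```python
import math
import random
from collections import Counter

def entropy_bits_per_byte(data):
    """Shannon entropy of the empirical byte distribution, in bits per
    byte: a lower bound on compressibility for any code that treats
    bytes independently."""
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

zeros = bytes(1000)                               # a thousand zero bytes
rng = random.Random(0)
noise = bytes(rng.randrange(256) for _ in range(1000))  # random bytes

# The zeros are perfectly predictable (entropy 0); the noise is close
# to the 8-bit maximum and essentially incompressible.
print(entropy_bits_per_byte(zeros), entropy_bits_per_byte(noise))
```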

>> read more

> Tech Talks: The PAC Model

Given an approach to a machine learning problem, how do we know it’s correct? In what sense does it need to be correct? A theoretician’s answer: It needs to be PAC correct, i.e. “probably approximately” correct.

Machine learning starts with data and ends with the machine having formed a concept of the data or, more informally speaking, having understood the data, so that the machine can make decisions based on that concept. Generally speaking, it would be too much to require that a machine learner must never be wrong about such a decision. If the concept is approximately correct, so that, say, 80% of all decisions are right, that's good enough. Likewise, it would be too much to require that a machine learner must be able to form an approximately correct concept of the data, regardless of the data. So, if the machine learner is probably correct, in the sense that on, say, 80% of all data inputs it will form a concept that's approximately correct, then that too is good enough.
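A toy simulation of the "probably approximately correct" idea, using a hypothetical threshold-learning problem on the unit interval (none of this is from the lecture itself):

```python
import random

rng = random.Random(0)
TARGET = 0.5  # true concept: x is positive iff x <= 0.5

def learn_threshold(m):
    """Tightest consistent hypothesis from m uniform random examples:
    the largest positive example seen."""
    xs = [rng.random() for _ in range(m)]
    positives = [x for x in xs if x <= TARGET]
    return max(positives, default=0.0)

def error(h):
    # Probability mass of misclassified points: those in (h, TARGET].
    return TARGET - h

# "Approximately correct": error at most epsilon on a single run.
# "Probably": approximately correct on most runs.
epsilon, runs = 0.05, 1000
good = sum(error(learn_threshold(100)) <= epsilon for _ in range(runs))
print(good / runs)  # fraction of runs that were approximately correct
```

With 100 examples per run, nearly every run lands within the error tolerance, which is exactly the "probably approximately correct" guarantee in miniature.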

In this video lecture, I describe the theoretical framing of machine learning problems in general, as well as the theoretical underpinnings, following from these basic ideas, of formally rigorous correctness claims about any kind of machine learning solution.

>> read more