## Tech Talks: Naive Bayes & Bayesian Networks

Naive Bayes and Bayesian Networks apply Bayesian Decision Theory to data sets whose data points are represented as vectors of binary features that can be present or absent.

Within this framework, Naive Bayes is the method that results from assuming that the features are stochastically independent of one another, given the class. Bayesian Networks, on the other hand, result from making such an independence assumption only for some, but not all, pairs of features.
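To make the independence assumption concrete, here is a minimal sketch of the resulting classifier over binary features. The toy data and the use of Laplace (add-one) smoothing are my own illustrative additions, not taken from the lecture:

```python
def train_naive_bayes(samples, labels):
    """Estimate class priors and per-feature Bernoulli likelihoods
    P(feature_j present | class), with Laplace (add-one) smoothing."""
    n_features = len(samples[0])
    priors, likelihoods = {}, {}
    for c in set(labels):
        rows = [s for s, l in zip(samples, labels) if l == c]
        priors[c] = len(rows) / len(samples)
        likelihoods[c] = [
            (sum(r[j] for r in rows) + 1) / (len(rows) + 2)
            for j in range(n_features)
        ]
    return priors, likelihoods

def predict(priors, likelihoods, x):
    """Pick the class maximising P(c) * prod_j P(x_j | c) --
    the product form is exactly the naive independence assumption."""
    def score(c):
        p = priors[c]
        for j, xj in enumerate(x):
            pj = likelihoods[c][j]
            p *= pj if xj else (1 - pj)
        return p
    return max(priors, key=score)

# Hypothetical toy sample: 3 binary features per data point.
X = [(1, 1, 0), (1, 0, 0), (0, 1, 1), (0, 0, 1)]
y = ["a", "a", "b", "b"]
priors, likes = train_naive_bayes(X, y)
print(predict(priors, likes, (1, 1, 0)))  # "a"
```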

## Tech Talks: Bayesian Decision Theory & Gaussians

Bayesian Decision Theory establishes how prior and posterior probabilities factor into decision making in the presence of uncertainty. It can therefore be thought of as one of the most essential and versatile tools of any data scientist.

Say you have an action that may or may not lead to some desired outcome. From a Bayesian perspective, the objective of decision-making should be to take the action if, and only if, the expected utility of taking it when the desired outcome occurs is greater than the expected disutility of taking it when the desired outcome does not occur. Each such expected utility is the utility of the outcome weighted by the probability of that outcome, given that the action has been taken. This probability is often unobservable, but it can be derived, through an application of Bayes’ rule, from observable ingredients.
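As a concrete sketch of this decision rule, with made-up numbers that are purely illustrative: suppose acting yields utility 20 if the desired outcome occurs and disutility 12 if it does not, and the posterior probability of the outcome given the action works out, via Bayes' rule, to 0.4.

```python
def posterior(prior, likelihood, evidence_prob):
    """Bayes' rule: P(H | E) = P(E | H) * P(H) / P(E)."""
    return likelihood * prior / evidence_prob

def should_act(p_outcome_given_action, utility, disutility):
    """Act iff the probability-weighted gain outweighs the
    probability-weighted loss."""
    expected_gain = p_outcome_given_action * utility
    expected_loss = (1 - p_outcome_given_action) * disutility
    return expected_gain > expected_loss

# Hypothetical ingredients: P(outcome) = 0.25,
# P(evidence | outcome) = 0.8, P(evidence) = 0.5.
p = posterior(prior=0.25, likelihood=0.8, evidence_prob=0.5)
print(should_act(p, utility=20, disutility=12))  # True: 0.4*20 > 0.6*12
```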

## Tech Talks: Nearest Prototype Methods

By nearest prototype methods, we mean methods such as _k_-means and _k_-nearest neighbour. These are also sometimes referred to as instance-based methods.

These methods work by storing individual instances of data points from the sample, or statistical aggregates over individual instances, and then making predictions about new, previously unseen instances by comparing them to those in store.
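The store-and-compare idea can be sketched in a few lines. This is a minimal _k_-nearest-neighbour classifier under Euclidean distance; the 2-D sample and labels are hypothetical:

```python
import math
from collections import Counter

def knn_predict(stored, query, k=3):
    """Classify `query` by majority vote among the k stored
    (point, label) instances nearest to it in Euclidean distance."""
    nearest = sorted(stored, key=lambda pl: math.dist(pl[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical stored sample of 2-D points.
stored = [((0, 0), "red"), ((0, 1), "red"),
          ((5, 5), "blue"), ((5, 6), "blue"), ((6, 5), "blue")]
print(knn_predict(stored, (1, 0)))  # "red"
```

A prototype method like _k_-means differs only in what is stored: cluster centroids (statistical aggregates) rather than every individual instance.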

## Tech Talks: Data Representation & Similarity Measures

Notions of similarity and dissimilarity, as well as closeness and distance, are at the heart of the kinds of mathematical models that enable machine learning.

Distance, in a geometric sense, would seem to be a rather rigid concept. But to a mathematician there are, in fact, surprising degrees of freedom in the choice of a distance measure, each choice giving different mathematical properties to the resulting geometric space.
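One standard illustration of those degrees of freedom is the Minkowski family of distances, sketched below for a single pair of points; the same pair is at different distances depending on the chosen measure:

```python
def minkowski(x, y, p):
    """L_p (Minkowski) distance: p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

def chebyshev(x, y):
    """The limit of L_p as p -> infinity: the largest coordinate gap."""
    return max(abs(a - b) for a, b in zip(x, y))

x, y = (0, 0), (3, 4)
print(minkowski(x, y, 1))  # 7.0  (Manhattan)
print(minkowski(x, y, 2))  # 5.0  (Euclidean)
print(chebyshev(x, y))     # 4    (Chebyshev)
```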

## Tech Talks: Machine Learning Evaluation & Methodology

Evaluation methodology greatly influences the outcome of any effort to solve a machine learning problem: it provides the theoretical framing of the problem at hand and determines how to tell whether a given solution is better or worse than another.

Topics of interest include the correct framing of a problem as one of several types of machine learning problem (supervised classification, semi-supervised classification, unsupervised clustering, structure mining, and regression), the correct choice of an evaluation measure to quantify the quality of a machine learner’s output, and methodological best practices for avoiding effects like data snooping bias.
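To make the "choice of evaluation measure" point concrete, here is a minimal sketch of three standard measures for supervised classification, computed from a hypothetical set of gold labels and classifier predictions:

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for one class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn

def precision_recall_accuracy(y_true, y_pred, positive=1):
    """Three different answers to 'how good is this output?'."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fp), tp / (tp + fn), (tp + tn) / len(y_true)

# Hypothetical gold labels vs. classifier output.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
print(precision_recall_accuracy(y_true, y_pred))  # (0.75, 0.75, 0.75)
```

Which of these (or of the many other measures) is "correct" depends entirely on the framing of the problem, which is the methodological point.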

## Tech Talks: Decision Trees & Issues in Representation

Decision trees are a highly generic concept class which can be used to fit a concept to almost any kind of data that might be thrown at a machine learner.

But this representational power comes at a price. Selecting a concept from a higher-dimensional concept class, i.e. from a more generic, more representationally powerful one, means that more information is required, and the output, even though it may fit the data well, may neither describe the data in a meaningful way nor generalize particularly well.

This video lecture introduces decision trees in detail, and discusses some of the tradeoffs around using a highly generic concept class like a decision tree.
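One way to quantify the "more information is required" tradeoff: over n binary features, a sufficiently deep decision tree can represent every one of the 2^(2^n) distinct boolean concepts, so merely singling out one concept from the class takes 2^n bits. A small illustrative computation (my own, not from the lecture):

```python
def num_boolean_concepts(n_features):
    """Number of distinct boolean concepts over n binary features --
    all of them representable by a sufficiently deep decision tree."""
    return 2 ** (2 ** n_features)

def bits_to_identify_concept(n_features):
    """log2 of the class size: the information needed to single
    out one concept from the class."""
    return 2 ** n_features

for n in (1, 2, 3, 4):
    print(n, num_boolean_concepts(n), bits_to_identify_concept(n))
# 4 features already give 65536 concepts, needing 16 bits to pin down.
```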

## Tech Talks: Data Representation & Statistics

The data samples forming the point of departure for any machine learning problem come about through some kind of natural or computational process, which leaves recognizable patterns in the data.

For example, pick two observable numeric features of the data points in the sample and represent them in a scatter plot. Those features may have come about through a process that is additive in nature, resulting in the well-known visual pattern of a normal distribution: data points are densely clustered around some central region and become sparser as one moves away from the centre. The process may, on the other hand, be multiplicative in nature, in which case the pattern will show visible distortions, resulting in a log-normal distribution.
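The contrast can be simulated in a few lines. The parameters below are hypothetical choices of mine, not from the lecture: summing many small independent effects yields an approximately normal sample, while multiplying many positive effects yields a right-skewed, approximately log-normal one.

```python
import random
import statistics

random.seed(0)

def additive(n_terms=50):
    """Sum of many small independent effects -> approx. normal (CLT)."""
    return sum(random.uniform(-1, 1) for _ in range(n_terms))

def multiplicative(n_terms=50):
    """Product of many small positive effects -> approx. log-normal."""
    value = 1.0
    for _ in range(n_terms):
        value *= random.uniform(0.9, 1.1)
    return value

add_sample = [additive() for _ in range(2000)]
mul_sample = [multiplicative() for _ in range(2000)]

# The additive sample is roughly symmetric about its mean; the
# multiplicative sample is right-skewed (its mean exceeds its median),
# and taking logs restores the symmetric, normal-looking shape.
print(statistics.mean(mul_sample) > statistics.median(mul_sample))
```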

## Tech Talks: Data Representation & Information Theory

Machine learning is about fitting a model to a data sample. But how do you measure the amount of information that’s contained within a sample? How do you measure the amount of information required to form a concept of that data? Information theory has some very good answers to questions of how to measure information.

In this video lecture, I introduce the basics of information theory and develop the intuitions behind it using an example from codebreaking.

The immediate approach to measuring the amount of information contained in some digital object would be to simply count the number of bytes required to store it. But how do ideas like compression factor into that? Does a file consisting of a thousand zeros really contain the same amount of information as a file consisting of a thousand random numbers? Information theory extends the idea of counting bytes into a statistical realm to provide answers to such questions.
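The thousand-zeros question has a crisp answer in terms of Shannon entropy, sketched below over the empirical symbol distribution (the specific data are illustrative):

```python
import math
import random
from collections import Counter

def entropy_bits_per_symbol(data):
    """Shannon entropy H = -sum_i p_i * log2(p_i) of the empirical
    symbol distribution of `data`, in bits per symbol."""
    counts = Counter(data)
    n = len(data)
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return h + 0.0  # normalise IEEE -0.0 to 0.0

# A thousand zeros: a single symbol, zero bits of surprise per symbol.
print(entropy_bits_per_symbol([0] * 1000))  # 0.0

# A thousand random bytes: close to the 8 bits/byte maximum.
random.seed(1)
rand = [random.randrange(256) for _ in range(1000)]
print(round(entropy_bits_per_symbol(rand), 1))
```

Both files occupy a thousand bytes, but entropy, like a good compressor, says the first contains almost no information and the second nearly the maximum.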