Generalized linear models (GLMs) are a large class of models that includes the familiar ordinary linear regression (that is, ordinary least squares, or OLS, regression) and the analysis of variance (ANOVA) models.

# A bag loaded with tricks (models, rather)

Both OLS regression and ANOVA deal with continuous response variables. However, sometimes we need to predict a categorical or discrete response variable, such as a yes/no response or a count.

For this purpose, other models such as logit, log-linear, and probit models, to name a few, are more appropriate.

Yes, there are a lot of models in the GLM domain, and it is easy to get confused about which statistical model suits the data you have on hand.

Here, we will dispel this confusion by getting to know GLMs a bit more.

# What makes up a GLM?

There are three ingredients that make up a GLM:

1. Random component
2. Linear predictor
3. Link function

Knowing these three components guides us toward the type of model we should use for our data.

Let’s look at them one by one.

# Component 1: Random component

The random component pertains to the response variable we are trying to model. Let’s call this variable Y.

This variable Y is assumed to follow a particular probability distribution. Here are some examples.

| Response Variable (Y) | (Usually Assumed) Distribution |
|---|---|
| Number of successes in a given number of trials | Binomial |
| Counts | Poisson, Negative Binomial |
| Continuous observation (e.g., weight) | Normal, Gamma |

Categorical data are measured on a nominal or ordinal scale, while interval and ratio data are both continuous.

If you are having trouble recognizing whether a variable is categorical or continuous, this explainer on the levels of measurement might help.
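To make the random component concrete, here is a minimal sketch of the probability mass functions of two distributions from the table above, the binomial and the Poisson, written with only Python's standard library. The helper names `binomial_pmf` and `poisson_pmf` are illustrative assumptions; in practice a statistics library would supply these.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(Y = k) for Y ~ Binomial(n, p): k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, mu: float) -> float:
    """P(Y = k) for Y ~ Poisson(mu): a count whose mean is mu."""
    return math.exp(-mu) * mu**k / math.factorial(k)

# 2 successes in 4 fair trials: C(4, 2) * 0.5**4 = 6 / 16
print(binomial_pmf(2, 4, 0.5))  # 0.375
# Probability of observing zero events when the mean count is 2
print(poisson_pmf(0, 2.0))
```

The binomial case needs the number of trials `n` as well as the success probability, whereas the Poisson case is described by its mean alone.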

# Component 2: Linear predictor

The linear predictor in a GLM specifies the explanatory variables, also known as predictors. It takes the form:

$$\alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$$

The x’s in the equation are the values of the predictors that you have specified.

For example, you might be interested in predicting the tendency of a person to vote or not to vote for a presidential candidate.

The response variable is then a yes/no variable, depending on whether a person will vote or not.

What could be potential predictors? Their party of choice, their economic status, and their level of education, to name a few.

Note that the expression above is a linear combination of the predictors. This is the “linear” in “generalized linear models”: the predictors enter the model in a linear fashion.
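Numerically, the linear predictor is just an intercept plus a weighted sum of the predictors. The sketch below computes it in Python for one hypothetical voter; the function name `linear_predictor`, the way the predictors are encoded, and every coefficient value are assumptions invented for illustration, not estimates from real data.

```python
from typing import Sequence

def linear_predictor(alpha: float, betas: Sequence[float], xs: Sequence[float]) -> float:
    """Compute alpha + beta_1*x_1 + ... + beta_p*x_p for one observation."""
    if len(betas) != len(xs):
        raise ValueError("need exactly one coefficient per predictor")
    return alpha + sum(b * x for b, x in zip(betas, xs))

# Hypothetical voter: x1 = years of schooling beyond primary level (4),
# x2 = 1 if the person favors a given party, else 0.
# The coefficients below are made up purely for illustration.
eta = linear_predictor(alpha=-1.0, betas=[0.25, 0.5], xs=[4, 1])
print(eta)  # -1.0 + 0.25*4 + 0.5*1 = 0.5
```

On its own, this value is not a probability of voting; it only becomes one after the link function is inverted.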

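# Component 3: Link function

Whenever we predict the response variable, we are predicting its mean, or expected value.

The link function is a function of this mean response, which we denote $\mu$. It connects $\mu$ to the linear predictor.

Here are some common functions of $\mu$ used as link functions, along with their names:

| Function of $\mu$ | Name |
|---|---|
| $\mu$ | Identity |
| $\log(\mu)$ | Log |
| $\log\left(\frac{\mu}{1-\mu}\right)$ | Logit |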
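In code, each link function comes paired with its inverse, which maps a value of the linear predictor back to the scale of the mean. Below is a minimal Python sketch of the identity, log, and logit links, the links that appear in this article's summary of models. The function names and the `LINKS` dictionary are illustrative assumptions, not any library's API.

```python
import math

def identity(mu: float) -> float:
    """Identity link (and its own inverse): g(mu) = mu."""
    return mu

def logit(mu: float) -> float:
    """Logit link: log-odds of a probability, defined for 0 < mu < 1."""
    return math.log(mu / (1.0 - mu))

def inv_logit(eta: float) -> float:
    """Inverse logit: maps any real eta into [0, 1] without overflowing."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)  # safe for large negative eta (underflows to 0.0)
    return z / (1.0 + z)

# name -> (link g, inverse link g^-1)
LINKS = {
    "identity": (identity, identity),
    "log": (math.log, math.exp),
    "logit": (logit, inv_logit),
}

mu = 0.2
for name, (g, g_inv) in LINKS.items():
    # g(mu) is on the linear-predictor scale; g_inv brings it back to mu.
    print(name, g(mu), g_inv(g(mu)))
```

The inverse is what turns a fitted linear predictor into a predicted mean, for example a probability under the logit link.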

To complete the GLM, we equate the link function to the linear predictor.

For instance, a GLM using the log link with two predictor variables will look like this:

$$\log(\mu) = \alpha + \beta_1 x_1 + \beta_2 x_2$$

The GLM above is an example of a log-linear model.
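Because the log link is inverted by the exponential, a log-linear model can be read multiplicatively: the predicted mean is $\mu = e^{\alpha + \beta_1 x_1 + \beta_2 x_2}$, so increasing $x_1$ by one unit multiplies $\mu$ by $e^{\beta_1}$. Here is a short Python sketch of this; the helper `loglinear_mean` and the coefficient values are hypothetical, chosen only for illustration.

```python
import math

def loglinear_mean(alpha: float, beta1: float, beta2: float,
                   x1: float, x2: float) -> float:
    """Mean mu implied by log(mu) = alpha + beta1*x1 + beta2*x2."""
    # Inverting the log link: mu = exp(linear predictor), so mu is always > 0.
    return math.exp(alpha + beta1 * x1 + beta2 * x2)

# Hypothetical coefficients, made up for illustration.
alpha, beta1, beta2 = 0.5, 0.3, -0.2

mu_before = loglinear_mean(alpha, beta1, beta2, x1=1.0, x2=2.0)
mu_after = loglinear_mean(alpha, beta1, beta2, x1=2.0, x2=2.0)  # x1 raised by 1

# The ratio equals exp(beta1) (up to rounding), whatever value x2 takes.
print(mu_after / mu_before, math.exp(beta1))
```

The positivity of the exponential is also why the log link suits counts: the predicted mean can never be negative.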

# Which is which?

To wrap things up, here is a quick summary of which model to use, depending on the nature of the three components we have discussed:

| Random Component | Predictors | Link Function | Model to Use |
|---|---|---|---|
| Normal | Continuous | Identity | Linear Regression |
| Normal | Categorical | Identity | ANOVA |
| Normal | Mixed | Identity | Analysis of Covariance (ANCOVA) |
| Binomial | Mixed | Logit | Logistic |
| Poisson | Mixed | Log | Log-linear |

Of course, these are only a few of the many models that belong to the class of GLMs. They were chosen because they are among the most commonly used in practice.
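For the two non-normal rows of the summary table, the coefficients are commonly estimated by iteratively reweighted least squares (IRLS), which repeatedly solves a weighted least-squares problem on a "working" response. The following is a minimal from-scratch sketch, assuming a single predictor plus an intercept, a 0/1 response in the binomial case, and no handling of edge cases such as perfect separation; `fit_glm_irls` is a hypothetical name. In practice you would use an established implementation, such as R's `glm()` or the `GLM` class in Python's statsmodels.

```python
import math
from typing import Sequence, Tuple

def fit_glm_irls(x: Sequence[float], y: Sequence[float], family: str,
                 iters: int = 25, tol: float = 1e-10) -> Tuple[float, float]:
    """Fit log(mu) = a + b*x (Poisson) or logit(mu) = a + b*x (binomial, 0/1 y)
    by IRLS. Returns (a, b). Sketch only: no guard against separation."""
    alpha, beta = 0.0, 0.0
    for _ in range(iters):
        # Accumulate the 2x2 weighted normal equations X^T W X [a, b] = X^T W z.
        s00 = s01 = s11 = t0 = t1 = 0.0
        for xi, yi in zip(x, y):
            eta = alpha + beta * xi
            if family == "poisson":
                mu = math.exp(eta)               # inverse of the log link
                w = mu                           # IRLS weight for Poisson/log
            elif family == "binomial":
                mu = 1.0 / (1.0 + math.exp(-eta))  # inverse of the logit link
                w = mu * (1.0 - mu)              # IRLS weight for binomial/logit
            else:
                raise ValueError("family must be 'poisson' or 'binomial'")
            z = eta + (yi - mu) / w              # working response (canonical link)
            s00 += w
            s01 += w * xi
            s11 += w * xi * xi
            t0 += w * z
            t1 += w * xi * z
        det = s00 * s11 - s01 * s01              # zero if all x values are equal
        new_alpha = (s11 * t0 - s01 * t1) / det
        new_beta = (s00 * t1 - s01 * t0) / det
        converged = abs(new_alpha - alpha) + abs(new_beta - beta) < tol
        alpha, beta = new_alpha, new_beta
        if converged:
            break
    return alpha, beta

# Toy counts for two groups coded x = 0 and x = 1.
x = [0, 0, 0, 1, 1, 1]
y = [1, 2, 3, 4, 4, 4]
alpha, beta = fit_glm_irls(x, y, family="poisson")
# With a 0/1 predictor, the fitted means should land near the group means, 2 and 4.
print(math.exp(alpha), math.exp(alpha + beta))
```

With a single 0/1 predictor, both models reproduce the per-group means (or proportions) exactly, which makes this toy data a convenient sanity check.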

Welcome to the world of GLMs. This primer is just the beginning — there is a long way ahead towards mastery. You are off to a good start.
