Generalized linear models (GLMs) are a large class of models that includes the familiar ordinary least squares (OLS) linear regression and analysis of variance (ANOVA) models.

A bag loaded with tricks (models, rather)

Both OLS regression and ANOVA deal with continuous response variables. However, there are times when we need to model a response that is not continuous, for example, a yes/no response or a count.

For this purpose, other models such as the logit, probit, and log-linear models, to name a few, are appropriate.

Yes, there are a lot of models in the GLM domain. It is easy to get confused about which type of statistical model suits the data you have on hand.

Here, we will dispel this confusion by getting to know GLMs a bit more.

What makes up a GLM?

There are three ingredients that make up a GLM:

Random component
Linear predictor
Link function

By knowing these three components, we can be guided on what type of model we should be using for our data.

Let’s look at them one by one.

Component 1: Random component

The random component pertains to the response variable we are trying to model. Let’s call this variable Y.

This variable Y is assumed to follow a particular probability distribution. Here are some examples.

Response Variable (Y)	(Usually assumed) Distribution
Number of successes in a given number of trials	Binomial
Counts	Poisson, Negative Binomial
Continuous observation (e.g., weight)	Normal, Gamma

Categorical data have a nominal or ordinal scale of measurement. Interval and ratio data are numeric and are usually treated as continuous, although counts are discrete.

If you are having trouble recognizing whether a variable is categorical or continuous, this explainer on the levels of measurement might help.

Component 2: Linear predictor

The linear predictor in a GLM specifies the explanatory variables, also known as predictors. It takes the form:

α + β₁x₁ + β₂x₂ + … + β_px_p

The x’s in the equation are the values of the predictors that you have specified.

For example, you might be interested in predicting whether a person will vote for a particular presidential candidate.

The response variable is then a yes/no variable: whether or not the person votes for that candidate.

What could the potential predictors be? They could be the person’s party of choice, economic status, or level of education, just to name a few.

Note that the expression above is linear in the coefficients α and β. This is the “linear” in “generalized linear models”. It refers to how the predictors enter the model through a linear combination.
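
As a minimal sketch of the linear predictor, the Python snippet below computes α + β₁x₁ + … + β_px_p for one hypothetical voter. The function name `linear_predictor`, the coefficient values, and the numeric coding of the predictors are all assumptions made up for illustration; they are not estimates from real data.

```python
def linear_predictor(alpha, betas, xs):
    """Return alpha + beta_1*x_1 + ... + beta_p*x_p."""
    if len(betas) != len(xs):
        raise ValueError("betas and xs must have the same length")
    return alpha + sum(b * x for b, x in zip(betas, xs))


# Hypothetical coefficients (invented for illustration only).
alpha = -1.0
betas = [0.8, 0.3, 0.5]

# One hypothetical person, with predictors coded as numbers (an assumption):
#   x1 = 1   -> supports the candidate's party (0 = no, 1 = yes)
#   x2 = 2   -> economic status bracket
#   x3 = 1.6 -> years of education divided by 10
xs = [1, 2, 1.6]

eta = linear_predictor(alpha, betas, xs)
print(round(eta, 4))
```

Categorical predictors such as party are typically coded as 0/1 indicator variables before they enter the sum, which is what `x1` stands in for here.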

Component 3: Link function

Now note that whenever we predict the response variable, we are predicting its mean or average value.

The link function is simply some function of the mean response. Let’s denote the mean response as μ.

Here are some common functions of μ used as link functions, along with their names:

Function	Link Type
μ	Identity link
log(μ)	Log link
log[μ/(1-μ)], also known as logit(μ)	Logistic or logit link

To complete the GLM, we equate the link function to the linear predictor.

For instance, a GLM using the log link with two predictor variables will look like this:

log(μ) = α + β₁x₁ + β₂x₂

The GLM above is an example of a log-linear model.
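
To make the role of the link concrete, here is a small Python sketch of the three link functions listed above together with their inverses. The inverse link is what turns a value of the linear predictor back into a mean response. The function names and the coefficient values are assumptions chosen for illustration only.

```python
import math


# Link functions: map the mean response to the scale of the linear predictor.
def identity_link(mu):
    return mu


def log_link(mu):
    return math.log(mu)  # requires mu > 0


def logit_link(mu):
    return math.log(mu / (1 - mu))  # requires 0 < mu < 1


# Inverse links: map the linear predictor back to the mean response.
def inv_identity(eta):
    return eta


def inv_log(eta):
    return math.exp(eta)


def inv_logit(eta):
    return 1 / (1 + math.exp(-eta))


# Log-link GLM with two predictors: log(mean) = alpha + beta1*x1 + beta2*x2.
# The coefficients and predictor values below are hypothetical.
alpha, beta1, beta2 = 0.5, 0.2, -0.1
x1, x2 = 3.0, 1.0

eta = alpha + beta1 * x1 + beta2 * x2  # value of the linear predictor
mu = inv_log(eta)                      # predicted mean response

print(round(eta, 4), round(mu, 4))
```

Because the log link is inverted with the exponential function, the predicted mean is always positive, which suits count responses. The inverse logit likewise always returns a value between 0 and 1.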

Which is which?

To wrap things up, here is a quick summary of what model you should use depending on the nature of the three components we have discussed:

Random Component	Predictors	Link Function	Model to use
Normal	Continuous	Identity	Linear Regression
Normal	Categorical	Identity	ANOVA
Normal	Mixed	Identity	Analysis of Covariance (ANCOVA)
Binomial	Mixed	Logit	Logistic
Poisson	Mixed	Log	Log-linear

Of course, these are only a few of the many models that belong to the class of GLMs. They are among those most commonly used in practice, which is why they are shown here.

Welcome to the world of GLMs. This primer is just the beginning, and there is a long way ahead towards mastery. You are off to a good start.
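
The summary table can also be read as a simple lookup. The Python sketch below encodes only the five rows shown in the table; the names `GLM_SUMMARY` and `suggest_model` are hypothetical, and any combination outside the table is reported as not covered rather than guessed.

```python
# (random component, predictor type, link function) -> model,
# copied from the summary table; nothing beyond those five rows is encoded.
GLM_SUMMARY = {
    ("normal", "continuous", "identity"): "Linear Regression",
    ("normal", "categorical", "identity"): "ANOVA",
    ("normal", "mixed", "identity"): "Analysis of Covariance (ANCOVA)",
    ("binomial", "mixed", "logit"): "Logistic",
    ("poisson", "mixed", "log"): "Log-linear",
}


def suggest_model(random_component, predictors, link):
    """Look up the model for a combination of the three GLM components."""
    key = (random_component.lower(), predictors.lower(), link.lower())
    return GLM_SUMMARY.get(key, "not covered by this summary table")


print(suggest_model("Binomial", "Mixed", "Logit"))  # Logistic
print(suggest_model("Poisson", "Mixed", "Log"))     # Log-linear
print(suggest_model("Gamma", "Mixed", "Log"))       # not covered by this summary table
```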