Understanding Interaction Effects in Data Analysis

This tutorial introduces the basic idea of interaction effects in data analysis. It covers what an interaction effect is, an example of one, and the statistical methods used to analyze it.

1. What are interaction effects? (The definition)

An interaction effect occurs when the effect of one variable (e.g., X) on another variable (e.g., Y) depends on a third variable (e.g., M). The following equation expresses this relationship.

Y = β0 + β1X + β2M + β3X×M

Interaction effect of X and M on Y
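
To make the definition concrete, here is a minimal Python sketch of the model above. The coefficient values are hypothetical and chosen only for illustration. It shows that, when β3 is not zero, the change in Y per one-unit increase in X equals β1 + β3×M, so it differs at each value of M.

```python
def predict_y(x, m, b0, b1, b2, b3):
    """Evaluate Y = b0 + b1*X + b2*M + b3*X*M."""
    return b0 + b1 * x + b2 * m + b3 * x * m


def effect_of_x(m, b1, b3):
    """Change in Y for a one-unit increase in X, holding M fixed."""
    return b1 + b3 * m


# Hypothetical coefficients, for illustration only.
b0, b1, b2, b3 = 1.0, 2.0, 0.5, 3.0

for m in (0, 1, 2):
    # Difference in predicted Y between X = 1 and X = 0 at this value of M.
    slope = predict_y(1, m, b0, b1, b2, b3) - predict_y(0, m, b0, b1, b2, b3)
    print(f"M={m}: effect of X on Y = {slope}")
# M=0: effect of X on Y = 2.0
# M=1: effect of X on Y = 5.0
# M=2: effect of X on Y = 8.0
```

If b3 were 0, the loop would print the same effect for every value of M, which is the no-interaction case.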

2. Example of Interaction Effects

Suppose that you would like to know how Brand A and Brand B differ in sales. Brand (Brand A or Brand B) is X, and Sales is Y.

You calculate and find that Brand A has $45M in sales and Brand B has $101M in sales. This difference in sales can be called an effect. The starting model therefore looks as follows.

Sales = β0 + β1Brand

However, you realize that there is another variable to consider, namely Region (West Coast vs. East Coast). The basic model can therefore be expanded as follows.

Sales = β0 + β1Brand + β2Region + β3Brand×Region

In particular, the difference between Brand A and Brand B occurs mainly on the East Coast ($25M vs. $80M). In contrast, the sales numbers on the West Coast are roughly the same ($20M vs. $21M).

Thus, you can see the importance of considering the third variable M (here, Region), as it provides further insight into the basic effect of X on Y.

           East Coast      West Coast      Total
Brand A    sales = $25M    sales = $20M    25 + 20 = $45M
Brand B    sales = $80M    sales = $21M    80 + 21 = $101M
Example 1 of Interaction Effect
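
The four cell values in the table are enough to work out the coefficients of the expanded model by hand. The sketch below assumes dummy coding that the tutorial does not specify: Brand B = 1 (Brand A is the reference) and East Coast = 1 (West Coast is the reference). Under that assumed coding, β3 is the difference between the two brand gaps.

```python
# Sales in $M, taken from the table above.
sales = {
    ("A", "East"): 25,
    ("A", "West"): 20,
    ("B", "East"): 80,
    ("B", "West"): 21,
}

# Assumed coding: Brand B = 1, East Coast = 1 (Brand A and West Coast are references).
b0 = sales[("A", "West")]                          # baseline cell: Brand A, West
b1 = sales[("B", "West")] - sales[("A", "West")]   # brand gap on the West Coast
b2 = sales[("A", "East")] - sales[("A", "West")]   # region gap for Brand A

gap_east = sales[("B", "East")] - sales[("A", "East")]  # brand gap on the East Coast
gap_west = sales[("B", "West")] - sales[("A", "West")]  # brand gap on the West Coast
b3 = gap_east - gap_west                                 # interaction: difference of gaps

print(f"Brand gap, East: {gap_east}M; West: {gap_west}M")
print(f"b0={b0}, b1={b1}, b2={b2}, b3={b3}")
# Brand gap, East: 55M; West: 1M
# b0=20, b1=1, b2=5, b3=54
```

Adding all four coefficients gives 20 + 1 + 5 + 54 = 80, which recovers the Brand B, East Coast cell. A different coding scheme would change the individual coefficients but not the conclusion that the brand gap depends on the region.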

3. Statistical Methods to Analyze Interaction Effects

Depending on the data types of X, M, and Y, different methods are used to estimate β0, β1, β2, and β3 in the following model.

Y = β0 + β1X + β2M + β3X×M

The following table summarizes different statistical methods to estimate those coefficients.

Y              X                            M                            Statistical Method
Continuous     Continuous                   Continuous                   Linear regression
Continuous     Continuous                   Categorical                  Linear regression
Continuous     Categorical                  Continuous                   Linear regression
Continuous     Categorical                  Categorical                  ANOVA or linear regression
Categorical    Continuous or Categorical    Continuous or Categorical    Logistic regression
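
For the continuous-Y rows of the table, the linear regression amounts to adding an X×M column to the design matrix. The following is a minimal sketch using NumPy least squares on simulated data; the sample size, coefficient values, and noise level are all hypothetical. It covers only the linear regression case (continuous X, categorical M coded 0/1), not logistic regression.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

x = rng.normal(size=n)            # continuous X
m = rng.integers(0, 2, size=n)    # categorical M, coded 0/1

# Hypothetical true coefficients: b0, b1, b2, b3.
true_beta = np.array([1.0, 2.0, 0.5, 3.0])

# Design matrix columns: intercept, X, M, and the interaction term X*M.
design = np.column_stack([np.ones(n), x, m, x * m])

# Simulated outcome with a small amount of noise.
y = design @ true_beta + rng.normal(scale=0.1, size=n)

# Ordinary least squares estimate of the four coefficients.
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)

for name, estimate in zip(["b0", "b1", "b2", "b3"], beta_hat):
    print(f"{name} = {estimate:.3f}")
```

The estimates should land close to the true values because the noise is small. In practice a statistics package would also report standard errors and p-values, which this sketch leaves out.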

4. Interaction of Two Categorical Independent Variables in SPSS

Here, we use a simulated dataset with two categorical independent variables to demonstrate how to conduct an interaction analysis in SPSS.

The Brand×Region interaction has a p-value of < .001, which is smaller than 0.05. Thus, we reject the null hypothesis of no interaction and conclude that the effect of Brand on Sales is moderated by Region.


Further Reading

I have provided tutorials to conduct such analysis in R, Python, and SPSS. The following shows a few examples.