Changing Reference Level in Dummy Coding in R

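You can change the reference level used for dummy coding in R by passing the base argument to contr.treatment(). In the code below, total_levels is the number of levels in the factor and Number_reference_level is the position of the level to use as the reference.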

contr.treatment(total_levels, base = Number_reference_level)
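As a quick illustration of these two arguments, the sketch below builds the coding matrix for a hypothetical four-level factor with level 2 as the reference. The four-level case is an assumption made only for illustration and is not part of the example data used in the steps that follow.

```r
# Hypothetical example: 4 levels, with level 2 as the reference
coding <- contr.treatment(4, base = 2)
print(coding)

# The reference level is the row whose entries are all zero
reference_row <- which(rowSums(coding) == 0)
print(reference_row)
```

Rows of the matrix are factor levels and columns are the coded variables, so the column for the reference level is the one that is dropped.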

Step 1: Prepare Data

The following R code generates a sample data set with a three-level factor X and a numeric outcome Y.

# set seed
set.seed(123)

# Repeat a sequence of numbers:
X<-rep(c(1, 2, 3), times=5)
X<-as.factor(X)
Y<-rnorm(15)

# combine it into a data frame
df<-data.frame(X,Y)
print(df)
   X           Y
1  1 -0.56047565
2  2 -0.23017749
3  3  1.55870831
4  1  0.07050839
5  2  0.12928774
6  3  1.71506499
7  1  0.46091621
8  2 -1.26506123
9  3 -0.68685285
10 1 -0.44566197
11 2  1.22408180
12 3  0.35981383
13 1  0.40077145
14 2  0.11068272
15 3 -0.55584113

Step 2: Check Default Reference Level

For the data above, we can calculate the mean of Y for each of the three levels. The right-most column in the table below shows how the default dummy coding works.

| X | Mean | Term | Default dummy coding (reference: group 1) |
|---|---|---|---|
| Group 1 | -0.0148 | Intercept | -0.0148 |
| Group 2 | -0.0062 | Coded Variable 1 | -0.0062 - (-0.0148) = 0.0086 |
| Group 3 | 0.4782 | Coded Variable 2 | 0.4782 - (-0.0148) = 0.4930 |
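The group means and differences in the table can be reproduced with a short sketch like the one below. To keep it independent of the random number generator, the Y values are typed in from the data frame printed in Step 1; the variable names group_means and diff_from_group1 are illustrative choices, not part of the original code.

```r
# Y values copied from the data frame printed in Step 1
Y <- c(-0.56047565, -0.23017749,  1.55870831,  0.07050839,  0.12928774,
        1.71506499,  0.46091621, -1.26506123, -0.68685285, -0.44566197,
        1.22408180,  0.35981383,  0.40077145,  0.11068272, -0.55584113)
X <- factor(rep(c(1, 2, 3), times = 5))

# Mean of Y within each level of X
group_means <- tapply(Y, X, mean)
print(round(group_means, 4))

# Differences of groups 2 and 3 from group 1, the default reference
diff_from_group1 <- group_means[c("2", "3")] - group_means[["1"]]
print(round(diff_from_group1, 4))
```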

We can use contr.treatment() to apply this dummy coding in a linear regression as follows.

# dummy coding
contrasts(df$X) =contr.treatment(3)

# linear regression with dummy coding
result<-lm(Y~X,data=df)

# summarize the result
summary(result)
Call:
lm(formula = Y ~ X, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.2588 -0.4883  0.0853  0.4456  1.2369 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.014788   0.391751  -0.038    0.971
X2           0.008551   0.554020   0.015    0.988
X3           0.492967   0.554020   0.890    0.391

Residual standard error: 0.876 on 12 degrees of freedom
Multiple R-squared:  0.07959,	Adjusted R-squared:  -0.07381 
F-statistic: 0.5188 on 2 and 12 DF,  p-value: 0.608

As the output above shows, the default reference level is level 1 (group 1): the intercept equals the mean of group 1, and the X2 and X3 coefficients are the differences of groups 2 and 3 from that mean.
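One way to check this reading of the coefficients is to compare them directly with the group means. The sketch below does so under the default coding; the Y values are copied from Step 1, and the names fit, intercept_matches, and slopes_match are illustrative assumptions.

```r
# Y values copied from the data frame printed in Step 1
Y <- c(-0.56047565, -0.23017749,  1.55870831,  0.07050839,  0.12928774,
        1.71506499,  0.46091621, -1.26506123, -0.68685285, -0.44566197,
        1.22408180,  0.35981383,  0.40077145,  0.11068272, -0.55584113)
df <- data.frame(X = factor(rep(c(1, 2, 3), times = 5)), Y = Y)

# Default dummy coding: group 1 is the reference
contrasts(df$X) <- contr.treatment(3)
fit <- lm(Y ~ X, data = df)

group_means <- tapply(df$Y, df$X, mean)

# The intercept should equal the mean of group 1
intercept_matches <- isTRUE(all.equal(
  coef(fit)[["(Intercept)"]], group_means[["1"]]
))

# X2 and X3 should equal the differences from the group 1 mean
slopes_match <- isTRUE(all.equal(
  as.numeric(coef(fit)[c("X2", "X3")]),
  as.numeric(group_means[c("2", "3")] - group_means[["1"]])
))

print(c(intercept_matches, slopes_match))
```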

Step 3: Change Default Reference Level

We can change the reference level to level 3 (group 3) with the base argument. The following code first prints both contrast matrices to show how the coding changes.

# dummy coding by default, using group 1 as the reference
contr.treatment(3)

# change the reference group by setting base to the level number
contr.treatment(3, base = 3)
# dummy coding by default, using group 1 as the reference
> contr.treatment(3)
  2 3
1 0 0
2 1 0
3 0 1

# change the reference group by setting base to the level number
> contr.treatment(3, base = 3)
  1 2
1 1 0
2 0 1
3 0 0
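
| X | Mean | Term | Default dummy coding (reference: group 1) | Changed dummy coding (reference: group 3) |
|---|---|---|---|---|
| Group 1 | -0.0148 | Intercept | -0.0148 | 0.4782 |
| Group 2 | -0.0062 | Coded Variable 1 | -0.0062 - (-0.0148) = 0.0086 | -0.0148 - 0.4782 = -0.4930 |
| Group 3 | 0.4782 | Coded Variable 2 | 0.4782 - (-0.0148) = 0.4930 | -0.0062 - 0.4782 = -0.4844 |

Note that the Term column describes the model coefficient, not the group named in the first column. With group 3 as the reference, the intercept is the mean of group 3, Coded Variable 1 compares group 1 with group 3, and Coded Variable 2 compares group 2 with group 3.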
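The link between the contrast matrix and these coefficients can be sketched with a small linear solve: each group mean equals the intercept plus that group's row of the contrast matrix times the coded-variable coefficients. The example below uses the rounded means from the table, and the names cell_design and coefs are illustrative assumptions.

```r
# Rounded group means for groups 1, 2, 3 (from the table)
group_means <- c(-0.0148, -0.0062, 0.4782)

# One row per group: an intercept column plus the coding with group 3 as reference
cell_design <- cbind(1, contr.treatment(3, base = 3))

# Solve cell_design %*% coefs = group_means for the coefficients
coefs <- as.numeric(solve(cell_design, group_means))
print(round(coefs, 4))
```

The three values are the intercept, Coded Variable 1, and Coded Variable 2, in that order, and they should agree with the right-most column of the table.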

The following R code applies the new reference level and refits the model.

# dummy coding, with the reference level changed to group 3 (level 3)
contrasts(df$X) =contr.treatment(3, base = 3)

# linear regression with dummy coding
result<-lm(Y~X,data=df)

# summarize the result
summary(result)
Call:
lm(formula = Y ~ X, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.2588 -0.4883  0.0853  0.4456  1.2369 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.4782     0.3918   1.221    0.246
X1           -0.4930     0.5540  -0.890    0.391
X2           -0.4844     0.5540  -0.874    0.399

Residual standard error: 0.876 on 12 degrees of freedom
Multiple R-squared:  0.07959,	Adjusted R-squared:  -0.07381 
F-statistic: 0.5188 on 2 and 12 DF,  p-value: 0.608
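The two summaries report the same residuals, R-squared, and F-statistic, which suggests that changing the reference level changes only how the coefficients are expressed, not the fitted model. A minimal sketch of that check is below; the Y values are copied from Step 1, and the names df_ref1, df_ref3, fit_ref1, fit_ref3, and same_fit are illustrative assumptions.

```r
# Y values copied from the data frame printed in Step 1
Y <- c(-0.56047565, -0.23017749,  1.55870831,  0.07050839,  0.12928774,
        1.71506499,  0.46091621, -1.26506123, -0.68685285, -0.44566197,
        1.22408180,  0.35981383,  0.40077145,  0.11068272, -0.55584113)
df_ref1 <- data.frame(X = factor(rep(c(1, 2, 3), times = 5)), Y = Y)
df_ref3 <- df_ref1

# Same data, two codings: reference group 1 vs. reference group 3
contrasts(df_ref1$X) <- contr.treatment(3)
contrasts(df_ref3$X) <- contr.treatment(3, base = 3)

fit_ref1 <- lm(Y ~ X, data = df_ref1)
fit_ref3 <- lm(Y ~ X, data = df_ref3)

# Fitted values should agree even though the coefficients differ
same_fit <- isTRUE(all.equal(fitted(fit_ref1), fitted(fit_ref3)))
print(same_fit)
print(coef(fit_ref1))
print(coef(fit_ref3))
```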

