# Changing Reference Level in Dummy Coding in R

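You can change the reference level in dummy coding in R with the `base` argument of `contr.treatment()`:

`contr.treatment(total_levels, base = Number_reference_level)`

Here `total_levels` is the number of levels of the factor, and `Number_reference_level` is the position of the level to use as the reference.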

## Step 1: Prepare Data

The following R code generates a sample data set.

```r
# set seed for reproducibility
set.seed(123)

# repeat the sequence 1, 2, 3 five times and convert it to a factor
X <- rep(c(1, 2, 3), times = 5)
X <- as.factor(X)
Y <- rnorm(15)

# combine into a data frame
df <- data.frame(X, Y)
print(df)
```

```
   X           Y
1  1 -0.56047565
2  2 -0.23017749
3  3  1.55870831
4  1  0.07050839
5  2  0.12928774
6  3  1.71506499
7  1  0.46091621
8  2 -1.26506123
9  3 -0.68685285
10 1 -0.44566197
11 2  1.22408180
12 3  0.35981383
13 1  0.40077145
14 2  0.11068272
15 3 -0.55584113
```

## Step 2: Check Default Reference Level

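For the data above, we can calculate the mean of Y for each of the 3 levels of X. Under the default dummy coding, level 1 is the reference: the intercept estimates the mean of level 1, and each remaining coefficient estimates the difference between another level's mean and the level 1 mean.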
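The group means can be computed directly. The following is a minimal sketch that rebuilds the sample data from Step 1 so it runs on its own; the name `group_means` is introduced here only for illustration.

```r
# Recreate the sample data from Step 1
set.seed(123)
X <- as.factor(rep(c(1, 2, 3), times = 5))
Y <- rnorm(15)
df <- data.frame(X, Y)

# Mean of Y within each level of X
group_means <- tapply(df$Y, df$X, mean)
print(round(group_means, 4))
```

The means come out to about -0.0148, -0.0062, and 0.4782 for levels 1, 2, and 3. These are the values to compare against the regression coefficients.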

We can use `contr.treatment()` to set the dummy coding and then fit a linear regression, as follows.

```r
# dummy coding (level 1 is the default reference)
contrasts(df$X) <- contr.treatment(3)

# linear regression with dummy coding
result <- lm(Y ~ X, data = df)

# summarize the result
summary(result)
```

```
Call:
lm(formula = Y ~ X, data = df)

Residuals:
    Min      1Q  Median      3Q     Max
-1.2588 -0.4883  0.0853  0.4456  1.2369

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.014788   0.391751  -0.038    0.971
X2           0.008551   0.554020   0.015    0.988
X3           0.492967   0.554020   0.890    0.391

Residual standard error: 0.876 on 12 degrees of freedom
Multiple R-squared:  0.07959,	Adjusted R-squared:  -0.07381
F-statistic: 0.5188 on 2 and 12 DF,  p-value: 0.608
```

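As we can see from the output above, the default reference level is level 1 (or, group 1): the intercept (-0.014788) is the mean of group 1, and the coefficients X2 and X3 are the differences between the means of groups 2 and 3 and the mean of group 1.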

## Step 3: Change Default Reference Level

We can change the reference level from the default to level 3 (or, group 3) with the `base` argument. The following code first prints both contrast matrices to show how the coding changes.

```r
# dummy coding by default, using group 1 as the reference
contr.treatment(3)

# changing the reference group by adding base = No.
contr.treatment(3, base = 3)
```

```
# dummy coding by default, using group 1 as the reference
> contr.treatment(3)
  2 3
1 0 0
2 1 0
3 0 1

# changing the reference group by adding base = No.
> contr.treatment(3, base = 3)
  1 2
1 1 0
2 0 1
3 0 0
```
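To see how these contrast matrices turn into regression predictors, one option is to build the design matrix with `model.matrix()`. The sketch below uses a small hypothetical data frame `d` with one observation per level; `d`, `mm_default`, and `mm_base3` are illustrative names, not part of the example above.

```r
# One observation per level, just to show the coding (illustrative data)
d <- data.frame(X = factor(c(1, 2, 3)))

# Design matrix with the default reference (level 1)
mm_default <- model.matrix(~ X, data = d,
                           contrasts.arg = list(X = contr.treatment(3)))

# Design matrix with level 3 as the reference
mm_base3 <- model.matrix(~ X, data = d,
                         contrasts.arg = list(X = contr.treatment(3, base = 3)))

print(mm_default)
print(mm_base3)
```

In each design matrix, the row for the reference level has zeros in both dummy columns, so only the intercept contributes to its fitted value.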

The following R code applies the new coding to the data and refits the model.

```r
# dummy coding, with the reference level set to group 3 (level 3)
contrasts(df$X) <- contr.treatment(3, base = 3)

# linear regression with dummy coding
result <- lm(Y ~ X, data = df)

# summarize the result
summary(result)
```

```
Call:
lm(formula = Y ~ X, data = df)

Residuals:
    Min      1Q  Median      3Q     Max
-1.2588 -0.4883  0.0853  0.4456  1.2369

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.4782     0.3918   1.221    0.246
X1           -0.4930     0.5540  -0.890    0.391
X2           -0.4844     0.5540  -0.874    0.399

Residual standard error: 0.876 on 12 degrees of freedom
Multiple R-squared:  0.07959,	Adjusted R-squared:  -0.07381
F-statistic: 0.5188 on 2 and 12 DF,  p-value: 0.608
```
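With level 3 as the reference, the intercept should equal the mean of group 3, and X1 and X2 should be the differences of the group 1 and group 2 means from it. The following sketch checks this, and also checks that changing the reference level does not change the fitted model. It rebuilds the data so it runs on its own; `df_base1`, `df_base3`, `fit_base1`, `fit_base3`, and `expected` are names introduced only for this check.

```r
# Recreate the sample data from Step 1
set.seed(123)
df <- data.frame(X = as.factor(rep(c(1, 2, 3), times = 5)), Y = rnorm(15))

# Fit the model under each coding, on separate copies of the data
df_base1 <- df
contrasts(df_base1$X) <- contr.treatment(3)
df_base3 <- df
contrasts(df_base3$X) <- contr.treatment(3, base = 3)
fit_base1 <- lm(Y ~ X, data = df_base1)
fit_base3 <- lm(Y ~ X, data = df_base3)

# Group means of Y
group_means <- tapply(df$Y, df$X, mean)

# Expected coefficients with base = 3:
# intercept = mean of group 3; X1, X2 = differences from the group 3 mean
expected <- c(group_means[["3"]],
              group_means[["1"]] - group_means[["3"]],
              group_means[["2"]] - group_means[["3"]])
print(all.equal(unname(coef(fit_base3)), expected))

# Both codings describe the same model, so the fitted values agree
print(all.equal(fitted(fit_base1), fitted(fit_base3)))
```

Both comparisons should print `TRUE`. This is consistent with the two summaries above, where the residuals, R-squared, and F-statistic are identical and only the coefficients change.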

