Interaction in Linear Regression

This tutorial focuses on the interaction between a categorical variable and a continuous variable in linear regression. Note that this tutorial limits the categorical variable to 2 levels. (For a categorical variable with 3 levels, please refer to my other tutorial on interaction and coding in linear regression.) Coding Note: In …
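A minimal R sketch of this setup follows. It assumes a hypothetical, noise-free data set with a continuous predictor `x` and a 2-level factor `group` (levels "A" and "B"). Under R's default dummy coding, `y ~ x * group` expands to `x + group + x:group`. The `groupB` coefficient is then the intercept difference and `x:groupB` is the slope difference between group B and the reference group A.

```r
# Hypothetical data: continuous x, 2-level factor group (A is the reference level)
x <- rep(1:10, times = 2)
group <- factor(rep(c("A", "B"), each = 10))

# True model, with no noise so that lm() recovers the coefficients exactly:
#   group A: y = 1 + 2 * x
#   group B: y = (1 + 3) + (2 + 0.5) * x
is_b <- as.numeric(group == "B")
y <- 1 + 2 * x + 3 * is_b + 0.5 * x * is_b

# y ~ x * group is shorthand for y ~ x + group + x:group
fit <- lm(y ~ x * group)
coef(fit)
# (Intercept): intercept for group A
# x:           slope for group A
# groupB:      intercept difference (B minus A)
# x:groupB:    slope difference (B minus A)
```

With real data you would add noise, so the estimates would only approximate these values.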

Dummy and Contrast Codings in Linear Regression

This tutorial explains the differences between dummy coding and contrast coding in linear regression using R code examples. It is worth pointing out that this tutorial focuses on a categorical independent variable with 3 levels. Short Note: In R, the default reference group in dummy coding is the first item in an alphabetical …
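The R sketch below compares the two codings on a hypothetical 3-level factor. It uses sum-to-zero (effect) coding via `contr.sum` as the example of contrast coding. That choice is an assumption, and the tutorial itself may use different contrasts. The sketch also shows that `factor()` sorts levels alphabetically by default, which is why the alphabetically first level becomes the dummy-coding reference.

```r
# Hypothetical data: 3-level factor with two observations per level
group <- factor(rep(c("a", "b", "c"), each = 2))
y <- c(1, 3, 4, 6, 10, 12)  # group means: a = 2, b = 5, c = 11

levels(group)  # sorted alphabetically, so "a" is the default reference

# Dummy (treatment) coding: R's default for unordered factors
fit_dummy <- lm(y ~ group)
coef(fit_dummy)
# (Intercept) = mean of reference group a
# groupb, groupc = differences between groups b, c and group a

# Sum-to-zero (effect) coding as one example of contrast coding
fit_sum <- lm(y ~ group, contrasts = list(group = "contr.sum"))
coef(fit_sum)
# (Intercept) = unweighted mean of the three group means
# group1, group2 = deviations of groups a and b from that mean
```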

Changing Reference Level in Dummy Coding in R

You can change the reference level in dummy coding in R by using the following R code:

    contr.treatment(total_levels, base = Number_reference_level)

Step 1: Prepare Data

The following R code generates sample data.

      X           Y
    1 1 -0.56047565
    2 2 -0.23017749
    3 3  1.55870831
    4 1  0.07050839
    5 2  0.12928774
    6 3  1.71506499
    7 1 …
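Here is a sketch of the full workflow in R. It uses a small, hypothetical, noise-free data frame `df` (not the tutorial's data) with a 3-level factor `X`, and it makes level 2 the reference by assigning `contr.treatment(3, base = 2)` as the factor's contrasts.

```r
# Hypothetical data: factor X with levels 1, 2, 3 and a numeric outcome Y
df <- data.frame(
  X = factor(rep(1:3, times = 2)),
  Y = c(1, 5, 11, 3, 5, 13)  # level means: 1 -> 2, 2 -> 5, 3 -> 12
)

# Default dummy coding: level 1 is the reference (its row is all zeros)
contr.treatment(3)

# base = 2 drops the column for level 2, so level 2 becomes the reference
contr.treatment(3, base = 2)
contrasts(df$X) <- contr.treatment(3, base = 2)

fit <- lm(Y ~ X, data = df)
coef(fit)
# (Intercept) = mean of level 2; the other two coefficients are
# the differences of levels 1 and 3 from level 2
```

An alternative is `relevel(df$X, ref = "2")`, which reorders the factor levels rather than changing the contrast matrix.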

Dummy and Contrast Codings in R

“Dummy” or “treatment” coding creates dichotomous variables in which each level of the categorical variable is contrasted with a specified reference level. Basic Syntax of Dummy and Contrast Coding: 1. Dummy Coding. The following is the syntax for dummy coding in R:

    contr.treatment(number_of_level_of_X)

      2 3
    1 0 0
    2 1 0
    3 …
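The sketch below connects the contrast matrix to the regression. It prints the dummy-coding contrast matrix for 3 levels, then uses `model.matrix()` to show the dummy columns R builds for a hypothetical factor `g` when it sets up a linear model.

```r
# Contrast matrix for a 3-level factor under dummy (treatment) coding:
# rows are levels, columns are dummy variables; level 1 is the reference
contr.treatment(3)

# The same coding applied to a hypothetical factor g in a design matrix
g <- factor(c("a", "b", "c", "a"))
X <- model.matrix(~ g)
X
# Columns: (Intercept), gb, gc. Rows for the reference level "a"
# have 0 in both dummy columns.
```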

Quartile: Definition and Example

Definition of Quartile: A quartile is a statistic describing how an ordered set of data points is divided into 4 groups of roughly equal size. Quartiles split a set of data by using 3 points: the lower quartile (Q1), the median (Q2), and the upper quartile (Q3). Together with the minimum and maximum values, the 3 quartiles split the data set …
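To find quartiles in R, you can use `quantile()`. The sketch below applies it to the hypothetical data 1 to 8. Different quartile definitions can disagree, so it also shows Tukey's `fivenum()`, which gives a different Q1 and Q3 for the same data. A by-hand method may match either one, or neither.

```r
# Hypothetical data: the integers 1 to 8
x <- 1:8

# quantile() returns the minimum, Q1, median (Q2), Q3, and maximum.
# Its default (type = 7) interpolates: Q1 = 2.75, Q3 = 6.25 here.
quantile(x)

# Tukey's five-number summary uses hinges instead: Q1 = 2.5, Q3 = 6.5 here
fivenum(x)
```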

Difference between Descriptive Statistics and Inferential Statistics

Descriptive statistics aim to summarize the characteristics of a given data set. In contrast, inferential statistics aim to use a sample of data to draw inferences about the whole population (e.g., hypothesis testing). Types of Descriptive Statistics: 1. Measures of Central Tendency: Central tendency describes where the center of a data set is located. Mean, …
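The following R sketch illustrates the distinction using a hypothetical sample of 5 scores. Mean, median, and standard deviation describe this particular data set. A one-sample t-test, in contrast, uses the same sample to test a claim about the population. The hypothesized mean of 75 is an arbitrary value chosen for illustration.

```r
# Hypothetical sample of 5 scores
scores <- c(70, 80, 90, 60, 100)

# Descriptive statistics: summarize this data set only
mean(scores)    # central tendency
median(scores)  # central tendency
sd(scores)      # variability

# Inferential statistics: test H0 "the population mean is 75"
# with a one-sample t-test based on the sample
result <- t.test(scores, mu = 75)
result$p.value
```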

Difference between Sample and Population

A population is the entire group of individuals about whom you want to draw conclusions. In contrast, a sample is a subset of that group. Example 1 of Sample and Population: You would like to study whether students at your university like online courses. Suppose your university has 10K students; thus, these 10K students …
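As a small illustration of "a sample is a subset of the population", the R sketch below represents the 10K students by hypothetical ID numbers. It then draws a simple random sample of 200 of them without replacement. The sample size of 200 and the seed are arbitrary assumptions.

```r
# Hypothetical population: ID numbers for all 10,000 students
population <- 1:10000

# Simple random sample of 200 students, drawn without replacement
set.seed(42)
sample_ids <- sample(population, size = 200)

length(population)               # population size
length(sample_ids)               # sample size
all(sample_ids %in% population)  # every sampled student is in the population
```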

Population Variance Formula and Calculation by Hand

This tutorial shows the formula for population variance and the steps for calculating it by hand. Formula: Population variance is a measure of the variability of a population. The following is the formula for population variance:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

where $\sigma^2$ is the population variance, $x_i$ is the $i$-th value, $\mu$ is the population mean, and $N$ is the number of values in the population. Population vs. Sample Data: The following data set is treated as a population. It has 11 …
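The following short R sketch implements the population variance formula. It uses a hypothetical helper `pop_var()` and a hypothetical 8-value population, not the tutorial's 11-value data set. Keep in mind that R's built-in `var()` divides by N - 1, so it computes the sample variance rather than the population variance.

```r
# Hypothetical helper: population variance, dividing by N
pop_var <- function(x) {
  mu <- mean(x)                 # population mean
  sum((x - mu)^2) / length(x)   # average squared deviation
}

# Hypothetical population of 8 values (mean 5, sum of squares 32)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
pop_var(x)  # 32 / 8 = 4

# Built-in var() divides by N - 1 instead
var(x)      # 32 / 7
```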

Sample Variance Formula and Calculation by Hand

Sample variance is a measure of the variability in a given sample. A sample is a set of observations that form a subset of a population. Sample Variance Formula: The following is the formula for sample variance:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

where $s^2$ is the sample variance, $x_i$ is the $i$-th observation, $\bar{x}$ is the sample mean, and $n$ is the sample size. Data Example: The following is a sample of 6 students' math scores. Calculating sample variance by …
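The by-hand steps can be mirrored in R, as shown below. The sketch uses 6 hypothetical math scores, not the scores from the tutorial, and checks the result against the built-in `var()`, which uses the n - 1 denominator.

```r
# Hypothetical sample of 6 math scores
scores <- c(80, 85, 90, 95, 70, 90)

n <- length(scores)
x_bar <- mean(scores)         # step 1: sample mean (85)
sq_dev <- (scores - x_bar)^2  # step 2: squared deviations from the mean
ss <- sum(sq_dev)             # step 3: sum of squared deviations (400)
s2 <- ss / (n - 1)            # step 4: divide by n - 1 (80)

s2
var(scores)  # built-in sample variance gives the same value
```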

How to Calculate Two-Factor ANOVA without Replication

Two-factor ANOVA without replication is used for a design with two factors (e.g., Factor A and Factor B) and only 1 observation in each cell. For instance, if Factor A and Factor B both have two levels, there are 4 cells in total, and each cell has only 1 observation (see below). Variance Partitioning: Two-Factor ANOVA without …
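The R sketch below partitions the variance for a hypothetical 2 x 3 design with one observation per cell. It computes the sums of squares by hand and then checks them against `aov(y ~ A + B)`. With no replication, the A x B interaction cannot be estimated separately, so what is left after removing the A and B effects serves as the error term, with (a - 1)(b - 1) degrees of freedom.

```r
# Hypothetical 2 x 3 design, one observation per cell
d <- data.frame(
  A = factor(rep(c("a1", "a2"), each = 3)),
  B = factor(rep(c("b1", "b2", "b3"), times = 2)),
  y = c(4, 6, 8, 6, 9, 12)
)
a <- nlevels(d$A)
b <- nlevels(d$B)
grand <- mean(d$y)

# Partition the total sum of squares
ss_total <- sum((d$y - grand)^2)
ss_A <- b * sum((tapply(d$y, d$A, mean) - grand)^2)  # each A mean uses b values
ss_B <- a * sum((tapply(d$y, d$B, mean) - grand)^2)  # each B mean uses a values
ss_E <- ss_total - ss_A - ss_B                       # residual (interaction) SS

# Degrees of freedom and F statistics
df_A <- a - 1
df_B <- b - 1
df_E <- (a - 1) * (b - 1)
F_A <- (ss_A / df_A) / (ss_E / df_E)
F_B <- (ss_B / df_B) / (ss_E / df_E)
p_A <- pf(F_A, df_A, df_E, lower.tail = FALSE)
p_B <- pf(F_B, df_B, df_E, lower.tail = FALSE)

# Same analysis with aov(): additive model, no interaction term
fit <- aov(y ~ A + B, data = d)
summary(fit)
```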