What Is a Linear Regression Model? Definition and Example

1. Definition of Linear Regression Model

Multiple linear regression is a linear model that assesses the relationship between a dependent variable (DV, or Y) and multiple independent variables (IVs, or Xs).

For instance, you might want to test how consumer purchase intention is affected by price as well as by household income. In this case, consumer purchase intention is the DV or Y, whereas price and household income are the IVs or Xs.

2. Math Statements of Linear Regression Model

Below is the regression function, in which β₀ is the intercept, β₁ and β₂ are the regression coefficients, and ε is the random error.

\[y=\beta_0 +\beta_1x_1+\beta_2x_2+\epsilon\]

You might wonder what criterion is used to find the estimates b₀, b₁, and b₂ of β₀, β₁, and β₂. They are chosen to minimize the sum of squared residuals, the same logic as in simple linear regression. The fitted function is written as follows.

\[f(x)=b_0 +b_1x_1+b_2x_2\]
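To make the criterion concrete, the quantity being minimized can be written out as below. The label SSE (sum of squared errors) is an assumed name that the text above does not use; i indexes the n observations, \(y_{1i}\) is the i-th observed value of Y, and \(x_{1i}\) and \(x_{2i}\) are the i-th values of the two IVs.

\[SSE(b_0,b_1,b_2)=\sum_{i=1}^{n}\left(y_{1i}-b_0-b_1x_{1i}-b_2x_{2i}\right)^2\]

The estimates b₀, b₁, and b₂ are the values that make this sum as small as possible.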

3. Matrix Solution for Linear Regression Model

You can use pure matrix calculation to find the regression coefficients. Below is the process.

\[ Y= \left[ \begin{array} {} y_{11} \\ y_{12} \\ y_{13} \\ ..\\y_{1n} \end{array} \right] = \left[ \begin{array} {} b_0+b_1 x_{11} + b_2 x_{21} \\ b_0+b_1 x_{12}+b_2 x_{22} \\ b_0+b_1 x_{13}+ b_2 x_{23} \\..\\b_0+b_1 x_{1n} + b_2 x_{2n} \end{array} \right] = \left[ \begin{array} {} 1& x_{11} & x_{21} \\ 1 & x_{12} & x_{22} \\ 1 & x_{13} & x_{23} \\..\\1 & x_{1n} & x_{2n} \end{array} \right] \begin{bmatrix} b_0\\ b_1\\ b_2\end{bmatrix} = X B \]

Thus, setting the error term aside, the model can be written compactly in matrix form.

\[ Y = XB \]

We can multiply both sides on the left by the transpose of X and get the following.

\[ X^TY = X^TXB \]

Since XᵀX is a square matrix, and it is invertible as long as the columns of X are linearly independent, we can multiply both sides on the left by its inverse.

\[ (X^T X)^{-1} X^TY =(X^T X)^{-1} X^T X B\]

Since (XᵀX)⁻¹XᵀX is the identity matrix, the right side reduces to B.

\[ (X^T X)^{-1} X^TY = B\]

Swapping the left and right sides gives the formula below, which we can use to calculate the regression coefficients of the linear model.

\[B =(X^TX)^{-1}X^TY\]

Where,

\[ B = \begin{bmatrix} b_0\\ b_1\\ b_2\end{bmatrix} \]

\[ X= \left[ \begin{array} {} 1& x_{11} & x_{21} \\ 1 & x_{12} & x_{22} \\ 1 & x_{13} & x_{23} \\..\\1 & x_{1n} & x_{2n} \end{array} \right] \]

\[ Y= \left[ \begin{array} {} y_{11} \\ y_{12} \\ y_{13} \\ ..\\y_{1n} \end{array} \right] \]
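The steps above treat Y = XB as if it held exactly. With a random error in the data it generally does not, but the same formula can also be reached from the least-squares criterion. The following is a brief sketch of that route, assuming XᵀX is invertible; S(B) is a label introduced here for the sum of squared residuals.

\[ S(B) = (Y - XB)^T (Y - XB) \]

\[ \frac{\partial S}{\partial B} = -2X^T(Y - XB) = 0 \quad \Rightarrow \quad X^TXB = X^TY \quad \Rightarrow \quad B = (X^TX)^{-1}X^TY \]

Setting the derivative to zero gives the same equation as multiplying both sides by the transpose of X, so both routes lead to the same B.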

4. Use Python NumPy for Linear Regression Model

We can use NumPy for the matrix manipulation and calculation. The following is a linear regression model with price and household income as IVs and purchase intention as DV.

\[f(x)=b_0 +b_1 \times Price+b_2 \times Household \ Income \]

The following is the hypothetical data, including purchase intention as DV and prices and household income as IVs.

Price   Household Income   Purchase Intention
5       7                  7
6       5                  6
7       4                  5
8       6                  5
9       3                  3
10      3                  4
Data for Linear Regression Model

Step 1: Prepare the X matrix and Y vector

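import numpy as np

# Rows of X_rawdata: a row of ones (for the intercept), prices, household income
X_rawdata = np.array([np.ones(6), [5, 6, 7, 8, 9, 10], [7, 5, 4, 6, 3, 3]])
# Transpose so that each row is one observation: [1, price, household income]
X_matrix = X_rawdata.T
print("X Matrix:\n", X_matrix)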

Output:

X Matrix:
 [[ 1.  5.  7.]
 [ 1.  6.  5.]
 [ 1.  7.  4.]
 [ 1.  8.  6.]
 [ 1.  9.  3.]
 [ 1. 10.  3.]]

# Purchase intention as a 1 x 6 array, transposed into a 6 x 1 column vector
Y_rawdata = np.array([[7, 6, 5, 5, 3, 4]])
Y_vector = Y_rawdata.T
print("Y Vector:\n", Y_vector)

Output:

Y Vector:
 [[7]
 [6]
 [5]
 [5]
 [3]
 [4]]
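As an alternative way to prepare the same inputs, the columns can be stacked directly so that no transpose is needed. This is only a sketch; the names price, income, intention, X_alt, and Y_alt are hypothetical and do not appear in the code above.

```python
import numpy as np

# Hypothetical data from the table above, one array per variable
price = np.array([5, 6, 7, 8, 9, 10], dtype=float)
income = np.array([7, 5, 4, 6, 3, 3], dtype=float)
intention = np.array([7, 6, 5, 5, 3, 4], dtype=float)

# Place a column of ones (for the intercept) next to the two predictors
X_alt = np.column_stack([np.ones(len(price)), price, income])

# reshape(-1, 1) turns the 1-D array into an n x 1 column vector
Y_alt = intention.reshape(-1, 1)

print(X_alt.shape, Y_alt.shape)  # (6, 3) (6, 1)
```

The result should match X_matrix and Y_vector from Step 1, with one row per observation.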

Step 2: Calculate Xᵀ and XᵀX

X_matrix_T=X_matrix.transpose()
print("X Matrix Transpose:\n",X_matrix_T)

Output:

X Matrix Transpose:
 [[ 1.  1.  1.  1.  1.  1.]
 [ 5.  6.  7.  8.  9. 10.]
 [ 7.  5.  4.  6.  3.  3.]]

# Multiply the 3 x 6 transpose by the 6 x 3 matrix to get a 3 x 3 matrix
X_T_X = np.matmul(X_matrix_T, X_matrix)
print(X_T_X)

Output:

[[  6.  45.  28.]
 [ 45. 355. 198.]
 [ 28. 198. 144.]]
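It can help to see where the entries of XᵀX come from: each one is a sum over the observations (for example, 355 is the sum of the squared prices). The sketch below rebuilds the matrix from those sums and checks it against the matrix product. The variable names are hypothetical and the block repeats the data so that it runs on its own.

```python
import numpy as np

price = np.array([5, 6, 7, 8, 9, 10], dtype=float)
income = np.array([7, 5, 4, 6, 3, 3], dtype=float)
X = np.column_stack([np.ones(len(price)), price, income])

# Matrix product, as in Step 2
XtX = X.T @ X

# The same matrix written entry by entry as sums over observations
n = len(price)
expected = np.array([
    [n,            price.sum(),            income.sum()],
    [price.sum(),  (price ** 2).sum(),     (price * income).sum()],
    [income.sum(), (price * income).sum(), (income ** 2).sum()],
])

print(np.allclose(XtX, expected))  # True
print(np.allclose(XtX, XtX.T))     # True, since X^T X is always symmetric
```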

Step 3: Calculate (XᵀX)⁻¹

X_T_X_Inv = np.linalg.inv(X_T_X)
print(X_T_X_Inv)

Output:

[[22.23134328 -1.74626866 -1.92164179]
 [-1.74626866  0.14925373  0.13432836]
 [-1.92164179  0.13432836  0.19589552]]
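The inverse exists only when the columns of X are linearly independent. One way to check this before inverting, sketched here with hypothetical variable names, is to confirm that X has full column rank and that the computed inverse multiplied by XᵀX gives the identity matrix.

```python
import numpy as np

X = np.column_stack([
    np.ones(6),
    [5, 6, 7, 8, 9, 10],   # price
    [7, 5, 4, 6, 3, 3],    # household income
])
XtX = X.T @ X

# Full column rank (3 here) means X^T X is invertible
rank = np.linalg.matrix_rank(X)
print("Rank of X:", rank)

# A very large condition number would signal a nearly singular matrix
print("Condition number of X^T X:", np.linalg.cond(XtX))

# Sanity check: inverse times the original matrix should be the identity
XtX_inv = np.linalg.inv(XtX)
identity_check = np.allclose(XtX_inv @ XtX, np.eye(3))
print("Inverse check passed:", identity_check)
```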

Step 4: Calculate (XᵀX)⁻¹XᵀY

X_T_X_Inv@X_matrix_T@Y_vector

Output:

array([[ 6.73880597],
       [-0.44776119],
       [ 0.34701493]])
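Forming the inverse explicitly follows the formula step by step, but in numerical work it is generally preferable to solve the system XᵀXB = XᵀY directly. The sketch below does this with np.linalg.solve as an alternative to Step 3 and Step 4; it is not the method used above, and the names X, Y, and B_solve are hypothetical.

```python
import numpy as np

X = np.column_stack([
    np.ones(6),
    [5, 6, 7, 8, 9, 10],   # price
    [7, 5, 4, 6, 3, 3],    # household income
])
Y = np.array([[7], [6], [5], [5], [3], [4]], dtype=float)

# Solve (X^T X) B = X^T Y for B without computing the inverse
B_solve = np.linalg.solve(X.T @ X, X.T @ Y)
print(B_solve)
```

The coefficients should agree with the output of Step 4 up to floating-point rounding.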

Step 5: Write out the linear regression model

We can see that b₀ ≈ 6.74, b₁ ≈ -0.45, and b₂ ≈ 0.35. We can write the estimated regression function below.

\[f(x)=b_0 +b_1x_1+b_2x_2=6.74-0.45 \times Price+0.35 \times Household \ Income\]
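Once the coefficients are known, the estimated function can be used to compute fitted values, residuals, and a prediction for a new case. The sketch below assumes a hypothetical new observation with a price of 7 and a household income of 5; the variable names are also hypothetical.

```python
import numpy as np

X = np.column_stack([
    np.ones(6),
    [5, 6, 7, 8, 9, 10],   # price
    [7, 5, 4, 6, 3, 3],    # household income
])
Y = np.array([[7], [6], [5], [5], [3], [4]], dtype=float)

# Coefficients b0, b1, b2 from the normal equations
B = np.linalg.solve(X.T @ X, X.T @ Y)

# Fitted values and residuals for the six observations
fitted = X @ B
residuals = Y - fitted
print("Residuals:\n", residuals)

# Prediction for a hypothetical new case: [1, price, household income]
x_new = np.array([1.0, 7.0, 5.0])
pred = (x_new @ B).item()
print("Predicted purchase intention:", round(pred, 2))  # 5.34
```

Because the model includes an intercept, the residuals sum to zero up to rounding, which is a quick check that the fit is a least-squares fit.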

5. Use numpy.linalg.lstsq to verify

We can use the NumPy function numpy.linalg.lstsq to verify the calculation above. Below is the Python code for the linear regression model.

results=np.linalg.lstsq(X_matrix, Y_vector, rcond=None)[0]
print(results)

Output:

[[ 6.73880597]
 [-0.44776119]
 [ 0.34701493]]

As we can see, the result matches the matrix calculation shown above, which confirms that the matrix method was carried out correctly.
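Rather than comparing the printed numbers by eye, the two results can be compared programmatically. The following sketch uses np.allclose for the comparison and also unpacks the other values that numpy.linalg.lstsq returns; the names B_matrix, B_lstsq, sse, rank, and sv are hypothetical.

```python
import numpy as np

X = np.column_stack([
    np.ones(6),
    [5, 6, 7, 8, 9, 10],   # price
    [7, 5, 4, 6, 3, 3],    # household income
])
Y = np.array([[7], [6], [5], [5], [3], [4]], dtype=float)

# Matrix method from Steps 2 to 4
B_matrix = np.linalg.inv(X.T @ X) @ X.T @ Y

# lstsq returns the solution, the sum of squared residuals,
# the rank of X, and the singular values of X
B_lstsq, sse, rank, sv = np.linalg.lstsq(X, Y, rcond=None)

# Compare with a floating-point tolerance instead of exact equality
print(np.allclose(B_matrix, B_lstsq))  # True
print("Sum of squared residuals:", sse)
```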

