1. Definition of Linear Regression Model
Multiple linear regression is a linear model assessing the relationship between a dependent variable (DV, or Y) and multiple independent variables (IVs, or X).
For instance, you might want to test how consumer purchase intention is affected by price and by household income. In this case, consumer purchase intention is the DV (Y), whereas price and household income are the IVs (Xs).
2. Math Statements of Linear Regression Model
Below is the regression function, in which β₀ is the intercept, β₁ and β₂ are the regression coefficients, and ε is the random error.
\[y=\beta_0 +\beta_1x_1+\beta_2x_2+\epsilon\]
You might ask what criterion is used to find b₀, b₁, and b₂, the estimates of β₀, β₁, and β₂. They are chosen to minimize the sum of squared residuals, the same logic as in simple linear regression. The fitted function is:
\[f(x)=b_0 +b_1x_1+b_2x_2\]
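The minimization criterion can be written out explicitly. In the sketch below, the index i and the number of observations n are notation introduced here for illustration; y_i is the i-th observed value of the DV (written y_{1i} in the matrix form of Section 3), and x_{1i} and x_{2i} are the i-th observed values of the two IVs.

\[ \min_{b_0,\,b_1,\,b_2} \; \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_{1i} - b_2 x_{2i} \right)^2 \]

Each term in the sum is the squared residual for one observation, that is, the squared gap between the observed value and the fitted value f(x).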
3. Matrix Solution for Linear Regression Model
You can use pure matrix calculation to obtain the regression coefficients. Below is the process.
\[ Y= \left[ \begin{array} {} y_{11} \\ y_{12} \\ y_{13} \\ ..\\y_{1n} \end{array} \right] = \left[ \begin{array} {} b_0+b_1 x_{11} + b_2 x_{21} \\ b_0+b_1 x_{12}+b_2 x_{22} \\ b_0+b_1 x_{13}+ b_2 x_{23} \\..\\b_0+b_1 x_{1n} + b_2 x_{2n} \end{array} \right] = \left[ \begin{array} {} 1& x_{11} & x_{21} \\ 1 & x_{12} & x_{22} \\ 1 & x_{13} & x_{23} \\..\\1 & x_{1n} & x_{2n} \end{array} \right] \begin{bmatrix} b_0\\ b_1\\ b_2\end{bmatrix} = X B \]
Thus, we can get the following.
\[ Y = XB \]
We can multiply both sides on the left by the transpose of X and get the following.
\[ X^TY = X^TXB \]
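Since \(X^TX\) is a square matrix, we can calculate its inverse (provided it is invertible, which requires the columns of X to be linearly independent) and multiply both sides by it on the left.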
\[ (X^T X)^{-1} X^TY =(X^T X)^{-1} X^T X B\]
Since \((X^TX)^{-1}X^TX\) is an identity matrix, we can write it as follows.
\[ (X^T X)^{-1} X^TY = B\]
Swapping the left and right sides gives the formula below, which we can use to calculate the regression coefficients of the linear model.
\[B =(X^TX)^{-1}X^TY\]
Where,
\[ B = \begin{bmatrix} b_0\\ b_1\\ b_2\end{bmatrix} \]
\[ X= \left[ \begin{array} {} 1& x_{11} & x_{21} \\ 1 & x_{12} & x_{22} \\ 1 & x_{13} & x_{23} \\..\\1 & x_{1n} & x_{2n} \end{array} \right] \]
\[ Y= \left[ \begin{array} {} y_{11} \\ y_{12} \\ y_{13} \\ ..\\y_{1n} \end{array} \right] \]
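As a minimal sketch of the formula above, the helper below computes B from a design matrix X and a response vector Y. The function name `ols_normal_equation` and the small demo data are assumptions made for illustration, not part of the original walkthrough. The sketch solves the system \(X^TXB = X^TY\) with `np.linalg.solve` rather than forming the inverse explicitly; for an invertible \(X^TX\) both routes give the same B.

```python
import numpy as np

def ols_normal_equation(X, Y):
    """Return B satisfying (X^T X) B = X^T Y for design matrix X and response Y."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # X^T X is invertible only when the columns of X are linearly independent.
    if np.linalg.matrix_rank(X) < X.shape[1]:
        raise ValueError("X does not have full column rank; X^T X is singular.")
    XtX = X.T @ X
    XtY = X.T @ Y
    # Solving the linear system avoids computing the inverse explicitly.
    return np.linalg.solve(XtX, XtY)

# Made-up demo data: y = 1 + 2*x holds exactly, so B should be close to [1, 2].
X_demo = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
Y_demo = np.array([[1.0], [3.0], [5.0]])
B_demo = ols_normal_equation(X_demo, Y_demo)
print(B_demo)
```

The first column of ones in `X_demo` plays the same role as the column of ones in X above: it carries the intercept b₀.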
4. Use Python NumPy for Linear Regression Model
We can use NumPy to do matrix manipulation and calculation. The following is a linear regression model, with price and household income as the IVs and purchase intention as the DV.
\[f(x)=b_0 +b_1 \times Price+b_2 \times Household \ Income \]
The following is the hypothetical data, with purchase intention as the DV and price and household income as the IVs.
Prices | Household Income | Purchase Intention |
---|---|---|
5 | 7 | 7 |
6 | 5 | 6 |
7 | 4 | 5 |
8 | 6 | 5 |
9 | 3 | 3 |
10 | 3 | 4 |
Step 1: Prepare the X matrix and Y vector
import numpy as np
X_rawdata = np.array([np.ones(6),[5,6,7,8,9,10], [7,5,4,6,3,3]])
X_matrix=X_rawdata.T
print("X Matrix:\n", X_matrix)
Output:
X Matrix:
 [[ 1.  5.  7.]
 [ 1.  6.  5.]
 [ 1.  7.  4.]
 [ 1.  8.  6.]
 [ 1.  9.  3.]
 [ 1. 10.  3.]]
Y_rawdata = np.array([[7,6,5,5,3,4]])
Y_vector=Y_rawdata.T
print("Y Vector:\n",Y_vector)
Output:
Y Vector:
 [[7]
 [6]
 [5]
 [5]
 [3]
 [4]]
Step 2: Calculate \(X^T\) and \(X^TX\)
X_matrix_T=X_matrix.transpose()
print("X Matrix Transpose:\n",X_matrix_T)
Output:
X Matrix Transpose:
 [[ 1.  1.  1.  1.  1.  1.]
 [ 5.  6.  7.  8.  9. 10.]
 [ 7.  5.  4.  6.  3.  3.]]
X_T_X=np.matmul(X_matrix_T,X_matrix)
print(X_T_X)
Output:
[[  6.  45.  28.]
 [ 45. 355. 198.]
 [ 28. 198. 144.]]
Step 3: Calculate \((X^TX)^{-1}\)
X_T_X_Inv=np.linalg.inv(X_T_X)
print(X_T_X_Inv)
Output:
[[22.23134328 -1.74626866 -1.92164179]
 [-1.74626866  0.14925373  0.13432836]
 [-1.92164179  0.13432836  0.19589552]]
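The derivation in Section 3 relies on \((X^TX)^{-1}X^TX\) being an identity matrix. A quick check of that step for this data might look like the sketch below; it re-enters the \(X^TX\) values printed in Step 2 so that it runs on its own, and the variable names `product` and `is_identity` are introduced here for illustration.

```python
import numpy as np

# X^T X as printed in Step 2.
X_T_X = np.array([[6.0, 45.0, 28.0],
                  [45.0, 355.0, 198.0],
                  [28.0, 198.0, 144.0]])
X_T_X_Inv = np.linalg.inv(X_T_X)

# The product should equal the 3x3 identity matrix up to floating-point rounding.
product = X_T_X_Inv @ X_T_X
is_identity = np.allclose(product, np.eye(3))
print(is_identity)

# The condition number gives a rough sense of how sensitive the inverse is
# to rounding error; larger values mean a less stable inversion.
print(np.linalg.cond(X_T_X))
```

`np.allclose` is used instead of an exact comparison because the product typically differs from the identity by tiny rounding errors.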
Step 4: Calculate \((X^TX)^{-1}X^TY\)
X_T_X_Inv@X_matrix_T@Y_vector
Output:
array([[ 6.73880597],
       [-0.44776119],
       [ 0.34701493]])
Step 5: Write out the linear regression model
We can see that b₀ = 6.74, b₁ = -0.45, and b₂ = 0.35 (rounded to two decimal places). We can write the estimated regression function below.
\[f(x)=b_0 +b_1x_1+b_2x_2=6.74-0.45 \times Price+0.35 \times Household \ Income\]
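With the coefficients in hand, the estimated function can be used to compute fitted values, residuals, and a prediction. The sketch below rebuilds the data so it runs on its own; the new case (price 7, household income 5) is a made-up input for illustration, and the short names `X`, `Y`, and `B` are used here in place of the step-by-step variable names.

```python
import numpy as np

# Same hypothetical data as the table: intercept column, price, household income.
X = np.column_stack([np.ones(6), [5, 6, 7, 8, 9, 10], [7, 5, 4, 6, 3, 3]])
Y = np.array([[7, 6, 5, 5, 3, 4]], dtype=float).T

# Regression coefficients from the matrix formula B = (X^T X)^{-1} X^T Y.
B = np.linalg.inv(X.T @ X) @ X.T @ Y

# Fitted values and residuals for the six observations.
fitted = X @ B
residuals = Y - fitted

# At the least-squares solution the residuals are orthogonal to every column
# of X, so X^T (Y - XB) should be zero up to rounding error.
orthogonality = X.T @ residuals
print(np.allclose(orthogonality, 0, atol=1e-6))

# Prediction for a made-up new case: price 7, household income 5.
new_case = np.array([[1.0, 7.0, 5.0]])
prediction = new_case @ B
print(prediction)
```

The orthogonality check is another way to see the normal equation \(X^TY = X^TXB\) at work: rearranged, it says \(X^T(Y - XB) = 0\).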
5. Use numpy.linalg.lstsq to verify
We can use the NumPy function numpy.linalg.lstsq to verify our calculation above. Below is the Python code for the linear regression model.
results=np.linalg.lstsq(X_matrix, Y_vector, rcond=None)[0]
print(results)
Output:
[[ 6.73880597]
 [-0.44776119]
 [ 0.34701493]]
As we can see, the result matches the matrix calculation shown above, which confirms that the matrix method was carried out correctly.
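Rather than comparing the printed numbers by eye, the two results can also be compared programmatically. The sketch below is one way to do that, assuming the same hypothetical data; the names `B_normal`, `B_lstsq`, `residual_ss`, `rank`, and `singular_values` are chosen here for illustration. Note that `np.linalg.lstsq` returns four items: the solution, the sum of squared residuals, the rank of X, and the singular values of X.

```python
import numpy as np

# Same hypothetical data as above.
X_matrix = np.column_stack([np.ones(6), [5, 6, 7, 8, 9, 10], [7, 5, 4, 6, 3, 3]])
Y_vector = np.array([[7, 6, 5, 5, 3, 4]], dtype=float).T

# Matrix method: B = (X^T X)^{-1} X^T Y.
B_normal = np.linalg.inv(X_matrix.T @ X_matrix) @ X_matrix.T @ Y_vector

# Library method: lstsq returns (solution, residual sum of squares, rank, singular values).
B_lstsq, residual_ss, rank, singular_values = np.linalg.lstsq(
    X_matrix, Y_vector, rcond=None
)

# The two coefficient vectors should agree up to floating-point rounding.
print(np.allclose(B_normal, B_lstsq))

# A rank equal to the number of columns (3) means X^T X is invertible.
print(rank)
```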