1. What is Logistic Regression?
Logistic regression is a model of the relationship between Y (a binary variable) and X (one or more predictor variables). Logistic regression is also called logit regression.
2. Difference between logistic regression and linear regression
We often encounter situations where the outcome is binary, such as yes or no, bought or did not buy. In this case, we cannot use linear regression, because linear regression is a model in which Y is a continuous variable.
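To illustrate why a straight line is a poor fit for a binary outcome, the sketch below fits ordinary least squares to the ages and purchase outcomes from the hypothetical example table in this article. The variable and function names (`ages`, `buy`, `linear_prediction`) are chosen here for illustration only.

```python
# Ages and purchase outcomes (1 = bought, 0 = did not buy) from the example table.
ages = [26, 23, 29, 28, 50, 60, 45, 19, 36, 45]
buy = [1, 1, 0, 1, 0, 0, 1, 1, 0, 0]

n = len(ages)
mean_age = sum(ages) / n
mean_buy = sum(buy) / n

# Ordinary least squares with a single predictor: slope = Sxy / Sxx.
sxy = sum((a - mean_age) * (b - mean_buy) for a, b in zip(ages, buy))
sxx = sum((a - mean_age) ** 2 for a in ages)
slope = sxy / sxx
intercept = mean_buy - slope * mean_age


def linear_prediction(age):
    """Fitted value of the straight line; nothing keeps it inside [0, 1]."""
    return intercept + slope * age


for age in (10, 30, 60):
    print(f"age={age}: fitted value = {linear_prediction(age):.3f}")
```

With these data the fitted line gives a value above 1 at age 10 and a value below 0 at age 60, and neither can be read as a probability.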
3. Why Is It Called Logistic Regression?
The odds are the ratio between the probability of an event happening (i.e., p(y=1)) and the probability of the event not happening (i.e., p(y=0)). The logit is also called the log-odds, since it is equal to the logarithm of the odds.
\[\log(\text{Odds})=\log\frac{p(y=1)}{1-p(y=1)}=f(x)=\beta_0 +\beta_1x_1+\beta_2x_2+\beta_3x_3+\cdots\]
The log-odds lie in the range (-∞, +∞), so the logit maps probability values (which lie strictly between 0 and 1) onto (-∞, +∞). Note that a linear function of the predictors also ranges over (-∞, +∞). Thus, we can connect numerical, continuous data with binary data in the same function.
\[(-\infty,+\infty):\beta_0 +\beta_1x_1+\beta_2x_2+\beta_3x_3+\cdots\]
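The mapping between probabilities and log-odds can be checked numerically. The following is a minimal sketch. The function names `logit` and `inverse_logit` are chosen here for illustration, and `inverse_logit` simply solves the log-odds equation for p, giving p = 1 / (1 + e^(-z)).

```python
import math


def logit(p):
    """Log-odds of a probability p, defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inverse_logit(z):
    """Map any real number z back to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Probabilities near 0 give large negative log-odds, and probabilities
# near 1 give large positive log-odds; p = 0.5 sits exactly at zero.
for p in (0.01, 0.25, 0.5, 0.75, 0.99):
    z = logit(p)
    print(f"p={p:.2f}  log-odds={z:+.3f}  back to p={inverse_logit(z):.2f}")
```

Because the two functions are inverses of each other, a linear predictor of any size can always be converted back into a valid probability.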
4. Example of logistic regression
The following is a hypothetical example that uses age and gender to predict whether consumers buy a product.
Buy or Not | Age | Gender |
---|---|---|
1 | 26 | 1 |
1 | 23 | 1 |
0 | 29 | 0 |
1 | 28 | 1 |
0 | 50 | 1 |
0 | 60 | 0 |
1 | 45 | 0 |
1 | 19 | 1 |
0 | 36 | 1 |
0 | 45 | 1 |
The following is the logistic regression model.
\[\log \frac{p(y=1)}{1-p(y=1)}=\beta_0 +\beta_1\,\text{age} +\beta_2\,\text{gender}\]
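As a rough illustration of how the coefficients in this model could be estimated, the sketch below fits it to the ten rows of the table by maximum likelihood using Newton's method, written in plain Python. The choice of Newton's method, the fixed number of iterations, and all function names (`fit`, `predict`, `solve`, `log_likelihood`) are assumptions made for this example. In practice a statistics library would normally be used. The table does not say which gender is coded as 1, so the gender coefficient is left uninterpreted.

```python
import math

# Rows copied from the table above: (buy, age, gender).
rows = [
    (1, 26, 1), (1, 23, 1), (0, 29, 0), (1, 28, 1), (0, 50, 1),
    (0, 60, 0), (1, 45, 0), (1, 19, 1), (0, 36, 1), (0, 45, 1),
]
y = [r[0] for r in rows]
# The leading 1.0 in each row is the intercept column (beta_0).
X = [[1.0, float(r[1]), float(r[2])] for r in rows]


def predict(beta, x):
    """p(y=1) for one row x, i.e. the inverse logit of the linear predictor."""
    z = sum(b * v for b, v in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(beta):
    """Sum of log p for buyers and log (1 - p) for non-buyers."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = predict(beta, xi)
        total += math.log(p) if yi == 1 else math.log(1.0 - p)
    return total


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit(iterations=25):
    """Newton's method: beta <- beta + (X'WX)^-1 X'(y - p), starting from zero."""
    beta = [0.0, 0.0, 0.0]
    k = len(beta)
    for _ in range(iterations):
        p = [predict(beta, xi) for xi in X]
        # Gradient of the log-likelihood.
        grad = [sum((yi - pi) * xi[j] for xi, yi, pi in zip(X, y, p))
                for j in range(k)]
        # Negative Hessian, X'WX with weights W = p (1 - p).
        hess = [[sum(pi * (1.0 - pi) * xi[a] * xi[c] for xi, pi in zip(X, p))
                 for c in range(k)] for a in range(k)]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


beta = fit()
print("beta_0 (intercept), beta_1 (age), beta_2 (gender):", beta)
print("estimated p(buy) for age 30, gender 1:", predict(beta, [1.0, 30.0, 1.0]))
```

With only ten observations the estimates are very imprecise, so the output is useful only for seeing how the pieces of the model fit together.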