Note-1: Linear Regression and Logistic Regression

Linear Regression

What’s Linear Regression

Linear Regression is an approach to modelling the relationship between a scalar response (the dependent variable) and one or more explanatory variables.

It is written as the linear formula: $f(x)=w^T x+b$

Given the feature values of $n$ data points, we can train a linear model that fits the data set properly. When a new data point is fed into the model, we can predict its value.

Our goal is to find the optimal weight values.

The closed-form solution can be obtained by setting the derivative to zero. Of course, you can also use gradient descent to find the optimal parameters, but it’s not necessary here.
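As a sketch, the closed-form (normal-equation) solution $w = (X^TX)^{-1}X^Ty$ can be computed with NumPy’s pseudo-inverse; the toy data below is made up for illustration:

```python
import numpy as np

# Toy data: n = 5 points, d = 2 features (illustrative values only)
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
y = np.array([5.0, 4.0, 11.0, 10.0, 15.0])   # generated as y = x1 + 2*x2

# Append a column of ones so the bias b is absorbed into w
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])

# Closed-form solution w = (X^T X)^{-1} X^T y, computed via the
# pseudo-inverse for numerical stability
w = np.linalg.pinv(X_aug) @ y

def predict(x):
    """Predict f(x) = w^T x + b for a new data point."""
    return np.append(x, 1.0) @ w

print(predict(np.array([2.0, 3.0])))
```

Using the pseudo-inverse rather than inverting $X^TX$ directly keeps the computation stable even when the features are nearly collinear.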

Advantage

The advantages of linear regression are that it is simple, easy to implement, and has low time complexity.

Logistic Regression

Why Logistic Regression

Though the name of Logistic Regression includes “regression”, it’s not really a regression model; it’s for classification tasks. In this respect, we can call it Logistic Regression Analysis.

Although Linear Regression is meant for regression tasks, it can actually also be used to predict the class of a given data point: just set a threshold. If the predicted value is above the threshold, the point is classified into class 1; otherwise, it is classified into class 0.
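A minimal sketch of this thresholding idea, using hypothetical 1-D data and a least-squares fit:

```python
import numpy as np

# Hypothetical 1-D data: feature value vs. binary label
X = np.array([[1.0], [2.0], [3.0], [6.0], [7.0], [8.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Fit a line with least squares (bias absorbed via a ones column)
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
w = np.linalg.pinv(X_aug) @ y

def classify(x, threshold=0.5):
    """Threshold the real-valued linear prediction to get a class."""
    score = np.append(x, 1.0) @ w
    return 1 if score > threshold else 0

print([classify(np.array([v])) for v in (2.0, 7.0)])
```

The threshold 0.5 happens to work here, but on other data sets a different cut-off may be needed, which is exactly the drawback discussed next.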

However, there is a drawback when we use linear regression for classification: we have to set many different thresholds for different cases. That is why Logistic Regression came about.

What’s Logistic Regression

Some key words in Logistic Regression:

  • Hypothesis: Data points are Bernoulli distributed
  • Maximum likelihood to get the cost function
  • Gradient descent or Newton method to find the optimal solution
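The three points above can be sketched together in a minimal NumPy implementation: maximising the Bernoulli log-likelihood is equivalent to minimising the negative log-likelihood (cross-entropy), which we do here by gradient descent. The toy data, learning rate, and iteration count are assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy data: 1 feature plus a bias column of ones
X = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0],
              [6.0, 1.0], [7.0, 1.0], [8.0, 1.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Gradient descent on the negative Bernoulli log-likelihood
w = np.zeros(X.shape[1])
lr = 0.1
for _ in range(5000):
    p = sigmoid(X @ w)       # predicted P(y = 1 | x)
    grad = X.T @ (p - y)     # gradient of the negative log-likelihood
    w -= lr * grad

preds = (sigmoid(X @ w) >= 0.5).astype(int)
print(preds)
```

Newton’s method would replace the fixed-step update with a step scaled by the inverse Hessian, usually converging in far fewer iterations.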

Given the generalized linear model $y=g^{-1}(w^Tx+b)$, $g(\cdot)$ is called the link function.

The function $g$, evolving from the unit-step function to the sigmoid function, converts the predicted value into the corresponding class.

The unit-step function has a poor property: it is not differentiable at 0, so we cannot take derivatives of it. Therefore we use the sigmoid function instead. It has the form: $\sigma(z)=\frac{1}{1+e^{-z}}$

The sigmoid function squashes the predicted value into $(0, 1)$, and now we can set just one threshold and do the classification task.
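The squashing into $(0, 1)$ and the single fixed threshold can be seen numerically in a minimal sketch (the sample scores are illustrative):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Any real-valued score z is squashed into the open interval (0, 1)
for z in (-10, -1, 0, 1, 10):
    print(z, round(sigmoid(z), 4))

# A single fixed threshold of 0.5 (i.e. z = 0) then gives the class
print(1 if sigmoid(2.3) >= 0.5 else 0)
```

Because $\sigma(0)=0.5$, thresholding the probability at 0.5 is the same as thresholding the linear score $w^Tx+b$ at 0, so no per-dataset threshold tuning is needed.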

Log Odds - another way to interpret LR

Log odds is another way to interpret logistic regression. For more details, see Chapter 3 of Machine Learning (《机器学习》) by Zhou Zhihua.
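Concretely, writing $p$ for $P(y=1\mid x)$ and taking $g^{-1}$ to be the sigmoid, the model says exactly that the log odds (logit) is linear in $x$:

$$
p=\frac{1}{1+e^{-(w^Tx+b)}}
\;\Longrightarrow\;
\frac{p}{1-p}=e^{w^Tx+b}
\;\Longrightarrow\;
\ln\frac{p}{1-p}=w^Tx+b
$$

So logistic regression can be read as ordinary linear regression performed on the log odds of the positive class.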

Reference

  1. Machine Learning (《机器学习》), Zhou Zhihua
  2. Python Data Science: Machine Learning Notes (Python数据科学 机器学习笔记)
  3. “What is the essence of the least squares method?” (最小二乘法的本质是什么)