Why can't we use linear regression on logits for solving logistic regression problems?


The obvious answer is that since $y \in \{0, 1\}$, the logits $\ln\bigl(\frac{y}{1 - y}\bigr)$ would be infinite. However, this problem can be circumvented by transforming the labels to have values $\tilde{y} \in \{\varepsilon, 1 - \varepsilon\}$ for some $\varepsilon \in (0, 0.5)$ (also known as label smoothing).
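To make the smoothing step concrete, here is a minimal NumPy sketch. The function name `smoothed_logits` and the choice $\varepsilon = 0.1$ are mine and purely illustrative; the mapping $\tilde{y} = y + (1 - 2y)\,\varepsilon$ is the one used further down in this answer.

```python
import numpy as np

def smoothed_logits(y, eps=0.1):
    """Return the logits of smoothed labels (hypothetical helper for illustration)."""
    y = np.asarray(y, dtype=float)
    # Label smoothing: 0 -> eps, 1 -> 1 - eps
    y_tilde = y + (1.0 - 2.0 * y) * eps
    # The logit is finite because y_tilde lies strictly between 0 and 1
    return np.log(y_tilde / (1.0 - y_tilde))

z = smoothed_logits([0, 1, 1, 0], eps=0.1)
print(z)  # two distinct values: -ln(9) for label 0 and +ln(9) for label 1
```

The regression targets only ever take the two values $\pm\ln\bigl(\frac{1 - \varepsilon}{\varepsilon}\bigr)$, so the choice of $\varepsilon$ directly sets their scale.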

The actual problem is that linear regression on the logits and logistic regression solve two fundamentally different problems. To see this, note that linear and logistic regression maximise different likelihood functions. Logistic regression aims to find the Maximum Likelihood Estimator (MLE) for a Bernoulli distributed random variable. In other words, logistic regression solves the following optimisation problem:

$$\min_{\boldsymbol{w} \in \mathbb{R}^D} \Big\{-\sum_{i=1}^N \Big[ y_i \ln \sigma(\boldsymbol{w} \cdot \boldsymbol{x}_i) + (1 - y_i) \ln\big(1 - \sigma(\boldsymbol{w} \cdot \boldsymbol{x}_i)\big) \Big]\Big\},$$

where $\sigma(z) = 1 / (1 + e^{-z})$ is the logistic sigmoid. Linear regression, on the other hand, computes the MLE for a Gaussian random variable. Concretely, applying linear regression on the logits corresponds to the following optimisation problem:

$$\min_{\boldsymbol{w} \in \mathbb{R}^D} \Bigg\{\frac{1}{2} \sum_{i=1}^N \bigg(\boldsymbol{w} \cdot \boldsymbol{x}_i - \ln \Bigl(\frac{\tilde{y}_i}{1 - \tilde{y}_i}\Bigr)\bigg)^2\Bigg\},$$

where $\tilde{y}_i = y_i + (1 - 2 y_i) \, \varepsilon$. Note that the smoothed logits only take the two values $\pm\ln\bigl(\frac{1 - \varepsilon}{\varepsilon}\bigr)$, so they do not follow a Gaussian distribution, and therefore it could be argued that this model is (theoretically) not the right one for the task.
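The difference between the two objectives can also be checked numerically. The following is a rough sketch, not a reference implementation: the synthetic data, the seed, `w_true`, $\varepsilon = 0.1$ and all function names are assumptions of mine, and plain Newton iterations stand in for whatever solver you would normally use for logistic regression.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_nll(w, X, y):
    """Bernoulli negative log-likelihood, written in a numerically stable form."""
    z = X @ w
    return np.sum(np.logaddexp(0.0, z) - y * z)

def fit_logistic_newton(X, y, n_iter=25):
    """Minimise the logistic objective with plain (undamped) Newton steps."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        w -= np.linalg.solve(hess, grad)
    return w

def fit_linear_on_logits(X, y, eps=0.1):
    """Least squares on the logits of the smoothed labels."""
    y_tilde = y + (1.0 - 2.0 * y) * eps
    targets = np.log(y_tilde / (1.0 - y_tilde))
    w, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return w

# Synthetic, non-separable data (assumed setup, for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = (rng.random(200) < sigmoid(X @ w_true)).astype(float)

w_logistic = fit_logistic_newton(X, y)
w_linear = fit_linear_on_logits(X, y)

print("logistic regression weights:", w_logistic)
print("linear-on-logits weights:   ", w_linear)
print("NLL at logistic solution:   ", logistic_nll(w_logistic, X, y))
print("NLL at linear solution:     ", logistic_nll(w_linear, X, y))
```

On data like this, the two weight vectors differ, and the least-squares solution attains a higher Bernoulli negative log-likelihood than the logistic regression solution. Changing `eps` rescales the least-squares weights, because the targets are proportional to $\ln\bigl(\frac{1 - \varepsilon}{\varepsilon}\bigr)$.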

The main point is that these are two different problems with different solutions. We can verify this by taking the analytical solution from the linear regression on the logits (assuming $\boldsymbol{X}$ has full column rank),

$$\boldsymbol{w}^* = (\boldsymbol{X}^\mathsf{T} \boldsymbol{X})^{-1} \boldsymbol{X}^\mathsf{T} \ln\Bigl(\frac{\tilde{\boldsymbol{y}}}{\boldsymbol{1} - \tilde{\boldsymbol{y}}}\Bigr),$$

and plugging it into the logistic regression model:

$$\begin{align*} \sigma(\boldsymbol{X} \boldsymbol{w}^*) &= \sigma\biggl(\boldsymbol{X} (\boldsymbol{X}^\mathsf{T} \boldsymbol{X})^{-1} \boldsymbol{X}^\mathsf{T} \ln\Bigl(\frac{\tilde{\boldsymbol{y}}}{\boldsymbol{1} - \tilde{\boldsymbol{y}}}\Bigr)\biggr) \\ &= \sigma\biggl(\boldsymbol{U} \boldsymbol{U}^\mathsf{T} \ln\Bigl(\frac{\tilde{\boldsymbol{y}}}{\boldsymbol{1} - \tilde{\boldsymbol{y}}}\Bigr)\biggr), \end{align*}$$

where $\boldsymbol{U} \in \mathbb{R}^{N \times D}$ is the matrix of left singular vectors from the reduced (thin) SVD of $\boldsymbol{X}$, such that $\boldsymbol{X} = \boldsymbol{U} \boldsymbol{\Sigma} \boldsymbol{V}^\mathsf{T}$. Now, if we could find a value for $\varepsilon$ that solves the system of non-linear equations given by

$$\boldsymbol{U} \boldsymbol{U}^\mathsf{T} \ln\Bigl(\frac{\tilde{\boldsymbol{y}}}{\boldsymbol{1} - \tilde{\boldsymbol{y}}}\Bigr) = \ln\Bigl(\frac{\boldsymbol{y}}{\boldsymbol{1} - \boldsymbol{y}}\Bigr),$$

then $\boldsymbol{w}^*$ would also be a closed-form solution to the logistic regression problem, but except for some special edge cases this will typically not work out.
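The projection identity $\boldsymbol{X} (\boldsymbol{X}^\mathsf{T} \boldsymbol{X})^{-1} \boldsymbol{X}^\mathsf{T} = \boldsymbol{U} \boldsymbol{U}^\mathsf{T}$ is easy to sanity-check with NumPy. This is only a sketch on random data; the sizes, the seed and $\varepsilon = 0.1$ are arbitrary assumptions, and $\boldsymbol{X}$ is assumed to have full column rank.

```python
import numpy as np

# Random data purely for illustration (assumed setup)
rng = np.random.default_rng(1)
N, D = 50, 3
X = rng.normal(size=(N, D))
y = rng.integers(0, 2, size=N).astype(float)

# Logits of the smoothed labels
eps = 0.1
y_tilde = y + (1.0 - 2.0 * y) * eps
z = np.log(y_tilde / (1.0 - y_tilde))

# Closed-form least-squares solution via the normal equations
w_star = np.linalg.solve(X.T @ X, X.T @ z)

# Reduced (thin) SVD: U has shape (N, D)
U, s, Vt = np.linalg.svd(X, full_matrices=False)

fitted = X @ w_star        # X (X^T X)^{-1} X^T z
projected = U @ (U.T @ z)  # U U^T z

print(np.allclose(fitted, projected))  # the two expressions agree
print(np.linalg.norm(projected - z))   # distance between fitted and target logits
```

Since $\boldsymbol{U} \boldsymbol{U}^\mathsf{T}$ is an orthogonal projection onto the $D$-dimensional column space of $\boldsymbol{X}$, the fitted logits generally do not even reproduce the smoothed targets when $N > D$, let alone the logits of the original labels.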

PS: Some scenarios for which logistic regression actually does have an analytical solution are discussed in Lipovetsky, 2015 and Zeng, 2021.

posted almost 2 years ago

CC BY-SA 4.0

mr Tsjolder‭
