Why can't we use linear regression on logits for solving logistic regression problems?

Consider a machine learning problem with inputs $\boldsymbol{X} \in \mathbb{R}^{N \times D}$ and corresponding labels $\boldsymbol{y} \in \mathcal{Y}^N$.

When the problem is a regression task, $\mathcal{Y}^N = \mathbb{R}^N$ and we can use a linear regression model, $$\hat{\boldsymbol{y}} = \boldsymbol{X} \boldsymbol{w},$$ which has a closed-form solution for the parameters of our model: $$\boldsymbol{w} = \big(\boldsymbol{X}^\mathsf{T} \boldsymbol{X}\big)^{-1} \boldsymbol{X}^\mathsf{T} \boldsymbol{y}.$$

When the problem is a classification task, such that $\mathcal{Y}^N = \{0, 1\}^N,$ we would typically use a logistic regression model, $$\hat{\boldsymbol{y}} = \mathop{\sigma}(\boldsymbol{X} \boldsymbol{w}),$$ where $\sigma(s) = \big(1 + \exp(-s)\big)^{-1}$ is the logistic sigmoid. Unfortunately, there is no closed-form solution for logistic regression. Typically, we have to use methods like gradient descent to find a solution. However, since the logistic sigmoid is invertible, I would assume that we can also write the logistic regression model as $$\ln\Bigl(\frac{\hat{\boldsymbol{y}}}{1 - \hat{\boldsymbol{y}}}\Bigr) = \boldsymbol{X} \boldsymbol{w}$$ to directly model the logits, $\ln\Bigl(\frac{y}{1 - y}\Bigr)$. After all, this would allow us to use the analytical solution from linear regression to solve logistic regression problems.

Why is this approach not used anywhere? Is there a mistake in my reasoning or are there problems with this approach that I am not aware of?


1 answer

The obvious answer is that, since $y \in \{0, 1\}$, the logits $\ln\Bigl(\frac{y}{1 - y}\Bigr)$ are not finite: they are $-\infty$ for $y = 0$ and $+\infty$ for $y = 1$. However, by transforming the labels to have values $\tilde{y} \in \{\varepsilon, 1 - \varepsilon\}$ for some $\varepsilon \in (0, 0.5)$ (also known as label smoothing), this problem can be circumvented.
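As a minimal sketch of the smoothing step (the function names `smooth_labels` and `logit` and the value $\varepsilon = 0.1$ are my own choices for illustration, not part of any library), the mapping $\tilde{y} = y + (1 - 2y)\,\varepsilon$ sends $0 \mapsto \varepsilon$ and $1 \mapsto 1 - \varepsilon$, after which the logits are finite:

```python
import math

def smooth_labels(labels, eps):
    """Map hard labels in {0, 1} to {eps, 1 - eps} via y + (1 - 2 * y) * eps."""
    if not 0.0 < eps < 0.5:
        raise ValueError("eps must lie in the open interval (0, 0.5)")
    return [y + (1 - 2 * y) * eps for y in labels]

def logit(p):
    """Inverse of the logistic sigmoid; only defined for 0 < p < 1."""
    # For p = 0.0 or p = 1.0 this raises an exception instead of returning
    # an infinite value, which mirrors the problem with hard labels.
    return math.log(p / (1.0 - p))

labels = [0, 1, 1, 0]          # hypothetical hard labels
eps = 0.1                      # arbitrary smoothing strength
targets = [logit(p) for p in smooth_labels(labels, eps)]
print(targets)                 # every entry is +ln((1 - eps) / eps) or its negative
```

Note that the regression targets depend entirely on the choice of $\varepsilon$: a smaller $\varepsilon$ pushes them further apart.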

The actual problem is that linear regression on the logits and logistic regression solve two fundamentally different problems. To see this, note that linear and logistic regression maximise different likelihood functions. Logistic regression finds the Maximum Likelihood Estimator (MLE) for a Bernoulli-distributed random variable. In other words, logistic regression solves the following optimisation problem: $$\min_{\boldsymbol{w} \in \mathbb{R}^D} \Big\{-\sum_{i=1}^N \Big[y_i \ln \sigma(\boldsymbol{w} \cdot \boldsymbol{x}_i) + (1 - y_i) \ln\big(1 - \sigma(\boldsymbol{w} \cdot \boldsymbol{x}_i)\big)\Big]\Big\}.$$ Linear regression, on the other hand, computes the MLE for a Gaussian random variable. Concretely, applying linear regression on the logits corresponds to the following optimisation problem: $$\min_{\boldsymbol{w} \in \mathbb{R}^D} \Bigg\{\frac{1}{2} \sum_{i=1}^N \bigg(\boldsymbol{w} \cdot \boldsymbol{x}_i - \ln \Bigl(\frac{\tilde{y}_i}{1 - \tilde{y}_i}\Bigr)\bigg)^2\Bigg\},$$ where $\tilde{y}_i = y_i + (1 - 2 y_i) \, \varepsilon$. Note that the smoothed logits take only two values, $\pm\ln\Bigl(\frac{1 - \varepsilon}{\varepsilon}\Bigr)$, so they do not follow a Gaussian distribution, and therefore it could be argued that this model is (theoretically) not the right one for the task.
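To make the difference between the two objectives concrete, here is a small pure-Python sketch for a single feature without intercept ($D = 1$). The toy data, the helper names and the use of Newton's method for the logistic fit are all assumptions made for this illustration; the data are deliberately not linearly separable so that the logistic MLE is finite.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def fit_logits_least_squares(xs, ys, eps):
    """Closed-form minimiser of 0.5 * sum((w * x - logit(y_tilde))**2) for scalar w."""
    targets = []
    for y in ys:
        y_tilde = y + (1 - 2 * y) * eps          # label smoothing
        targets.append(math.log(y_tilde / (1.0 - y_tilde)))
    # Scalar version of (X^T X)^{-1} X^T t
    return sum(x * t for x, t in zip(xs, targets)) / sum(x * x for x in xs)

def nll_gradient(w, xs, ys):
    """Derivative of the Bernoulli negative log-likelihood with respect to w."""
    return sum((sigmoid(w * x) - y) * x for x, y in zip(xs, ys))

def fit_logistic_newton(xs, ys, steps=50):
    """Iterative fit of the logistic regression MLE using Newton's method."""
    w = 0.0
    for _ in range(steps):
        curvature = sum(sigmoid(w * x) * (1.0 - sigmoid(w * x)) * x * x for x in xs)
        w -= nll_gradient(w, xs, ys) / curvature
    return w

# Hypothetical toy data (not linearly separable through the origin).
xs = [-2.0, -1.0, 0.5, 1.0, 2.0, 3.0]
ys = [0, 1, 0, 1, 1, 1]

w_mle = fit_logistic_newton(xs, ys)
w_ls_coarse = fit_logits_least_squares(xs, ys, eps=0.1)
w_ls_fine = fit_logits_least_squares(xs, ys, eps=0.01)
print(w_mle, w_ls_coarse, w_ls_fine)
```

On this data the least-squares weight changes with $\varepsilon$, whereas the logistic regression MLE does not involve $\varepsilon$ at all, so the two can only coincide for a particular choice of $\varepsilon$.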

The main point is that these are two different problems with different solutions. We can verify this by taking the analytical solution from the linear regression on the logits, $$\boldsymbol{w}^* = (\boldsymbol{X}^\mathsf{T} \boldsymbol{X})^{-1} \boldsymbol{X}^\mathsf{T} \ln\Bigl(\frac{\tilde{\boldsymbol{y}}}{\boldsymbol{1} - \tilde{\boldsymbol{y}}}\Bigr),$$ and plugging it into the logistic regression model: $$\begin{align*} \sigma(\boldsymbol{X} \boldsymbol{w}^*) &= \sigma\biggl(\boldsymbol{X} (\boldsymbol{X}^\mathsf{T} \boldsymbol{X})^{-1} \boldsymbol{X}^\mathsf{T} \ln\Bigl(\frac{\tilde{\boldsymbol{y}}}{\boldsymbol{1} - \tilde{\boldsymbol{y}}}\Bigr)\biggr) \\ &= \sigma\biggl(\boldsymbol{U} \boldsymbol{U}^\mathsf{T} \ln\Bigl(\frac{\tilde{\boldsymbol{y}}}{\boldsymbol{1} - \tilde{\boldsymbol{y}}}\Bigr)\biggr), \end{align*}$$ where $\boldsymbol{U} \in \mathbb{R}^{N \times D}$ is the matrix of left singular vectors from the thin (reduced) SVD of $\boldsymbol{X}$, such that $\boldsymbol{X} = \boldsymbol{U} \boldsymbol{\Sigma} \boldsymbol{V}^\mathsf{T}$. Now, if we could find a value for $\varepsilon$ that solves the system of non-linear equations given by $$\boldsymbol{U} \boldsymbol{U}^\mathsf{T} \ln\Bigl(\frac{\tilde{\boldsymbol{y}}}{\boldsymbol{1} - \tilde{\boldsymbol{y}}}\Bigr) = \ln\Bigl(\frac{\boldsymbol{y}}{\boldsymbol{1} - \boldsymbol{y}}\Bigr),$$ we would have a closed-form solution for $\boldsymbol{w}^*$, but except for some special edge cases this will typically not work out.
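For completeness, the step from $\boldsymbol{X} (\boldsymbol{X}^\mathsf{T} \boldsymbol{X})^{-1} \boldsymbol{X}^\mathsf{T}$ to $\boldsymbol{U} \boldsymbol{U}^\mathsf{T}$ can be spelled out as follows. This sketch assumes that $\boldsymbol{X}$ has full column rank, so that $\boldsymbol{\Sigma} \in \mathbb{R}^{D \times D}$ is invertible, and it uses $\boldsymbol{U}^\mathsf{T} \boldsymbol{U} = \boldsymbol{I}$ and $\boldsymbol{V}^\mathsf{T} \boldsymbol{V} = \boldsymbol{V} \boldsymbol{V}^\mathsf{T} = \boldsymbol{I}$:

$$\begin{align*} \boldsymbol{X}^\mathsf{T} \boldsymbol{X} &= \boldsymbol{V} \boldsymbol{\Sigma} \boldsymbol{U}^\mathsf{T} \boldsymbol{U} \boldsymbol{\Sigma} \boldsymbol{V}^\mathsf{T} = \boldsymbol{V} \boldsymbol{\Sigma}^2 \boldsymbol{V}^\mathsf{T}, \\ (\boldsymbol{X}^\mathsf{T} \boldsymbol{X})^{-1} &= \boldsymbol{V} \boldsymbol{\Sigma}^{-2} \boldsymbol{V}^\mathsf{T}, \\ \boldsymbol{X} (\boldsymbol{X}^\mathsf{T} \boldsymbol{X})^{-1} \boldsymbol{X}^\mathsf{T} &= \boldsymbol{U} \boldsymbol{\Sigma} \boldsymbol{V}^\mathsf{T} \, \boldsymbol{V} \boldsymbol{\Sigma}^{-2} \boldsymbol{V}^\mathsf{T} \, \boldsymbol{V} \boldsymbol{\Sigma} \boldsymbol{U}^\mathsf{T} = \boldsymbol{U} \boldsymbol{U}^\mathsf{T}. \end{align*}$$

In other words, $\boldsymbol{U} \boldsymbol{U}^\mathsf{T}$ is the orthogonal projection onto the column space of $\boldsymbol{X}$, so the fitted logits are the projection of the smoothed target logits onto that space.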

PS: Some scenarios for which logistic regression actually does have an analytical solution are discussed in Lipovetsky, 2015 and Zeng, 2021.
