Chunk #39 — 4 Regularized Multinomial Regression

Source: Regularization Paths for Generalized Linear Models via Coordinate Descent.
Embedded: yes

Text

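The Newton algorithm for multinomial regression can be tedious because of the vector nature of the response observations. For example, instead of the scalar weights $w_i$ in (17), we get weight matrices. However, in the spirit of coordinate descent, we can avoid these complexities. We perform partial Newton steps by forming a partial quadratic approximation to the log-likelihood (22), allowing only $(\beta_{0\ell}, \beta_\ell)$ to vary for a single class at a time. It is not hard to show that this is

$$\ell_{Q\ell}(\beta_{0\ell}, \beta_\ell) = -\frac{1}{2N} \sum_{i=1}^{N} w_{i\ell} \left( z_{i\ell} - \beta_{0\ell} - x_i^T \beta_\ell \right)^2 + C\left( \{\tilde\beta_{0k}, \tilde\beta_k\}_1^K \right), \tag{23}$$

where, as before,

$$z_{i\ell} = \tilde\beta_{0\ell} + x_i^T \tilde\beta_\ell + \frac{y_{i\ell} - \tilde p_\ell(x_i)}{\tilde p_\ell(x_i) \left( 1 - \tilde p_\ell(x_i) \right)}, \tag{24}$$

$$w_{i\ell} = \tilde p_\ell(x_i) \left( 1 - \tilde p_\ell(x_i) \right). \tag{25}$$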
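As a rough illustration of (23) to (25), the following NumPy sketch computes the working response and weights for a single class at the current estimates and evaluates the partial quadratic approximation without the constant term C. It is not the paper's implementation. The function names, the array layout (`X` of shape N by p, one-hot `Y` of shape N by K, intercepts `b0` of length K, coefficients `B` of shape p by K), the softmax form of the class probabilities, and the `eps` clipping guard are all assumptions made for this example.

```python
import numpy as np

def class_probabilities(X, b0, B):
    """Softmax class probabilities (assumed model form); returns an (N, K) array."""
    eta = b0 + X @ B                         # linear predictors, one column per class
    eta -= eta.max(axis=1, keepdims=True)    # subtract row max for numerical stability
    exp_eta = np.exp(eta)
    return exp_eta / exp_eta.sum(axis=1, keepdims=True)

def working_response_and_weights(X, Y, b0, B, ell, eps=1e-5):
    """Return (z, w) as in (24) and (25) for class `ell` at the current estimates."""
    p = class_probabilities(X, b0, B)[:, ell]
    p = np.clip(p, eps, 1.0 - eps)           # assumption: keep the weights away from zero
    w = p * (1.0 - p)                        # (25)
    z = b0[ell] + X @ B[:, ell] + (Y[:, ell] - p) / w   # (24)
    return z, w

def partial_quadratic(X, z, w, b0_ell, beta_ell):
    """Evaluate (23) for one class, omitting the constant C."""
    r = z - b0_ell - X @ beta_ell
    return -0.5 * np.mean(w * r ** 2)        # -1/(2N) * sum_i w_i r_i^2
```

With (z, w) held fixed, only the intercept and coefficients of class `ell` enter `partial_quadratic`, which is what reduces each partial Newton step to a weighted least-squares problem for one class at a time.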