2 Algorithms for the Lasso, Ridge Regression and the Elastic Net

Source: Regularization Paths for Generalized Linear Models via Coordinate Descent.

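We consider the usual setup for linear regression. We have a response variable $Y \in \mathbb{R}$ and a predictor vector $X \in \mathbb{R}^p$, and we approximate the regression function by a linear model $E(Y \mid X = x) = \beta_0 + x^T\beta$. We have $N$ observation pairs $(x_i, y_i)$. For simplicity we assume the $x_{ij}$ are standardized:

$$\sum_{i=1}^{N} x_{ij} = 0, \qquad \frac{1}{N}\sum_{i=1}^{N} x_{ij}^2 = 1, \qquad \text{for } j = 1, \ldots, p.$$

Our algorithms generalize naturally to the unstandardized case. The elastic net solves the following problem

$$\min_{(\beta_0, \beta) \in \mathbb{R}^{p+1}} \left[ \frac{1}{2N} \sum_{i=1}^{N} \left(y_i - \beta_0 - x_i^T\beta\right)^2 + \lambda P_\alpha(\beta) \right], \tag{1}$$

where

\begin{align}
P_\alpha(\beta) &= (1-\alpha)\,\tfrac{1}{2}\|\beta\|_{\ell_2}^2 + \alpha\|\beta\|_{\ell_1} \tag{2}\\
&= \sum_{j=1}^{p} \left[ \tfrac{1}{2}(1-\alpha)\beta_j^2 + \alpha|\beta_j| \right] \tag{3}
\end{align}

is the elastic-net penalty [Zou and Hastie, 2005]. $P_\alpha$ is a compromise between the ridge-regression penalty ($\alpha = 0$) and the lasso penalty ($\alpha = 1$). This penalty is particularly useful in the $p \gg N$ situation, or any situation where there are many correlated predictor variables.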
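As a minimal illustration of the standardization convention $\sum_i x_{ij} = 0$, $\frac{1}{N}\sum_i x_{ij}^2 = 1$, the following pure-Python sketch centers each predictor column and rescales it accordingly. The function name `standardize_columns` and the list-of-rows data layout are assumptions made for this example only. Note that the convention divides by $N$, not $N - 1$, so the scale factor is the population standard deviation rather than the usual sample standard deviation.

```python
import math


def standardize_columns(X):
    """Hypothetical helper (not from the paper): standardize predictors.

    X is a list of N rows, each a list of p floats. Returns a new matrix in
    which every column j satisfies sum_i z_ij = 0 and (1/N) sum_i z_ij^2 = 1.
    Assumes no column is constant (a constant column has zero spread).
    """
    n = len(X)
    p = len(X[0])
    Z = [list(row) for row in X]  # copy so the input is left untouched
    for j in range(p):
        mean = sum(row[j] for row in X) / n
        # Divide by N (not N - 1), matching the (1/N) sum x_ij^2 = 1 convention.
        sd = math.sqrt(sum((row[j] - mean) ** 2 for row in X) / n)
        for i in range(n):
            Z[i][j] = (X[i][j] - mean) / sd
    return Z
```

For example, `standardize_columns([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])` maps the first column `[1.0, 2.0, 3.0]` to approximately `[-1.2247, 0.0, 1.2247]`, whose squares average to 1.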
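To make the objective in (1) and the penalty in (2) and (3) concrete, here is a small pure-Python sketch that evaluates them for a given coefficient vector. It only computes the quantities being minimized; it does not solve the minimization. The function names `elastic_net_penalty`, `elastic_net_penalty_norm_form` and `elastic_net_objective` are hypothetical names chosen for this illustration. The two penalty functions follow the norm form (2) and the coordinate-wise form (3) respectively, so they should agree for any $\beta$ and $\alpha$.

```python
def elastic_net_penalty(beta, alpha):
    """Hypothetical helper: P_alpha(beta) in the coordinate-wise form (3).

    sum_j [ (1/2)(1 - alpha) beta_j^2 + alpha |beta_j| ]
    """
    return sum(0.5 * (1.0 - alpha) * b * b + alpha * abs(b) for b in beta)


def elastic_net_penalty_norm_form(beta, alpha):
    """Hypothetical helper: the same penalty in the norm form (2).

    (1 - alpha) * (1/2) * ||beta||_2^2 + alpha * ||beta||_1
    """
    l2_squared = sum(b * b for b in beta)
    l1 = sum(abs(b) for b in beta)
    return (1.0 - alpha) * 0.5 * l2_squared + alpha * l1


def elastic_net_objective(beta0, beta, X, y, lam, alpha):
    """Hypothetical helper: the bracketed objective in (1).

    (1 / (2N)) * sum_i (y_i - beta0 - x_i^T beta)^2 + lam * P_alpha(beta).
    The intercept beta0 enters the squared-error term but is not penalized.
    """
    n = len(y)
    rss = 0.0
    for xi, yi in zip(X, y):
        fitted = beta0 + sum(xij * bj for xij, bj in zip(xi, beta))
        rss += (yi - fitted) ** 2
    return rss / (2.0 * n) + lam * elastic_net_penalty(beta, alpha)
```

With `beta = [1.0, -2.0]`, `elastic_net_penalty(beta, 0.0)` gives the ridge value `2.5` ($\tfrac{1}{2}\|\beta\|_2^2$), `elastic_net_penalty(beta, 1.0)` gives the lasso value `3.0` ($\|\beta\|_1$), and `alpha = 0.5` gives `2.75`, which lies between the two.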