Chunk #17 — 2 Algorithms for the Lasso, Ridge Regression and the Elastic Net — 2.2 Covariance Updates

Source: Regularization Paths for Generalized Linear Models via Coordinate Descent.
Embedded: yes

Text

Further efficiencies can be achieved in computing the updates in (8). We can write the first term on the right (up to a factor 1/N) as (9)∑i=1Nxijri=〈xj,y〉−∑k:|β˜k|>0〈xj,xk〉β˜k, where 〈xj,y〉=∑i=1Nxijyi. Hence we need to compute inner products of each feature with y initially, and then each time a new feature xk enters the model (for the first time), we need to compute and store its inner product with all the rest of the features (O(Np) operations). We also store the p gradient components (9). If one of the coefficients currently in the model changes, we can update each gradient in O(p) operations. Hence with m non-zero terms in the model, a complete cycle costs O(pm) operations if no new variables become non-zero, and costs O(Np) for each new variable entered. Importantly, O(N) calculations do not have to be made at every step. This is the case for all penalized procedures with squared error loss.