Consider a coordinate descent step for solving (1). That is, suppose we have estimates β̃0 and β̃ℓ for ℓ ≠ j, and we wish to partially optimize with respect to βj. Denote by R(β0, β) the objective function in (1). We would like to compute the gradient at βj = β̃j, which only exists if β̃j ≠ 0. If β̃j > 0, then

$$\left.\frac{\partial R}{\partial \beta_j}\right|_{\beta=\tilde{\beta}} = -\frac{1}{N}\sum_{i=1}^{N} x_{ij}\,\bigl(y_i - \tilde{\beta}_0 - x_i^{T}\tilde{\beta}\bigr) + \lambda(1-\alpha)\beta_j + \lambda\alpha. \tag{4}$$
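As a sanity check on (4), the following minimal sketch evaluates the partial derivative on a small made-up data set and compares it with a central finite difference of the objective. It assumes that R in (1) is the elastic-net objective, that is, squared-error loss scaled by 1/(2N) plus λ[(1−α)‖β‖²/2 + α‖β‖₁]; this form is inferred from the penalty terms appearing in (4) and is not stated in this passage. The function names and the toy data are hypothetical.

```python
def objective(beta0, beta, X, y, lam, alpha):
    """Assumed form of R(beta0, beta): scaled squared error plus elastic-net penalty."""
    n = len(y)
    loss = 0.0
    for xi, yi in zip(X, y):
        resid = yi - beta0 - sum(xij * bj for xij, bj in zip(xi, beta))
        loss += resid * resid
    loss /= 2 * n
    penalty = sum((1 - alpha) * 0.5 * b * b + alpha * abs(b) for b in beta)
    return loss + lam * penalty


def partial_gradient(j, beta0, beta, X, y, lam, alpha):
    """Right-hand side of (4); only valid when the current beta[j] is positive."""
    if beta[j] <= 0:
        raise ValueError("equation (4) applies only when beta[j] > 0")
    n = len(y)
    total = 0.0
    for xi, yi in zip(X, y):
        resid = yi - beta0 - sum(xij * bj for xij, bj in zip(xi, beta))
        total += xi[j] * resid
    return -total / n + lam * (1 - alpha) * beta[j] + lam * alpha


def finite_difference(j, beta0, beta, X, y, lam, alpha, h=1e-6):
    """Central difference of the objective in coordinate j (numerical check only)."""
    up = list(beta)
    down = list(beta)
    up[j] += h
    down[j] -= h
    f_up = objective(beta0, up, X, y, lam, alpha)
    f_down = objective(beta0, down, X, y, lam, alpha)
    return (f_up - f_down) / (2 * h)


# Toy data, purely illustrative: N = 4 observations, p = 2 predictors.
X = [[1.0, 2.0], [0.5, -1.0], [-1.5, 0.3], [2.0, 1.0]]
y = [1.0, 0.0, -1.0, 2.0]
beta0, beta = 0.1, [0.4, -0.2]
lam, alpha = 0.5, 0.3

analytic = partial_gradient(0, beta0, beta, X, y, lam, alpha)
numeric = finite_difference(0, beta0, beta, X, y, lam, alpha)
print(abs(analytic - numeric) < 1e-6)
```

The check is run on coordinate j = 0, where the current estimate is positive. Calling `partial_gradient` on coordinate 1, whose estimate is negative, raises an error, since (4) covers only the case β̃j > 0.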