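Here $\beta_\ell$ is a $p$-vector of coefficients. As in Zhu and Hastie [2004], we choose a more symmetric approach. We model

$$\Pr(G=\ell \mid x) = \frac{e^{\beta_{0\ell} + x^T\beta_\ell}}{\sum_{k=1}^{K} e^{\beta_{0k} + x^T\beta_k}}. \tag{20}$$

This parametrization is not estimable without constraints: for any values of the parameters $\{\beta_{0\ell}, \beta_\ell\}_1^K$ and any constants $c_0$ and $c$ (a $p$-vector), the shifted parameters $\{\beta_{0\ell} - c_0, \beta_\ell - c\}_1^K$ give identical probabilities in (20). Regularization deals with this ambiguity in a natural way; see Section 4.1 below.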
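The non-identifiability of the symmetric model (20) can be checked numerically. The following is a minimal sketch in plain Python, not code from the paper: the function name `class_probs` and all numeric values (K = 3 classes, p = 2 predictors, the shift `c0`, `c`) are hypothetical choices made for illustration. It computes the class probabilities twice, once with the original parameters and once after subtracting the same `(c0, c)` from every class, and confirms that the two results coincide.

```python
import math

def class_probs(x, intercepts, coefs):
    """Symmetric multinomial probabilities Pr(G = l | x) for l = 1..K."""
    # Linear predictor beta_0l + x^T beta_l for each class l.
    scores = [b0 + sum(xj * bj for xj, bj in zip(x, b))
              for b0, b in zip(intercepts, coefs)]
    # Subtract the largest score before exponentiating for numerical
    # stability; a common shift of all scores cancels in the ratio.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical values: K = 3 classes, p = 2 predictors.
x = [0.5, -1.2]
intercepts = [0.1, -0.3, 0.7]
coefs = [[1.0, 2.0], [-0.5, 0.3], [0.2, -1.1]]

# Shift every class's parameters by the same (c0, c).
c0, c = 4.0, [0.9, -2.5]
shifted_intercepts = [b0 - c0 for b0 in intercepts]
shifted_coefs = [[bj - cj for bj, cj in zip(b, c)] for b in coefs]

p_orig = class_probs(x, intercepts, coefs)
p_shift = class_probs(x, shifted_intercepts, shifted_coefs)

# The two probability vectors agree up to floating-point rounding.
assert all(math.isclose(a, b, rel_tol=1e-9) for a, b in zip(p_orig, p_shift))
```

The shift lowers every class's linear predictor by the same amount, $c_0 + x^T c$, so the common factor $e^{-(c_0 + x^T c)}$ cancels between numerator and denominator. This is why the likelihood alone cannot pin down the parameters and some constraint or penalty is needed.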