As in the two-class case, the data can be presented in the form of an $N \times K$ matrix $m_{i\ell}$ of non-negative numbers. For example, if the data are grouped, then at each $x_i$ we have a number of multinomial samples, with $m_{i\ell}$ falling into category $\ell$. In this case we divide each row by the row sum $m_i = \sum_{\ell} m_{i\ell}$ and produce our response matrix $y_{i\ell} = m_{i\ell}/m_i$; the row sum $m_i$ becomes an observation weight. Our penalized maximum likelihood algorithm changes in a trivial way. The working response (24) is defined exactly the same way (using the $y_{i\ell}$ just defined). The weights in (25) get augmented with the observation weight $m_i$:

$$w_{i\ell} = m_i\,\tilde{p}_\ell(x_i)\,\bigl(1 - \tilde{p}_\ell(x_i)\bigr). \tag{30}$$
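The row normalization and the weights in (30) can be illustrated with a minimal NumPy sketch. The function name `grouped_response_and_weights`, the toy count matrix, and the uniform current probabilities are assumptions made for illustration only; they are not taken from the paper or from any particular implementation.

```python
import numpy as np

def grouped_response_and_weights(m, p_tilde):
    """Hypothetical helper illustrating the grouped-data setup.

    m       : (N, K) array of non-negative counts m_{il}
    p_tilde : (N, K) array of current fitted probabilities p~_l(x_i)
    Returns the response matrix y, the observation weights m_i,
    and the weights w_{il} = m_i * p~_l(x_i) * (1 - p~_l(x_i)).
    """
    m = np.asarray(m, dtype=float)
    p_tilde = np.asarray(p_tilde, dtype=float)
    if m.shape != p_tilde.shape:
        raise ValueError("m and p_tilde must have the same shape")
    if np.any(m < 0):
        raise ValueError("counts must be non-negative")

    m_i = m.sum(axis=1)                      # row sums m_i = sum_l m_{il}
    if np.any(m_i <= 0):
        raise ValueError("every row needs a positive total count")

    y = m / m_i[:, None]                     # y_{il} = m_{il} / m_i
    w = m_i[:, None] * p_tilde * (1.0 - p_tilde)
    return y, m_i, w

# Toy example (assumed data): N = 2 observations, K = 3 categories.
m = np.array([[3, 1, 0],
              [2, 2, 4]])
p_tilde = np.full(m.shape, 1.0 / m.shape[1])  # assumed starting probabilities
y, m_i, w = grouped_response_and_weights(m, p_tilde)
```

Each row of `y` sums to one, so it plays the same role as the indicator response in the ungrouped case, while rows with larger totals `m_i` receive proportionally larger weights.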