Chunk #19 — MATERIALS AND METHODS — Estimation methods

Source
Classification and selection of biomarkers in genomic data using LASSO.
Embedded
yes

Text

We propose to use an optimal scoring procedure for classification into which LASSO estimation is incorporated. In the notation of the previous section, we wish to solve the following optimization problem: minimize

$$n^{-1}\sum_{i=1}^{n}\{\theta(g_i) - X_i^T\eta\}^2 + \lambda\sum_{j=1}^{p}|\eta_j| \qquad (3)$$

subject to the constraint $N^{-1}\|Z\Theta\|^2 = 1$. The procedure is outlined as follows.

(1) Choose an initial score matrix $\Theta_0$ satisfying $\Theta_0^T D_P \Theta_0 = I$, and let $\Theta_0^* = Z\Theta_0$.
(2) Fit a linear regression model of $\Theta_0^*$ on $X$ subject to an $L_1$ constraint on the parameters. Denote the fitted values by $\hat{\Theta}_0^*$, and let $\hat{f}(X)$ be the vector of fitted regression functions.
(3) Obtain the eigenvector matrix $\Phi$ of $\Theta_0^{*T}\hat{\Theta}_0^*$; the optimal scores are $\Theta = \Theta_0\Phi$.
(4) Define $f_{opt}(x) = \Phi^T\hat{f}(x)$.

Note that the LASSO estimation procedure enters in step (2) of the algorithm. We cannot use the algorithm of Tibshirani [11] because it is too computationally intensive for large $p$ (the number of genes). However, it turns out that the procedure can be carried out using standard software for SVMs, which we now describe.
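To make the penalized regression in step (2) concrete, the following is a minimal sketch for a single fixed score vector, minimizing the objective in (3) with the normalization constraint set aside. It uses cyclic coordinate descent with soft-thresholding, which is a swapped-in technique chosen for brevity: it is neither the algorithm of Tibshirani [11] nor the SVM-based fitting the text goes on to describe. The function names, the toy data, and the fixed number of sweeps are all assumptions made for illustration.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_objective(X, scores, eta, lam):
    """n^-1 * sum_i (theta_i - x_i' eta)^2 + lam * sum_j |eta_j|, as in (3)."""
    n = len(X)
    rss = 0.0
    for i in range(n):
        fitted = sum(x * e for x, e in zip(X[i], eta))
        rss += (scores[i] - fitted) ** 2
    return rss / n + lam * sum(abs(e) for e in eta)


def lasso_coordinate_descent(X, scores, lam, n_sweeps=200):
    """Cyclic coordinate descent for the objective above (illustrative only)."""
    n, p = len(X), len(X[0])
    eta = [0.0] * p
    residual = list(scores)  # scores - X @ eta, starting from eta = 0
    col_scale = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_scale[j] == 0.0:
                continue  # an all-zero column carries no information
            # Average product of column j with the residual that excludes j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * eta[j])
                      for i in range(n)) / n
            # The factor 1/2 comes from differentiating the squared error.
            new_value = soft_threshold(rho, lam / 2.0) / col_scale[j]
            delta = new_value - eta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                eta[j] = new_value
    return eta


# Hypothetical toy data: 6 samples, 3 "genes", two classes scored +1 / -1.
X_demo = [[1.0, 0.2, -0.1], [0.9, -0.3, 0.2], [1.1, 0.1, 0.0],
          [-1.0, 0.3, 0.1], [-0.9, -0.2, -0.2], [-1.1, -0.1, 0.0]]
scores_demo = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]

eta_sparse = lasso_coordinate_descent(X_demo, scores_demo, lam=0.2)
eta_null = lasso_coordinate_descent(X_demo, scores_demo, lam=2.5)
print(eta_sparse, eta_null)
```

In this toy example only the first column separates the two classes, so a moderate penalty keeps that coefficient and sets the others exactly to zero, while a sufficiently large penalty zeroes every coefficient. This is the variable-selection behaviour that motivates using the LASSO in step (2). Each sweep costs on the order of n times p operations in this naive form, so the sketch is not meant as a recommendation for genome-scale p.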