
Chunk #59 — 5 Timings — 5.3 Real data

Source
Regularization Paths for Generalized Linear Models via Coordinate Descent.

Text

Table 4 shows some timing results for four different datasets.

Cancer [Ramaswamy et al., 2001]: gene-expression data with 14 cancer classes.

Leukemia [Golub et al., 1999a]: gene-expression data with a binary response indicating the type of leukemia, AML vs. ALL. We used the preprocessed data of Dettling [2004].

Internet-Ad [Kushmerick, 1999]: document classification problem with mostly binary features. The response is binary and indicates whether the document is an advertisement. Only 1.2% of the values in the predictor matrix are nonzero.

Newsgroup [Lang, 1995]: document classification problem. We used the training set culled from these data by Koh et al. [2007]. The response is binary and indicates a subclass of topics; the predictors are binary and indicate the presence of particular tri-gram sequences. The predictor matrix has 0.05% nonzero values.
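
The sparsity figures quoted for Internet-Ad and Newsgroup are the fraction of nonzero entries in the predictor matrix. As a minimal sketch of how such a figure could be computed, and of why storing only the nonzero entries pays off when that fraction is small, consider the following. The function names and the toy matrix are hypothetical and made up for illustration; they are not taken from the paper or from any of the datasets above.

```python
def nonzero_fraction(matrix):
    """Return the fraction of entries in a dense row-major matrix that are nonzero."""
    total = sum(len(row) for row in matrix)
    if total == 0:
        raise ValueError("matrix has no entries")
    nonzero = sum(1 for row in matrix for value in row if value != 0)
    return nonzero / total


def to_coordinate_triples(matrix):
    """Keep only the nonzero entries as (row, column, value) triples."""
    return [
        (i, j, value)
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if value != 0
    ]


# Toy binary predictor matrix (illustrative only, not real data).
X = [
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
]

density = nonzero_fraction(X)
triples = to_coordinate_triples(X)
print(f"{density:.1%} nonzero, {len(triples)} stored entries out of {len(X) * len(X[0])}")
```

In this toy case only 3 of the 20 entries need to be stored. At the densities reported above (1.2% and 0.05%), the saving from a sparse representation would be correspondingly larger.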