Chunk #30 — Methods — Power calculation

Source: Random forest versus logistic regression: a large-scale benchmark experiment.
Embedded: yes

Text

Considering the M×2 matrix, collecting the performance measures for the two investigated methods (LR and RF) on the M considered datasets, one can perform a test for paired samples to compare the performances of the two methods [31]. We refer to the previously published statistical framework [31] for a precise mathematical definition of the tested null-hypothesis in the case of the t-test for paired samples. In this framework, the datasets play the role of the i.i.d. observations used for the t-test. Sample size calculations for the t-test for paired samples can give an indication of the rough number of datasets required to detect a given difference δ in performances considered as relevant for a given significance level (e.g., α=0.05) and a given power (e.g., 1−β=0.8). For large numbers and a two-sided test, the required number of datasets can be approximated as

$$ M_{req}\approx \frac{\left(z_{1-\alpha/2}+z_{1-\beta}\right)^{2}\sigma^{2}}{\delta^{2}} \tag{4} $$
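As an illustration, the approximation above can be evaluated numerically. The following is a minimal sketch rather than the authors' code: the function name `required_datasets` is hypothetical, σ is assumed to denote the standard deviation of the per-dataset difference in performance between the two methods (the usual meaning in a paired t-test), and the values of δ and σ in the example call are purely illustrative and not taken from the study.

```python
from math import ceil
from statistics import NormalDist


def required_datasets(delta, sigma, alpha=0.05, power=0.8):
    """Approximate number of datasets for a two-sided paired t-test.

    Implements M_req ~ (z_{1-alpha/2} + z_{1-beta})^2 * sigma^2 / delta^2,
    the large-sample normal approximation.

    delta : difference in performance considered relevant
    sigma : ASSUMED to be the standard deviation of the paired differences
    alpha : significance level
    power : desired power, i.e. 1 - beta
    """
    if delta == 0:
        raise ValueError("delta must be non-zero")
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # z_{1-alpha/2}
    z_beta = standard_normal.inv_cdf(power)           # z_{1-beta}
    m_req = (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    # Round up: a fractional dataset cannot be collected.
    return ceil(m_req)


# Illustrative values only (not from the paper).
print(required_datasets(delta=0.01, sigma=0.05))
```

With these illustrative inputs the sketch returns 197. Because the required number scales with (σ/δ)², halving δ roughly quadruples the number of datasets needed.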