Probes were grouped into probe sets by first aligning them to RefSeq gene annotations and then aligning the remaining unmapped probes to the human reference genome (build 36). Probes whose best alignment was not unique were excluded from further analysis. Probe sets containing more than one probe were hierarchically clustered, using one minus the Pearson correlation coefficient between probes as the distance measure. Each dendrogram was then cut at a height of 0.5, splitting the probe set into clusters with internal correlation coefficients of roughly 0.5 or greater. All downstream analyses were performed independently on each resulting cluster and on each single-probe probe set.
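The splitting step for one multi-probe probe set can be sketched as follows. This is a minimal illustration, not the original pipeline. The linkage method is not stated above, so the sketch assumes complete linkage. Under that assumption, cutting at height 0.5 guarantees that every pair of probes in a cluster has a distance of at most 0.5, i.e. a Pearson r of at least 0.5. Other linkages, such as average linkage, only enforce this on average, which is one reason the threshold is described as approximate. The function names, probe names and expression profiles are hypothetical. In practice, `scipy.spatial.distance.pdist(X, metric="correlation")` (which computes one minus the Pearson r), `scipy.cluster.hierarchy.linkage` and `fcluster(..., criterion="distance")` would do the same job. The version below uses only the standard library so it is self-contained.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Assumption: constant (zero-variance) probes have been filtered out beforehand.
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


def split_probe_set(profiles: Dict[str, Sequence[float]],
                    cut_height: float = 0.5) -> List[List[str]]:
    """Agglomerative clustering on d = 1 - r, cut at cut_height.

    Assumes complete linkage: the distance between two clusters is the
    largest pairwise distance between their members.
    """
    names = list(profiles)
    dist = {(a, b): 1.0 - pearson(profiles[a], profiles[b])
            for a in names for b in names if a != b}
    clusters = [[name] for name in names]
    while len(clusters) > 1:
        best = None  # (linkage distance, index i, index j)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = max(dist[(a, b)] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > cut_height:
            break  # complete-linkage heights only increase, so later merges lie above the cut
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical probe set: p1/p2 are perfectly correlated, as are p3/p4,
# but the two pairs are anti-correlated (r = -1, distance 2).
probe_set = {
    "p1": [1, 2, 3, 4, 5],
    "p2": [2, 4, 6, 8, 10],
    "p3": [5, 4, 3, 2, 1],
    "p4": [10, 8, 6, 4, 2],
}
print(split_probe_set(probe_set))  # [['p1', 'p2'], ['p3', 'p4']]
```

Each returned cluster would then be analysed as its own unit, alongside the single-probe probe sets. Single-probe probe sets are never passed through the clustering step.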