Chunk #38 — Methods — Data quality, probe mapping and filtering

Source
Variability of gene expression profiles in human blood and lymphoblastoid cell lines.

Text

Probes were sequence-matched to NCBI Build 36.1 (hg18) using the blastn algorithm to obtain a physical position, from which Ensembl transcript and gene identifiers were extracted. Probes that showed one or more mismatches were aligned to Ensembl transcripts or EMBL ESTs using BLAST (1), and their genomic locations were then established by re-mapping the target transcript to the genome (NCBI Build 36), either by extracting annotation data from UCSC MySQL tables or by BLAST against genomic sequence. Probes overlapping at least 10 bases of repeat sequence, as identified by running RepeatMasker on the transcript sequence, were discarded. Probes that contained SNPs (minor allele frequency > 5%, http://www.hapmap.org) in their sequence, or that had no match to human genome build 36, were removed from the analysis. We were able to extract Ensembl transcript identifiers for a total of 21,855 probes.
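The three exclusion criteria described above (no genome match, at least 10 bases of repeat overlap, a SNP with minor allele frequency above 5%) can be sketched as a simple filter. This is a minimal illustration, not the authors' pipeline: the `Probe` record, its field names, and the example probes are hypothetical, and the sketch assumes the upstream BLAST, RepeatMasker and HapMap lookups have already been summarised per probe.

```python
from dataclasses import dataclass, field
from typing import List

# Thresholds taken from the text; the names are illustrative.
REPEAT_OVERLAP_MIN = 10    # discard probes overlapping >= 10 bases of repeat
SNP_MAF_THRESHOLD = 0.05   # discard probes containing a SNP with MAF > 5%


@dataclass
class Probe:
    """Hypothetical per-probe summary of the upstream annotation steps."""
    probe_id: str
    maps_to_genome: bool          # True if the probe has a match to build 36
    repeat_overlap: int           # bases overlapping repeat sequence
    snp_mafs: List[float] = field(default_factory=list)  # MAFs of SNPs in probe


def passes_filters(probe: Probe) -> bool:
    """Return True if the probe survives all three exclusion criteria."""
    if not probe.maps_to_genome:
        return False
    if probe.repeat_overlap >= REPEAT_OVERLAP_MIN:
        return False
    if any(maf > SNP_MAF_THRESHOLD for maf in probe.snp_mafs):
        return False
    return True


def filter_probes(probes: List[Probe]) -> List[Probe]:
    """Keep only the probes that pass every filter, preserving input order."""
    return [p for p in probes if passes_filters(p)]


# Made-up example probes covering each criterion and its boundary.
example_probes = [
    Probe("probe_ok", True, 0),
    Probe("probe_repeat", True, 10),            # exactly 10 bases: discarded
    Probe("probe_repeat_9", True, 9, [0.02]),   # below both thresholds: kept
    Probe("probe_snp", True, 0, [0.12]),        # common SNP: discarded
    Probe("probe_maf_edge", True, 0, [0.05]),   # MAF not above 5%: kept
    Probe("probe_unmapped", False, 0),          # no genome match: discarded
]

kept_ids = [p.probe_id for p in filter_probes(example_probes)]
print(kept_ids)
```

The boundary cases follow the wording of the text: "at least 10 bases" is treated as inclusive, while "> 5%" is treated as strict, so a SNP at exactly 5% does not remove a probe in this sketch.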