Chunk #23 — GENEMANIA FREQUENTLY ASKED QUESTIONS — GeneMANIA automated data build pipeline

Source: GeneMANIA prediction server 2013 update.
Embedded: yes

Text

Our network data come from six main sources: iRefIndex (23) and BioGRID (24) for physical (i.e., protein) interactions and genetic interactions; Gene Expression Omnibus (GEO) for co-expression networks (25); InterPro, via Ensembl, for protein domains (26); I2D for networks of interologs of physical interactions (27); and our own manual curation efforts. Gene Ontology annotations are downloaded directly from the Gene Ontology Web site as part of our automated data build, and gene identifiers are retrieved from the Entrez Gene and Ensembl databases. We further process data from iRefIndex, BioGRID and I2D to extract individual interaction networks that are associated with PubMed IDs. Interaction studies reporting fewer than 100 interactions are consolidated into a single network (e.g. ‘IREF-SMALL-SCALE-STUDIES’). We also consolidate networks by curation source [e.g. ‘IREF-MINT’ (28)], so each interaction is represented in two different networks: one defined by its publication (or the pooled small-scale network) and one defined by its curation source. In some cases, the same PubMed ID is associated with multiple networks (e.g. those containing interactions detected at stringent and permissive thresholds); in these cases, the different networks are distinguished by appending different letters to the network name. The transformation of mRNA expression profile
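The consolidation rules described above (one network per PubMed ID, small studies pooled into a single network, and a second grouping by curation source) can be illustrated with a minimal Python sketch. This is not the GeneMANIA build code: the record fields (`gene_a`, `gene_b`, `pubmed_id`, `source`), the function name `build_networks`, and the `IREF-PMID-<id>` naming pattern are assumptions made for illustration. Only the cutoff of 100 interactions and the names ‘IREF-SMALL-SCALE-STUDIES’ and ‘IREF-MINT’ come from the text. The letter suffixes used when one PubMed ID maps to several networks are not modeled.

```python
from collections import defaultdict

# From the text: studies reporting fewer than 100 interactions are pooled.
SMALL_STUDY_CUTOFF = 100


def build_networks(interactions, prefix="IREF"):
    """Group interaction records into named networks.

    Each record is a dict with hypothetical keys 'gene_a', 'gene_b',
    'pubmed_id' and 'source' (the curation source, e.g. 'MINT').
    """
    # Step 1: collect interactions per publication.
    by_study = defaultdict(list)
    for rec in interactions:
        by_study[rec["pubmed_id"]].append(rec)

    networks = defaultdict(list)

    # Step 2: one network per large study; small studies share one network.
    for pubmed_id, records in by_study.items():
        if len(records) < SMALL_STUDY_CUTOFF:
            name = f"{prefix}-SMALL-SCALE-STUDIES"
        else:
            name = f"{prefix}-PMID-{pubmed_id}"  # hypothetical naming pattern
        networks[name].extend(records)

    # Step 3: a second grouping by curation source, so every interaction
    # ends up in exactly two networks.
    for rec in interactions:
        networks[f"{prefix}-{rec['source'].upper()}"].append(rec)

    return dict(networks)


# Toy input: one large study (150 interactions) and one small study (3).
toy = [
    {"gene_a": f"G{i}", "gene_b": f"G{i + 1}", "pubmed_id": "111", "source": "MINT"}
    for i in range(150)
] + [
    {"gene_a": f"H{i}", "gene_b": f"H{i + 1}", "pubmed_id": "222", "source": "IntAct"}
    for i in range(3)
]

networks = build_networks(toy)
for name in sorted(networks):
    print(name, len(networks[name]))
```

On the toy input, the large study gets its own network, the three-interaction study falls into the pooled small-scale network, and both studies also appear in a network named after their curation source, giving four networks in total.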