Chunk #57 — Methods — Proteomics — Pre-processing and normalization of data

Source: Common genetic variation drives molecular heterogeneity in human iPSCs.


Data for analysis were obtained from the “ProteinGroups.txt” output of MaxQuant. Contaminant and reverse (decoy) hits (N = 3,419) were excluded from analysis. For each sample, total protein abundance was calculated by summing the protein intensity (‘Intensity’) values across all proteins and protein groups, and this total was then used to scale all quantification values (‘iBAQ’) within that sample. A protein or protein group was considered only if at least one unique peptide mapped to it. Overall, we quantified 10,097 protein groups (4,877 unique proteins) in at least one sample. Only unique protein entries quantified in at least half of the samples (3,435 proteins) were used in subsequent analyses. The mean pairwise Spearman rank correlation between samples was 0.87 for unique proteins. Based on sample clustering (principal component analysis and pairwise correlation of protein quantification; data not shown), one sample (‘HPSI0713i-darw_1’) appeared to be an outlier and was excluded from further analyses.
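The filtering and scaling steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code. It assumes MaxQuant's tab-separated column conventions (`Protein IDs`, `Potential contaminant`, `Reverse`, `Unique peptides`, `Intensity <sample>`, `iBAQ <sample>`, with flagged rows marked by `+`; exact names vary between MaxQuant versions). It also interprets "scale" as dividing each iBAQ value by the sample's summed intensity, applies the filters in the order the text lists them, and treats a value of zero as "not quantified". Collapsing protein groups to unique proteins is not shown, and all function names are hypothetical.

```python
import csv
from typing import Dict, List

Row = Dict[str, str]


def load_protein_groups(path: str) -> List[Row]:
    # proteinGroups.txt is a tab-separated table with one header row.
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def is_flagged(row: Row, column: str) -> bool:
    # Assumed convention: MaxQuant marks flagged rows with "+" in this column.
    return str(row.get(column, "")).strip() == "+"


def preprocess(rows: List[Row], samples: List[str]) -> Dict[str, Dict[str, float]]:
    """Return {protein IDs: {sample: scaled iBAQ}} for entries passing all filters."""
    # 1. Exclude contaminant and reverse (decoy) hits.
    kept = [r for r in rows
            if not is_flagged(r, "Potential contaminant") and not is_flagged(r, "Reverse")]

    # 2. Per-sample total abundance: sum of 'Intensity' over the remaining entries.
    totals = {s: sum(float(r[f"Intensity {s}"]) for r in kept) for s in samples}

    result: Dict[str, Dict[str, float]] = {}
    for r in kept:
        # 3. Require at least one unique peptide mapping to the entry.
        if int(r.get("Unique peptides", 0) or 0) < 1:
            continue
        # 4. Scale iBAQ by the sample total (assumed: simple division).
        values = {s: float(r[f"iBAQ {s}"]) / totals[s] if totals[s] > 0 else 0.0
                  for s in samples}
        # 5. Keep entries quantified (non-zero) in at least half of the samples.
        n_quantified = sum(1 for v in values.values() if v > 0)
        if 2 * n_quantified >= len(samples):
            result[r["Protein IDs"]] = values
    return result
```

A typical call would be `preprocess(load_protein_groups("proteinGroups.txt"), sample_names)`, where `sample_names` is a placeholder list matching the experiment names embedded in the `Intensity` and `iBAQ` column headers. Computing the totals before the unique-peptide filter follows the order of the text; the original pipeline may have ordered these steps differently.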
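The correlation summary can be illustrated with a short, dependency-free sketch. Spearman's rank correlation is computed here as the Pearson correlation of average ranks (tied values share the mean of their ranks), which is the standard definition. The text does not say how missing values were handled, so this sketch assumes pairwise-complete observations, using only proteins with non-zero values in both samples. It also reports each sample's mean correlation with all other samples, one simple way to spot a candidate outlier. The authors instead combined PCA with pairwise correlations, and no cutoff is implied. The input is assumed to be a hypothetical `{protein: {sample: value}}` mapping, and all function names are illustrative.

```python
from itertools import combinations
from math import sqrt
from statistics import mean
from typing import Dict, List, Tuple

Data = Dict[str, Dict[str, float]]  # {protein: {sample: abundance}}


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: List[float], y: List[float]) -> float:
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    denom = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return cov / denom if denom > 0 else float("nan")


def spearman(x: List[float], y: List[float]) -> float:
    # Spearman's rho is the Pearson correlation of the (average) ranks.
    return pearson(average_ranks(x), average_ranks(y))


def pairwise_spearman(data: Data, samples: List[str]) -> Dict[Tuple[str, str], float]:
    corr: Dict[Tuple[str, str], float] = {}
    for s, t in combinations(samples, 2):
        # Assumed: use only proteins quantified (non-zero) in both samples.
        pairs = [(v[s], v[t]) for v in data.values()
                 if v.get(s, 0) > 0 and v.get(t, 0) > 0]
        if len(pairs) < 2:
            corr[(s, t)] = float("nan")
            continue
        xs, ys = zip(*pairs)
        corr[(s, t)] = spearman(list(xs), list(ys))
    return corr


def per_sample_mean(corr: Dict[Tuple[str, str], float],
                    samples: List[str]) -> Dict[str, float]:
    # Mean correlation of each sample with every other sample.
    return {s: mean(r for pair, r in corr.items() if s in pair) for s in samples}
```

In this sketch, `mean(pairwise_spearman(data, samples).values())` yields a mean pairwise correlation of the same kind as the reported summary statistic. A sample whose `per_sample_mean` value is markedly lower than the rest would merit closer inspection, for example alongside a PCA of the same matrix.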