Chunk #27 — Results — Homopolymer errors

Source: Shining a light on dark sequencing: characterising errors in Ion Torrent PGM data.
Embedded: yes

Text

Rather than directly modelling the errors, a double-generalised linear model (DGLM) was fitted to the flow-values. For a given homopolymer length and fixed values of the other factors, flow-values were approximately Gaussian distributed, but with non-homogeneous dispersion; that is, both the mean and the variance depended on the explanatory variables, particularly homopolymer length (Table 3, Table S1). The mean effects for species, chip, homopolymer length, flow cycle and PIC were found to be important; the machine used and the read X and Y coordinates were not, and were subsequently removed from the model (Methods). The same set of factors was also found to contribute significantly to the dispersion. The model can be expressed as $y \sim \mathcal{N}(\mu, e^{\lambda})$, where $y$ is the flow-value, $\mu$ is the mean and $\lambda$ is the log-variance. Both the mean and the log-variance are linear predictors, that is, $\mu = \alpha_{s} + \beta_{c} + \gamma_{p} + \delta h + \epsilon f + \kappa$, where $\alpha_{s}$, $\beta_{c}$ and $\gamma_{p}$ are coefficients taking a different value for each species, chip and PIC respectively; $\delta$ and $\epsilon$ are multipliers of the homopolymer length $h$ and flow cycle $f$ respectively; and $\kappa$ is a constant. $\lambda$ is similarly defined, with different values of the coefficients and the exponentiation to ensure a
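
As a rough illustration of the model structure described here, the Python sketch below evaluates two linear predictors, one for the mean and one for the log-variance, and exponentiates the latter to obtain a variance. All coefficient values, the factor level labels (`s1`, `c1`, `p1`, ...) and the function names are hypothetical and chosen only for illustration; they are not the fitted estimates from this study, and the sketch does not perform the DGLM fit itself.

```python
import math
import random

# Hypothetical coefficients (illustrative only, not fitted values).
MEAN_COEF = {
    "species": {"s1": 0.00, "s2": 0.02},
    "chip": {"c1": 0.00, "c2": -0.01},
    "pic": {"p1": 0.00, "p2": 0.01},
    "hp_length": 1.00,     # multiplier of homopolymer length
    "flow_cycle": -0.001,  # multiplier of flow cycle
    "const": 0.05,
}
LOGVAR_COEF = {
    "species": {"s1": 0.0, "s2": 0.1},
    "chip": {"c1": 0.0, "c2": 0.05},
    "pic": {"p1": 0.0, "p2": -0.05},
    "hp_length": 0.40,
    "flow_cycle": 0.002,
    "const": -4.0,
}


def linear_predictor(coef, species, chip, pic, hp_length, flow_cycle):
    """Sum of categorical effects, two slopes and a constant."""
    return (
        coef["species"][species]
        + coef["chip"][chip]
        + coef["pic"][pic]
        + coef["hp_length"] * hp_length
        + coef["flow_cycle"] * flow_cycle
        + coef["const"]
    )


def flow_value_distribution(species, chip, pic, hp_length, flow_cycle):
    """Return (mean, variance) of the Gaussian for one flow."""
    mean = linear_predictor(MEAN_COEF, species, chip, pic, hp_length, flow_cycle)
    log_var = linear_predictor(LOGVAR_COEF, species, chip, pic, hp_length, flow_cycle)
    return mean, math.exp(log_var)  # exp() maps the log-variance to a variance


def log_density(y, mean, var):
    """Gaussian log-density of an observed flow-value y."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)


def simulate_flow_value(rng, species, chip, pic, hp_length, flow_cycle):
    """Draw one flow-value from the modelled Gaussian."""
    mean, var = flow_value_distribution(species, chip, pic, hp_length, flow_cycle)
    return rng.gauss(mean, math.sqrt(var))


if __name__ == "__main__":
    rng = random.Random(0)
    for h in (1, 3, 5):
        mean, var = flow_value_distribution("s1", "c1", "p1", h, 0)
        draw = simulate_flow_value(rng, "s1", "c1", "p1", h, 0)
        print(h, round(mean, 3), round(var, 4), round(draw, 3))
```

With these made-up coefficients the variance grows with homopolymer length because the log-variance slope is positive, which mimics the qualitative dependence of dispersion on homopolymer length reported above.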