The fact that most 5-mers are present in the control genomes also supports the notion that the derived error models can be used to accurately simulate reads from any unrelated reference genome. For the 10% of 5-mers that are not well represented in the control genomes, GemSIM instead derives an error rate from the relevant 4-mer (or from the 3-mer, in the case of the one PhiX 4-mer mentioned above).
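
The fall-back from 5-mer to 4-mer to 3-mer contexts can be illustrated with a minimal sketch. This is not GemSIM's actual code: the function name `backoff_error_rate`, the `MIN_COUNT` threshold for "well represented", and the choice to shorten a context by dropping its leading base are all assumptions made for illustration only.

```python
# Hypothetical threshold: a context counts as "well represented" if it was
# observed at least this many times in the control data (assumed value).
MIN_COUNT = 100


def backoff_error_rate(kmer, totals, errors, min_count=MIN_COUNT, min_k=3):
    """Return (error_rate, context_used) for a sequence context.

    totals: dict mapping a context to the number of times it was observed
    errors: dict mapping a context to the number of observed errors
    The full k-mer is tried first; if it is under-represented, progressively
    shorter contexts are tried until one has at least min_count observations.
    """
    context = kmer
    while len(context) >= min_k:
        n = totals.get(context, 0)
        if n >= min_count:
            return errors.get(context, 0) / n, context
        # Assumption: the "relevant" shorter context is obtained by
        # dropping the leading base of the current context.
        context = context[1:]
    # No sufficiently represented context was found.
    return None, None


# Toy counts, invented for illustration only.
totals = {"ACGTA": 5, "CGTA": 200, "GTA": 1000}
errors = {"ACGTA": 1, "CGTA": 4, "GTA": 10}
rate, used = backoff_error_rate("ACGTA", totals, errors)
print(used, rate)
```

In this toy example the 5-mer `ACGTA` has too few observations, so the rate is taken from the 4-mer `CGTA` instead. Returning the context that was actually used makes it easy to check how often the fall-back is triggered.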