Because we originally defined the putative promoter region as [−1000, +1000] bp around the TSS (Dataset S2), the promoters of protein-coding genes may have included coding exonic sequence, which could introduce a coding-sequence bias. To rule this out, we repeated the analysis (Text S1, Methods section) using only the upstream promoter regions ([−1000, 0] bp relative to the TSS). With this promoter set, we distinguished lncRNA promoters from protein-coding gene promoters with 77% accuracy (Table 1, Table S5). Moreover, to avoid a bias caused by the greater abundance of CGIs at protein-coding gene promoters, we built another model on the upstream promoter regions ([−1000, 0] bp, Dataset S2) that do not overlap CGIs (Text S1, Methods section). Although the performance of the model decreased, we could still distinguish lncRNA promoters from protein-coding gene promoters with 71% accuracy (Table S5).
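The two region definitions used here, a strand-aware upstream window and a CGI-free subset, can be illustrated with a minimal Python sketch. This is not the pipeline described in Text S1. The function names (`upstream_promoter`, `overlaps`, `filter_cgi_free`), the 0-based half-open coordinate convention, the exclusion of the TSS base itself from the window, and the toy gene and CGI records are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]             # 0-based, half-open: [start, end)
Promoter = Tuple[str, int, int, str]   # (chromosome, start, end, gene name)


def upstream_promoter(tss: int, strand: str, length: int = 1000) -> Interval:
    """Return the window of `length` bp immediately upstream of the TSS.

    Assumption: `tss` is a 0-based position and the TSS base itself is
    excluded from the window. "Upstream" means lower coordinates on the
    '+' strand and higher coordinates on the '-' strand.
    """
    if strand == "+":
        return (max(0, tss - length), tss)
    if strand == "-":
        return (tss + 1, tss + 1 + length)
    raise ValueError(f"unknown strand: {strand!r}")


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def filter_cgi_free(
    promoters: List[Promoter], cgis: Dict[str, List[Interval]]
) -> List[Promoter]:
    """Keep only promoters that overlap no CGI on the same chromosome."""
    kept = []
    for chrom, start, end, name in promoters:
        chrom_cgis = cgis.get(chrom, [])
        if not any(overlaps((start, end), cgi) for cgi in chrom_cgis):
            kept.append((chrom, start, end, name))
    return kept


# Toy example with made-up coordinates (hypothetical genes and CGI).
promoters = [
    ("chr1", *upstream_promoter(5000, "+"), "geneA"),
    ("chr1", *upstream_promoter(8999, "-"), "geneB"),
]
cgis = {"chr1": [(4900, 5200)]}
cgi_free = filter_cgi_free(promoters, cgis)
```

In this toy input, the `geneA` window overlaps the CGI and is dropped, while the `geneB` window is retained. The linear scan over CGIs is chosen for readability; for genome-scale data an interval tree or a sorted sweep would be the more usual choice.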