Optimized splitting of mixed-species RNA sequencing data.
- Authors: Song, Xuan; Gao, Hai Yun; Herrup, Karl; Hart, Ronald P
- Year: 2022
- Journal: Journal of Bioinformatics and Computational Biology
- PMID: 34991436
- DOI: 10.1142/S0219720022500019
- PMCID: PMC9081140
Gene expression studies using xenograft transplants or co-culture systems, usually with mixed human and mouse cells, have proven valuable for uncovering cellular dynamics during development or in disease models. However, mRNA sequence similarity between species presents a challenge for accurate transcript quantification. To identify optimal strategies for analyzing mixed-species RNA sequencing data, we evaluate both alignment-dependent and alignment-independent methods. Alignment of reads to a pooled reference index is effective, particularly if optimal alignments are used to classify sequencing reads by species; the classified reads are then re-aligned to the individual genomes, generating [Formula: see text] accuracy across a range of species ratios. Alignment-independent methods, such as convolutional neural networks, which extract conserved sequence patterns from the two species, classify RNA sequencing reads with over 85% accuracy. Importantly, both methods perform well with different ratios of human and mouse reads. While non-alignment strategies successfully partitioned reads by species, the more traditional approach of mixed-genome alignment followed by optimized separation of reads proved more successful, with lower error rates.
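The pooled-reference strategy described in the abstract can be illustrated with a minimal sketch. It assumes each read's candidate alignments are available as `(read_id, reference_name, alignment_score)` tuples and that reference names in the pooled index carry a species prefix (`hs_` and `mm_` here). Both the prefixes and the "highest score wins" rule are assumptions made for illustration; the abstract does not specify how the optimal alignment is chosen.

```python
from collections import defaultdict


def classify_reads(alignments, prefixes=("hs_", "mm_")):
    """Assign each read to the species of its best-scoring alignment.

    alignments: iterable of (read_id, reference_name, alignment_score).
    prefixes: hypothetical (human, mouse) reference-name prefixes.
    Returns a dict mapping read_id to "human", "mouse", or "ambiguous".
    """
    human_prefix, mouse_prefix = prefixes
    # Best score seen so far for each read, kept separately per species.
    best = defaultdict(lambda: {"human": None, "mouse": None})
    for read_id, ref, score in alignments:
        if ref.startswith(human_prefix):
            species = "human"
        elif ref.startswith(mouse_prefix):
            species = "mouse"
        else:
            continue  # reference from neither genome; ignore it
        current = best[read_id][species]
        if current is None or score > current:
            best[read_id][species] = score

    calls = {}
    for read_id, scores in best.items():
        h, m = scores["human"], scores["mouse"]
        if m is None or (h is not None and h > m):
            calls[read_id] = "human"
        elif h is None or m > h:
            calls[read_id] = "mouse"
        else:
            calls[read_id] = "ambiguous"  # equal best scores in both genomes
    return calls


records = [
    ("r1", "hs_chr1", 98), ("r1", "mm_chr2", 90),
    ("r2", "mm_chr5", 100),
    ("r3", "hs_chr3", 80), ("r3", "mm_chr3", 80),
]
print(classify_reads(records))
```

Reads called "human" or "mouse" would then be written to per-species files and re-aligned to the matching single genome; how tied reads are handled is a design choice left open here.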
Comparison of alignment methods. A. Percentage of reads aligned using each method. Dots indicate the proportion of the two genomes. B. The error rate is calculated as the difference between the fraction of reads aligned and the expected fraction, summed over the two genomes. C. Accuracy was assessed for each method and genome proportion by comparing the count of human read pairs correctly aligned to the human genome with the number of human read pairs in the input file.
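The two metrics named in this caption can be written out as a short sketch. Two points are assumptions rather than statements from the source: the per-genome difference is taken as an absolute value, and the "fraction of reads aligned" is computed relative to all aligned reads. The function and argument names are illustrative.

```python
def error_rate(aligned_counts, expected_fractions):
    """Sum over genomes of |observed aligned fraction - expected fraction|.

    aligned_counts: dict genome -> number of reads aligned to that genome.
    expected_fractions: dict genome -> expected fraction of reads (sums to 1).
    """
    total = sum(aligned_counts.values())
    if total == 0:
        raise ValueError("no aligned reads to evaluate")
    return sum(
        abs(aligned_counts.get(genome, 0) / total - expected)
        for genome, expected in expected_fractions.items()
    )


def human_accuracy(correct_human_pairs, input_human_pairs):
    """Fraction of input human read pairs correctly aligned to the human genome."""
    if input_human_pairs <= 0:
        raise ValueError("input must contain at least one human read pair")
    return correct_human_pairs / input_human_pairs


# Example: a 50:50 mixture in which 48% of aligned reads map to human.
print(error_rate({"human": 480, "mouse": 520}, {"human": 0.5, "mouse": 0.5}))
print(human_accuracy(475, 500))
```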
Performance of Hidden Markov Models. A. Separation of human and mouse reads compared with randomly generated sequences of the same length, using a third-order Markov model. B. Separation of human and mouse reads with 10th-order Markov models. C. ROC plot showing the false positive rate and true positive rate for Markov models of different orders.
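As a rough illustration of how a Markov model of a given order can separate reads, the sketch below trains one fixed-order Markov chain per species and scores a read by the log-likelihood ratio between them. This is a generic formulation, not the authors' implementation: the pseudocount smoothing, the zero decision threshold, and all function names are assumptions.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train_markov(seqs, order, pseudocount=1.0):
    """Count (context -> next base) transitions for a fixed-order Markov chain."""
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        for i in range(order, len(s)):
            ctx, nxt = s[i - order:i], s[i]
            if nxt in ALPHABET and all(c in ALPHABET for c in ctx):
                counts[ctx][nxt] += 1
    return {"order": order, "counts": counts, "pseudocount": pseudocount}


def log_likelihood(model, seq):
    """Log-probability of seq under the model, with additive smoothing."""
    order, counts, pc = model["order"], model["counts"], model["pseudocount"]
    total = 0.0
    for i in range(order, len(seq)):
        ctx, nxt = seq[i - order:i], seq[i]
        if nxt not in ALPHABET:
            continue  # skip ambiguous bases such as N
        row = counts.get(ctx, {})
        numerator = row.get(nxt, 0.0) + pc
        denominator = sum(row.values()) + pc * len(ALPHABET)
        total += math.log(numerator / denominator)
    return total


def classify(seq, human_model, mouse_model):
    """Return (label, score); score > 0 favors the human model."""
    score = log_likelihood(human_model, seq) - log_likelihood(mouse_model, seq)
    return ("human" if score > 0 else "mouse"), score


# Toy training data; real models would be trained on transcript sequences.
human = train_markov(["ACACACACAC"], order=2)
mouse = train_markov(["GTGTGTGTGT"], order=2)
print(classify("ACACAC", human, mouse))
```

Sweeping the decision threshold on the score, instead of fixing it at zero, yields the false positive and true positive rates plotted in an ROC curve.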
Performance of Convolutional Neural Networks. A. Accuracy of classification with different ratios of human reads in the training datasets. B. Accuracy of classification with different ratios of human reads in the testing datasets. Accuracy was calculated as the percentage of correctly classified reads in the input data.
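To make the CNN input and the accuracy metric concrete, here is a minimal pure-Python sketch of one-hot encoding a read, scanning it with a single convolution kernel followed by ReLU and global max pooling, and computing the percentage of correctly classified reads. The network architecture used in the paper is not described in this record, so the kernel, its width, and the function names are purely illustrative assumptions; a trained network would learn many such kernels.

```python
BASES = "ACGT"


def one_hot(read):
    """Encode a read as rows of 4 values; unknown bases (e.g. N) become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in read.upper()]


def conv1d_max(encoded, kernel):
    """Slide one kernel (width x 4 weights) along the read.

    Applies ReLU to each window activation and returns the global maximum.
    """
    width = len(kernel)
    best = 0.0  # ReLU floor
    for start in range(len(encoded) - width + 1):
        activation = sum(
            encoded[start + i][j] * kernel[i][j]
            for i in range(width)
            for j in range(len(BASES))
        )
        best = max(best, activation)
    return best


def accuracy_percent(predicted, truth):
    """Percentage of reads whose predicted species matches the true species."""
    if len(predicted) != len(truth) or not truth:
        raise ValueError("predicted and truth must be non-empty and equal length")
    correct = sum(p == t for p, t in zip(predicted, truth))
    return 100.0 * correct / len(truth)


# A hand-written kernel that responds to the motif "ACG" (illustrative only).
kernel = one_hot("ACG")
print(conv1d_max(one_hot("TTACGTT"), kernel))
print(accuracy_percent(["human", "mouse", "human", "mouse"],
                       ["human", "mouse", "mouse", "mouse"]))
```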
External
| Title | Authors | Year |
|---|---|---|
| Functional and molecular rescue of aganglionic colon by human enteric nervous system progenitor transplantation in Hirschsprung disease | Jevans B et al. | 2025 |
| Therapeutic potential of human microglia transplantation in a chimeric model of CSF1R-related leukoencephalopathy | Chadarevian JP et al. | 2024 |
| Neuron-Glia-Ratio-Like Approach Evidenced for Limited Variability and In-Aggregate Circadian Shifts in Cortical Cell-Specific Transcriptomes | Shchepina OA et al. | 2023 |