Predicting Splicing from Primary Sequence with Deep Learning.
Jaganathan K., Kyriazopoulou Panagiotopoulou S., McRae JF., Darbandi SF., Knowles D., Li YI., Kosmicki JA., Arbelaez J., Cui W., Schwartz GB., Chow ED., Kanterakis E., Gao H., Kia A., Batzoglou S., Sanders SJ., Farh KK-H.
The splicing of pre-mRNAs into mature transcripts is remarkable for its precision, but the mechanisms by which the cellular machinery achieves such specificity are incompletely understood. Here, we describe a deep neural network that accurately predicts splice junctions from an arbitrary pre-mRNA transcript sequence, enabling precise prediction of noncoding genetic variants that cause cryptic splicing. Synonymous and intronic mutations with predicted splice-altering consequence validate at a high rate on RNA-seq and are strongly deleterious in the human population. De novo mutations with predicted splice-altering consequence are significantly enriched in patients with autism and intellectual disability compared to healthy controls and validate against RNA-seq in 21 out of 28 of these patients. We estimate that 9%-11% of pathogenic mutations in patients with rare genetic disorders are caused by this previously underappreciated class of disease variation.