Predicting expression-altering promoter mutations with deep learning.
Jaganathan K., Ersaro N., Novakovsky G., Wang Y., James T., Schwartzentruber J., Fiziev P., Kassam I., Cao F., Hawe J., Cavanagh H., Lim A., Png G., McRae J., Banerjee A., Kumar A., Ulirsch J., Zhang Y., Aguet F., Wainschtein P., Sundaram L., Salcedo A., Kyriazopoulou Panagiotopoulou S., Aghamirzaie D., Padhi E., Weng Z., Dong S., Smedley D., Caulfield M., O'Donnell-Luria A., Rehm HL., Sanders SJ., Kundaje A., Montgomery SB., Ross MT., Farh KK-H.
Only a minority of patients with rare genetic diseases are currently diagnosed by exome sequencing, suggesting that additional unrecognized pathogenic variants may reside in non-coding sequence. Here, we describe PromoterAI, a deep neural network that accurately identifies non-coding promoter variants that dysregulate gene expression. We show that promoter variants with predicted expression-altering consequences produce outlier expression at both RNA and protein levels in thousands of individuals, and that these variants experience strong negative selection in human populations. We observe that clinically relevant genes in rare disease patients are enriched for such variants and validate their functional impact through reporter assays. Our estimates suggest that promoter variation accounts for 6% of the genetic burden associated with rare diseases.