Integration of metabolomics with genomics: Metabolic gene prioritization using metabolomics data and genomic variant (CADD) scores.
Bongaerts M, Bonte R, Demirdas S, Huidekoper HH, Langendonk J, Wilke M, de Valk W, Blom HJ, Reinders MJT, Ruijter GJG
Molecular genetics and metabolism, 2022 May 25
Abstract
The integration of metabolomics data with sequencing data is a key step towards improving the diagnostic process for finding the disease-causing genetic variant(s) in patients suspected of having an inborn error of metabolism (IEM). Measured metabolite levels can provide additional phenotypic evidence for assessing the pathogenicity of variants found in genes associated with metabolic processes. We present a computational approach, called Reafect, that calculates, for each reaction in a metabolic pathway, a score indicating whether that reaction is deficient. When calculating this score, Reafect takes multiple factors into account: the magnitude and sign of alterations in the metabolite levels, the reaction distances between metabolites and reactions in the pathway, and the biochemical directionality of the reactions. We applied Reafect to untargeted metabolomics data of 72 patient samples with a known IEM and found that in 81% of the cases the correct deficient enzyme was ranked within the top 5% of all considered enzyme deficiencies. Next, we integrated Reafect with Combined Annotation Dependent Depletion (CADD) scores (a measure of variant deleteriousness) and ranked the metabolic genes of 27 IEM patients. This integrated approach significantly improved the prioritization of the genes containing the disease-causing variant compared with either approach used on its own. For 15 of the 27 IEM patients, the correct affected gene was ranked within the top 0.25% of the set of potentially affected genes. Together, our findings suggest that metabolomics data improve the identification of affected genes in patients with an IEM.
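The abstract does not give Reafect's scoring formula or the rule used to combine it with CADD scores, so the following Python sketch is only a toy illustration of the ideas described, not the authors' method. It assumes that a blocked reaction raises metabolites on its substrate side and lowers metabolites on its product side, that evidence is down-weighted geometrically with reaction distance, and that the two gene rankings are combined by multiplying rank fractions. The names `MetaboliteEvidence`, `reaction_score`, `rank_fractions`, `combined_ranking`, the `decay` parameter, and the gene labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetaboliteEvidence:
    z_score: float  # deviation of the metabolite level from controls (sign matters)
    distance: int   # reaction steps between the metabolite and the candidate reaction (>= 1)
    upstream: bool  # True if the metabolite lies on the substrate side of the reaction


def reaction_score(evidence, decay=0.5):
    """Toy deficiency score for one candidate reaction (assumed form, not Reafect's).

    Increased upstream metabolites and decreased downstream metabolites both
    add to the score; each contribution shrinks by `decay` per extra step.
    """
    total = 0.0
    for e in evidence:
        expected_sign = 1.0 if e.upstream else -1.0
        weight = decay ** (e.distance - 1)
        total += expected_sign * e.z_score * weight
    return total


def rank_fractions(scores):
    """Map each key to rank / n, so the highest score gets the smallest fraction."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    n = len(ordered)
    return {key: (i + 1) / n for i, key in enumerate(ordered)}


def combined_ranking(metabolic_scores, cadd_scores):
    """Order genes by the product of their two rank fractions (assumed combination rule)."""
    m = rank_fractions(metabolic_scores)
    c = rank_fractions(cadd_scores)
    genes = m.keys() & c.keys()
    # Ties are broken alphabetically so the output is deterministic.
    return sorted(genes, key=lambda g: (m[g] * c[g], g))


# Hypothetical example: one substrate raised (z = 3.0, one step upstream) and
# one product lowered (z = -2.0, two steps downstream).
evidence = [
    MetaboliteEvidence(z_score=3.0, distance=1, upstream=True),
    MetaboliteEvidence(z_score=-2.0, distance=2, upstream=False),
]
score = reaction_score(evidence)  # 3.0 * 1.0 + 2.0 * 0.5 = 4.0

# Hypothetical per-gene scores; higher means more suspicious in both cases.
metabolic = {"GENE_A": 5.0, "GENE_B": 2.0, "GENE_C": 0.5}
cadd = {"GENE_A": 25.0, "GENE_B": 10.0, "GENE_C": 30.0}
ranking = combined_ranking(metabolic, cadd)  # ['GENE_A', 'GENE_C', 'GENE_B']
```

In this toy example GENE_A ranks first because it scores well on both sources of evidence, whereas GENE_C has the highest CADD score but little metabolic support. This mirrors the abstract's point that the combination can outperform either ranking alone, although the actual weighting used in the paper may differ.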