A machine-learning approach for nonalcoholic steatohepatitis susceptibility estimation

Ghadiri, Fatemeh; Husseini, Abbas Ali; Öztaş, Oğuzhan

Date

2022

Author

Ghadiri, Fatemeh
Husseini, Abbas Ali
Öztaş, Oğuzhan

Abstract

Background Nonalcoholic steatohepatitis (NASH), a severe form of nonalcoholic fatty liver disease, can lead to advanced liver damage and has become an increasingly prominent health problem worldwide. Predictive models for early identification of high-risk individuals could help guide preventive and interventional measures. Traditional epidemiological models, which are based on statistical analysis, have limited predictive power. In the current study, a novel machine-learning approach was developed for individual NASH susceptibility prediction using candidate single nucleotide polymorphisms (SNPs).
Methods A total of 245 NASH patients and 120 healthy individuals were included in the study. The models for identifying at-risk individuals were developed from gender and the SNP genotypes of candidate genes: two SNPs in the cytochrome P450 family 2 subfamily E member 1 (CYP2E1) gene (rs6413432, rs3813867), two SNPs in the glucokinase regulator (GCKR) gene (rs780094, rs1260326), and the rs738409 SNP in the patatin-like phospholipase domain-containing 3 (PNPLA3) gene. To predict an individual’s susceptibility to NASH, nine different machine-learning models were constructed. These models combined three feature-input settings (all features, Chi-square feature selection, and support vector machine recursive feature elimination (SVM-RFE)) with three classification algorithms: k-nearest neighbor (KNN), multi-layer perceptron (MLP), and random forest (RF). All nine models were trained on 80% of the data from both the NASH patients and the healthy controls, and then tested on the remaining 20% of both groups. The models’ performance was compared in terms of accuracy, precision, sensitivity, and F-measure.
Results Among all nine machine-learning models, the KNN classifier with all features as input showed the highest performance, with an F-measure of 86% and an accuracy of 79%.
Conclusions Machine learning based on genomic variation may be applicable for estimating an individual’s susceptibility to developing NASH among high-risk groups with a high degree of accuracy, precision, and sensitivity.
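
The experimental design described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names (`model_grid`, `stratified_split`, `classification_metrics`) are hypothetical, and reading the nine models as three feature-input settings crossed with three classifiers is an assumption based on the Methods and Results text. The sketch only enumerates the model configurations, performs an 80/20 split within each group, and computes the four reported metrics from confusion-matrix counts. It trains no classifier.

```python
from itertools import product
import random

# Feature-input settings and classifiers named in the abstract.
# Treating "all features" as a third input setting is an assumption.
FEATURE_INPUTS = ["all features", "Chi-square", "SVM-RFE"]
CLASSIFIERS = ["KNN", "MLP", "RF"]


def model_grid():
    """Return the 3 x 3 = 9 (feature input, classifier) combinations."""
    return list(product(FEATURE_INPUTS, CLASSIFIERS))


def stratified_split(n_cases, n_controls, train_frac=0.8, seed=0):
    """Split each group separately so both keep the same train/test ratio."""
    rng = random.Random(seed)
    split = {}
    for label, n in (("NASH", n_cases), ("control", n_controls)):
        indices = list(range(n))
        rng.shuffle(indices)
        cut = round(n * train_frac)
        split[label] = {"train": indices[:cut], "test": indices[cut:]}
    return split


def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, sensitivity, and F-measure from counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + sensitivity
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "f_measure": f_measure,
    }


if __name__ == "__main__":
    for feature_input, classifier in model_grid():
        print(f"{classifier} with {feature_input}")

    # Group sizes are taken from the abstract: 245 NASH patients, 120 controls.
    split = stratified_split(245, 120)
    for label, parts in split.items():
        print(label, len(parts["train"]), "train /", len(parts["test"]), "test")

    # Hypothetical confusion-matrix counts, for illustration only;
    # they are not results from the study.
    print(classification_metrics(tp=40, fp=10, fn=10, tn=40))
```

Splitting within each group keeps the patient-to-control ratio the same in the training and test sets, which matters here because the two groups differ in size. With such imbalance, the F-measure and accuracy can diverge, so reporting both is informative.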

URI

https://hdl.handle.net/11363/6222

Collections

Publications in the Web of Science and Scopus Citation Indexes [54]

The following license files are associated with this item:

Creative Commons

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess