4th international edition and 13th Iranian Conference on Bioinformatics
Enhancing NAFLD Diagnosis with AI: Insights from the Persian Fasa Cohort Through Advanced Machine Learning Techniques
Authors:
Marzie Shadpirouz (1)
Mohammad Reza Zabihi (2)
Zahra Salehi (3)
Kiarash Zare (4)
Mohammad Mehdi Naghizadeh (5)
Kaveh Kavousi (6)
1- University of Tehran
2- University of Tehran
3- Tehran University of Medical Sciences
4- Shiraz University of Medical Sciences
5- Fasa University of Medical Sciences
6- University of Tehran
Keywords:
NAFLD, CNN, OWA, Artificial intelligence, Sugeno Fuzzy Integral
Abstract:
Non-alcoholic fatty liver disease (NAFLD) is the hepatic manifestation of metabolic syndrome, characterized by fat accumulation in the liver of individuals who do not consume excessive alcohol. Its global prevalence has risen over the past three decades, making it a significant public health challenge. NAFLD can progress to cirrhosis and liver failure and increases the risk of cardiovascular disease, ultimately contributing to higher overall mortality. Despite its prevalence, early detection remains difficult because of limitations in current screening methods. Here, we aimed to develop an AI-driven model for diagnosing NAFLD from blood parameters and anthropometric indices. The study used data from the Persian Fasa cohort, originally comprising 10,138 records and 226 discrete and continuous features. After preprocessing, normalization, and dimensionality reduction, statistical analyses were conducted in Python. Patients were assigned to three groups according to the Fatty Liver Index (FLI): healthy (<30), borderline (30–60), and NAFLD (>60). The dataset was split into training (70%) and testing (30%) subsets. Seven feature selection methods, namely ANOVA, Mutual Information (MI), Independent Component Analysis (ICA), Non-negative Matrix Factorization (NMF), Principal Component Analysis (PCA), Penalized Support Vector Machine (SVM_L1), and Elastic Net Logistic Regression, were applied to extract the features they selected in common. A Random Forest then identified the most important of these features, which were validated through Receiver Operating Characteristic (ROC) curve analysis. Several machine learning models, including Random Forest, Support Vector Machine (SVM), Logistic Regression, K-Nearest Neighbors (KNN), Decision Tree, CatBoost, AdaBoost, and XGBoost, were trained, and their classification performance was evaluated with 5-fold cross-validation.
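The FLI grouping described above can be sketched as follows. The abstract does not state how the FLI was computed, so this sketch assumes the published FLI definition (Bedogni et al., 2006), with triglycerides in mg/dL, GGT in U/L, and waist circumference in cm. The function names and the handling of the exact boundary values 30 and 60 (placed in the borderline group) are illustrative assumptions, not the authors' code.

```python
import math


def fatty_liver_index(tg_mg_dl, bmi, ggt_u_l, waist_cm):
    """Fatty Liver Index on a 0-100 scale.

    Assumption: coefficients follow the published FLI definition
    (Bedogni et al., 2006); the abstract itself gives no formula.
    """
    linear = (
        0.953 * math.log(tg_mg_dl)
        + 0.139 * bmi
        + 0.718 * math.log(ggt_u_l)
        + 0.053 * waist_cm
        - 15.745
    )
    # Logistic transform of the linear predictor, scaled to 0-100.
    return 100.0 / (1.0 + math.exp(-linear))


def fli_category(fli):
    """Map an FLI value to the three groups used in the abstract.

    Assumption: the boundary values 30 and 60 fall in "borderline".
    """
    if fli < 30:
        return "healthy"
    if fli <= 60:
        return "borderline"
    return "NAFLD"


# Hypothetical patient values, for illustration only.
example_fli = fatty_liver_index(tg_mg_dl=150, bmi=30, ggt_u_l=40, waist_cm=100)
example_group = fli_category(example_fli)
```

Because the index is a logistic transform, it always lies strictly between 0 and 100, so the three thresholds partition every patient into exactly one group.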
Model diversity was assessed using Kappa statistics and error analysis to ensure robustness. To further improve performance, models were combined using Optimized Weighted Averaging (OWA) and the Sugeno Fuzzy Integral. Finally, a Convolutional Neural Network (CNN) was trained with 5-fold cross-validation as a meta-learner to integrate the robust models and enhance classification results. The final dataset comprised 70 clinical and lifestyle variables, such as hypertension and smoking status, collected from 10,007 patients (45.2% male, 54.8% female). Of these patients, 4,444 were healthy, 2,892 borderline, and 2,671 had NAFLD. The Random Forest method identified five key features as the most significant predictors, including BMI, waist-to-hip ratio, triglycerides, and GGT. ROC curve analysis confirmed the diagnostic value of these features, with an Area Under the Curve (AUC) greater than 0.7. The SVM and CatBoost models performed best, with a Kappa score of 0.96 and an error rate of 0.01, indicating strong agreement and minimal error. Combining these two models using the Sugeno Fuzzy Integral, OWA, and CNN-based meta-learning yielded an accuracy, precision, recall, and F1 score of 0.99 each, and an AUC of 1.00. These findings highlight predictors relevant to NAFLD diagnosis, underscore the potential of AI for this task, and provide valuable insights for early detection and intervention.
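As a rough illustration of two steps named in the abstract, the sketch below computes Cohen's kappa between two label sequences and fuses two models' class confidences with a Sugeno fuzzy integral. The abstract does not say which kappa variant or which fuzzy measure was used, so the measure values, model names, probabilities, and function names here are hypothetical and not the authors' implementation.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two label sequences beyond chance."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    # Undefined (division by zero) when expected == 1, i.e. both
    # sequences consist of one and the same label throughout.
    return (observed - expected) / (1.0 - expected)


def sugeno_integral(scores, measure):
    """Sugeno integral of per-source scores w.r.t. a fuzzy measure.

    scores:  dict source -> confidence in [0, 1]
    measure: dict frozenset(sources) -> measure value in [0, 1]
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    coalition, best = set(), 0.0
    for source in ranked:
        # Grow the coalition of the top-ranked sources one at a time.
        coalition.add(source)
        best = max(best, min(scores[source], measure[frozenset(coalition)]))
    return best


def fuse(probabilities, measure):
    """Fuse per-model class probabilities; return (label, fused scores)."""
    classes = next(iter(probabilities.values())).keys()
    fused = {
        c: sugeno_integral({m: p[c] for m, p in probabilities.items()}, measure)
        for c in classes
    }
    return max(fused, key=fused.get), fused


# Hypothetical fuzzy measure over the two best models (illustrative values).
measure = {
    frozenset({"svm"}): 0.6,
    frozenset({"catboost"}): 0.7,
    frozenset({"svm", "catboost"}): 1.0,
}
# Hypothetical class probabilities for a single patient.
probabilities = {
    "svm": {"healthy": 0.10, "borderline": 0.30, "NAFLD": 0.60},
    "catboost": {"healthy": 0.05, "borderline": 0.15, "NAFLD": 0.80},
}
label, fused = fuse(probabilities, measure)
```

With only two sources, the measure needs just the two singleton values plus the full set (fixed at 1); with more models, a measure over every coalition would have to be specified or derived, for example from a λ-fuzzy measure.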