4th international edition and 13th Iranian Conference on Bioinformatics
Enhancing NAFLD Diagnosis with AI: Insights from the Persian Fasa Cohort Through Advanced Machine Learning Techniques
Authors:
Marzie Shadpirouz (1)
Mohammad Reza Zabihi (2)
Zahra Salehi (3)
Kiarash Zare (4)
Mohammad Mehdi Naghizadeh (5)
Kaveh Kavousi (6)
1- University of Tehran
2- University of Tehran
3- Tehran University of Medical Sciences
4- Shiraz University of Medical Sciences
5- Fasa University of Medical Sciences
6- University of Tehran
Keywords:
NAFLD, CNN, OWA, Artificial intelligence, Sugeno Fuzzy Integral
Abstract:
Non-alcoholic fatty liver disease (NAFLD) is a hepatic manifestation of metabolic syndrome, characterized by fat accumulation in the liver of individuals who do not consume excessive alcohol. Over the past three decades, its prevalence has risen globally, posing a significant public health challenge. NAFLD can progress to cirrhosis and liver failure and is associated with an increased risk of cardiovascular disease, ultimately contributing to higher overall mortality. Despite its high prevalence, early detection remains difficult because of limitations in current screening methods. Here, we aimed to develop an AI-driven model for diagnosing NAFLD from blood parameters and anthropometric indices. The study used data from the Persian Fasa cohort, originally comprising 10,138 records and 226 features, which were categorized as discrete or continuous. After preprocessing, normalization, and dimensionality reduction, statistical analyses were conducted in Python. Patients were assigned to three groups based on the Fatty Liver Index (FLI): healthy (<30), borderline (30–60), and NAFLD (>60). The dataset was divided into training (70%) and testing (30%) subsets. Seven feature selection methods, namely ANOVA, Mutual Information (MI), Independent Component Analysis (ICA), Non-negative Matrix Factorization (NMF), Principal Component Analysis (PCA), Penalized Support Vector Machine (SVM_L1), and Elastic Net Logistic Regression, were applied to extract common features. The Random Forest algorithm identified the most important extracted features, which were validated through Receiver Operating Characteristic (ROC) curve analysis. Several machine learning models, including Random Forest, Support Vector Machine (SVM), Logistic Regression, K-Nearest Neighbors (KNN), Decision Tree, CatBoost, AdaBoost, and XGBoost, were trained, and their classification performance was evaluated with 5-fold cross-validation.
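The FLI-based grouping and the 70/30 split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not state how FLI was computed, so the sketch assumes the commonly published Bedogni et al. (2006) formula (triglycerides in mg/dL, GGT in U/L, waist circumference in cm). The function names, the handling of values exactly at 30 or 60 (treated here as borderline), and the simple shuffled split are all assumptions.

```python
import math
import random


def fatty_liver_index(tg_mg_dl, bmi, ggt_u_l, waist_cm):
    """Fatty Liver Index on a 0-100 scale.

    Assumption: the standard Bedogni et al. (2006) logistic formula;
    the abstract does not confirm which variant was used.
    """
    z = (0.953 * math.log(tg_mg_dl)
         + 0.139 * bmi
         + 0.718 * math.log(ggt_u_l)
         + 0.053 * waist_cm
         - 15.745)
    return 100.0 / (1.0 + math.exp(-z))


def fli_group(fli):
    """Map an FLI value to the three groups named in the abstract.

    Assumption: values exactly equal to 30 or 60 count as borderline.
    """
    if fli < 30:
        return "healthy"
    if fli <= 60:
        return "borderline"
    return "NAFLD"


def split_70_30(records, seed=0):
    """Shuffle a list of records and split it into 70% train / 30% test.

    Assumption: a plain random split; the abstract does not say whether
    the split was stratified by group.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    # Hypothetical participant values, for illustration only.
    fli = fatty_liver_index(tg_mg_dl=150, bmi=30, ggt_u_l=40, waist_cm=100)
    print(round(fli, 1), fli_group(fli))
    train, test = split_70_30(list(range(10)))
    print(len(train), len(test))
```

With real cohort data, a stratified split that preserves the group proportions in both subsets would usually be preferable to the plain shuffle shown here.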
Model diversity was assessed using Kappa statistics and error analysis to ensure robustness. To further improve performance, Optimized Weighted Averaging (OWA) and Sugeno Fuzzy Integral methods were applied for model combination. Finally, a Convolutional Neural Network (CNN) was trained with 5-fold cross-validation to integrate the robust models and enhance classification results. The final dataset comprised 70 clinical and lifestyle variables, such as hypertension and smoking status, collected from 10,007 patients (45.2% male and 54.8% female). The number of patients in each category was as follows: healthy (4,444), borderline (2,892), and NAFLD (2,671). Five key features, including BMI, waist-to-hip ratio, triglycerides, and gamma-glutamyl transferase (GGT), were identified as the most significant predictors using the Random Forest method. The diagnostic value of these features was confirmed through ROC curve analysis, with an Area Under the Curve (AUC) greater than 0.7. The SVM and CatBoost models performed best, with a Kappa score of 0.96 and an error rate of 0.01, indicating high model diversity and minimal error. Combining these two models using the Sugeno Fuzzy Integral, OWA, and CNN-based meta-learning yielded an accuracy of 0.99, precision of 0.99, recall of 0.99, F1 score of 0.99, and AUC of 1.00. These findings highlight factors that could improve NAFLD diagnosis, underscore the potential of AI in this setting, and provide valuable insights for early detection and intervention.
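To make the model-combination step more concrete, the sketch below fuses the per-class scores of two classifiers in two ways: a weighted average and a two-source Sugeno fuzzy integral. It is an illustration under stated assumptions, not the authors' implementation. The probability vectors, the weight `w_a`, and the fuzzy densities `g_a` and `g_b` are hypothetical values; the abstract does not report how the weights or densities were chosen.

```python
CLASSES = ["healthy", "borderline", "NAFLD"]


def weighted_average(p_a, p_b, w_a=0.5):
    """Blend two per-class score vectors, with weight w_a on model A.

    Assumption: w_a would normally be tuned on validation data.
    """
    return [w_a * a + (1.0 - w_a) * b for a, b in zip(p_a, p_b)]


def sugeno_two_sources(h_a, h_b, g_a, g_b):
    """Sugeno fuzzy integral of two confidence values.

    g_a and g_b are fuzzy densities in [0, 1] expressing how much each
    model is trusted; the measure of the full set {A, B} is 1.
    The integral is max over i of min(h_(i), g(A_i)), with the
    confidences sorted in descending order.
    """
    if h_a >= h_b:
        top, top_density, low = h_a, g_a, h_b
    else:
        top, top_density, low = h_b, g_b, h_a
    return max(min(top, top_density), min(low, 1.0))


def fuse_sugeno(p_a, p_b, g_a, g_b):
    """Apply the two-source Sugeno integral class by class."""
    return [sugeno_two_sources(a, b, g_a, g_b) for a, b in zip(p_a, p_b)]


def predict(scores):
    """Return the class label with the highest fused score."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return CLASSES[best]


if __name__ == "__main__":
    # Hypothetical class probabilities from two models for one patient.
    p_svm = [0.10, 0.30, 0.60]
    p_cat = [0.05, 0.55, 0.40]
    print(predict(weighted_average(p_svm, p_cat, w_a=0.5)))
    print(predict(fuse_sugeno(p_svm, p_cat, g_a=0.6, g_b=0.5)))
```

The weighted average lets both models contribute in proportion to their weights, whereas the Sugeno integral caps the more confident model's vote at its trust level. With more than two models, the measure of each intermediate subset would also have to be defined, for example through a lambda-fuzzy measure.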