4th international edition and 13th Iranian Conference on Bioinformatics
Enhancing NAFLD Diagnosis with AI: Insights from the Persian Fasa Cohort Through Advanced Machine Learning Techniques
Authors:
Marzie Shadpirouz (1)
Mohammad Reza Zabihi (2)
Zahra Salehi (3)
Kiarash Zare (4)
Mohammad Mehdi Naghizadeh (5)
Kaveh Kavousi (6)
1- University of Tehran
2- University of Tehran
3- Tehran University of Medical Sciences
4- Shiraz University of Medical Sciences
5- Fasa University of Medical Sciences
6- University of Tehran
Keywords:
NAFLD, CNN, OWA, Artificial intelligence, Sugeno Fuzzy Integral
Abstract:
Non-alcoholic fatty liver disease (NAFLD) is a hepatic manifestation of metabolic syndrome, characterized by fat accumulation in the liver of individuals who do not consume excessive alcohol. Over the past three decades, its prevalence has risen globally, posing a significant public health challenge. NAFLD can progress to cirrhosis and liver failure and increases the risk of cardiovascular disease, ultimately contributing to higher overall mortality. Despite its widespread occurrence, early detection remains difficult because of limitations in current screening methods. Here, we aimed to develop an AI-driven model for diagnosing NAFLD from blood parameters and anthropometric indices. This study used data from the Persian Fasa cohort, originally comprising 10,138 records and 226 features, both discrete and continuous. After preprocessing, normalization, and dimensionality reduction, statistical analyses were conducted in Python. Patients were assigned to three groups by Fatty Liver Index (FLI): healthy (<30), borderline (30–60), and NAFLD (>60). The dataset was divided into training (70%) and testing (30%) subsets. Seven feature selection methods were applied to extract common features: ANOVA, Mutual Information (MI), Independent Component Analysis (ICA), Non-negative Matrix Factorization (NMF), Principal Component Analysis (PCA), Penalized Support Vector Machine (SVM_L1), and Elastic Net Logistic Regression. The Random Forest algorithm identified the most important of the extracted features, which were validated by Receiver Operating Characteristic (ROC) curve analysis. A range of machine learning models, including Random Forest, Support Vector Machine (SVM), Logistic Regression, K-Nearest Neighbors (KNN), Decision Tree, CatBoost, AdaBoost, and XGBoost, were trained, and their classification performance was evaluated with 5-fold cross-validation.
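The FLI grouping and the 70/30 split described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the random seed, the treatment of the boundary values 30 and 60 as borderline, and the example FLI values are all assumptions, and the abstract does not say whether the split was stratified.

```python
import random


def fli_category(fli: float) -> str:
    """Map a Fatty Liver Index value to the three groups named in the abstract.

    Assumption: the boundary values 30 and 60 both fall in "borderline".
    """
    if fli < 30:
        return "healthy"
    if fli <= 60:
        return "borderline"
    return "NAFLD"


def split_70_30(records, seed=0):
    """Shuffle and split records into 70% training and 30% testing.

    The seed and the plain (non-stratified) shuffle are assumptions.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical FLI values, for illustration only (not cohort data).
example_fli = [12.5, 30.0, 45.2, 60.0, 60.1, 88.0, 5.0, 72.3, 29.9, 55.0]
labelled = [(value, fli_category(value)) for value in example_fli]
train, test = split_70_30(labelled)
print(len(train), len(test))
```

With real cohort data, a stratified split would keep the three group proportions similar in both subsets, which may matter given the unequal group sizes.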
Model diversity was assessed using Kappa statistics and error analysis to ensure robustness. To further improve performance, Optimized Weighted Averaging (OWA) and the Sugeno Fuzzy Integral were applied for model combination. Finally, a Convolutional Neural Network (CNN) was trained with 5-fold cross-validation to integrate the robust models and enhance classification results. The final dataset comprised 70 clinical and lifestyle variables, including hypertension and smoking status, collected from 10,007 patients (45.2% male and 54.8% female). The groups comprised 4,444 healthy, 2,892 borderline, and 2,671 NAFLD patients. Five key features, including BMI, waist-to-hip ratio, triglycerides, and GGT, were identified as the most significant predictors using the Random Forest method. ROC curve analysis confirmed the diagnostic value of these features, with an Area Under the Curve (AUC) greater than 0.7. The SVM and CatBoost models performed best, with a Kappa score of 0.96 and an error rate of 0.01, indicating high model diversity and minimal error. Combining these two models using the Sugeno Fuzzy Integral, OWA, and CNN-based meta-learning yielded Accuracy 0.99, Precision 0.99, Recall 0.99, F1 Score 0.99, and an AUC of 1.00. These findings highlight factors that could improve NAFLD diagnosis and underscore the potential of AI for early detection and intervention.
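The Kappa statistic and the two fusion rules named above can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation, and several points are assumptions: OWA is implemented here as Yager's ordered weighted averaging, the usual reading of the acronym; the weights, the fuzzy measure, and the example scores are invented for illustration; and the abstract does not state which pair of label sequences the reported Kappa compares.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two label sequences."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sequences consist of one identical label
    return (observed - expected) / (1.0 - expected)


def owa(scores, weights):
    """Ordered weighted average: weights apply to scores sorted descending."""
    ordered = sorted(scores, reverse=True)
    return sum(w * s for w, s in zip(weights, ordered))


def sugeno_integral(scores, measure):
    """Sugeno integral of per-model scores with respect to a fuzzy measure.

    scores:  dict mapping model name -> score in [0, 1]
    measure: dict mapping frozenset of model names -> value in [0, 1]
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    best = 0.0
    for i in range(1, len(ranked) + 1):
        coalition = frozenset(ranked[:i])
        best = max(best, min(scores[ranked[i - 1]], measure[coalition]))
    return best


# Hypothetical class scores from two models for one patient (not cohort data).
scores = {"svm": 0.9, "catboost": 0.7}
# Hypothetical fuzzy measure; the full set must have measure 1.0.
measure = {
    frozenset({"svm"}): 0.6,
    frozenset({"catboost"}): 0.5,
    frozenset({"svm", "catboost"}): 1.0,
}
fused_sugeno = sugeno_integral(scores, measure)
fused_owa = owa(list(scores.values()), [0.6, 0.4])
kappa = cohen_kappa(list("hhbnnb"), list("hhbnbb"))
print(fused_sugeno, fused_owa, kappa)
```

A Kappa near 1 means the two label sequences agree almost everywhere. In practice the fusion would be applied per class and per patient, and the weights or fuzzy measure would be tuned on validation folds.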
List of papers
Archived papers
Vaccine design for outer membrane protein C (Shigella flexneri)
Maedeh Esmaili - Fatemeh Sefid
3D modeling of the spike protein in the Omicron variant of the coronavirus and comparison with the Delta and Wuhan strains
Ali Abolhasanzadeh Parizi
In silico study of CD4+ T cell epitopes in ORF1ab protein of SARS-CoV-2 for Iranian common MHC II alleles
Fatemeh Hajighasem - Atefeh Shirkavand
Homology modeling and molecular docking studies for discovering FlgK protein inhibitors; Helicobacter pylori flagellar subunit.
Vajiheh Eskandari
Bioinformatics studies on S35K mutation on Mnemiopsin 2 photoprotein
AmirReza Mohammadi - Vahab Jafarian - Fatemeh Khatami
Study population structure in Iranian Arab horse breed by principal component analysis (PCA) and discriminant analysis of principal components (DAPC) methods using genomic data
Behkam Teymori - Hossein Moradi sharbabak - Mohammad Moradi sharbabak - Mohammad Bagher Zandi - Alireza Fotuhi Siahpirani
A dynamic co-expression approach reveals Gins2 as a potential upstream modulator of HNSCC metastasis
Nasibeh Khayer - Samira Shabani - Maryam Jalessi - Mohammad Taghi Joghataei - Frouzandeh Mahjoubi
Element-Specific Estimation of Background Mutation Rates in Whole Cancer Genomes Through Transfer Learning
Farideh Bahari - Reza Ahangari Cohan - Hesam Montazeri
Identification of Therapeutic Biomarkers in Patients with Rheumatoid Arthritis Treated with Methotrexate
Narges Yaghoobi - Tannaz Araei - Sara Abedi - Mohsen Goharinia - Mohammad Mehdi Naghizadeh
Drug repurposing for brain cancer
Zahra Shahini - Farinaz Roshani