4th international edition and 13th Iranian Conference on Bioinformatics
Machine Learning-Driven Discovery of JAK2 Inhibitors from ChEMBL Databank
Authors:
Negar Abdolmaleki (1), Hamid Mahdiuni (2)
1- Razi University
2- Razi University
Keywords:
Janus Kinase 2 Inhibitors, Virtual Screening, Machine Learning, Random Forest, ECFP4 Fingerprints
Abstract:
This study aims to identify potent Janus kinase 2 (JAK2) inhibitors by using machine learning models to virtually screen the small-molecule subset of the ChEMBL databank. JAK2 is a key component of the JAK-STAT signaling pathway, which regulates immune responses and inflammation. JAK2 inhibitors therefore have significant potential for treating autoimmune disorders, inflammatory diseases, and certain cancers (Lv and Qi, 2024). Three machine learning models were evaluated: Support Vector Machine (SVM), XGBoost, and Random Forest (RF). For training, a set of 6,847 active JAK2 inhibitors was compiled from ChEMBL (Gaulton and Hersey, 2017), BindingDB (Gilson and Liu, 2016), and PubChem (Kim and Chen, 2019), along with 6,500 inactive compounds from the DUD-E database (Mysinger and Carchia, 2012). Each compound was labeled with its activity status (1 for active, 0 for inactive). Molecules were represented by extended connectivity fingerprints (ECFP4) (Baptista and Correia, 2022) together with molecular descriptors such as molecular weight, polar surface area (PSA), and logP; both the fingerprints and the descriptors were computed with RDKit. Model performance was evaluated using several metrics, including accuracy and area under the curve (AUC). On the test set, the Random Forest model performed best, with an accuracy of 0.9970 and an AUC of 0.9996. The SVM model achieved an accuracy of 0.9963 and an AUC of 0.9993, and the XGBoost model an accuracy of 0.9955 and an AUC of 0.9985. Based on these results, the Random Forest model was selected for virtual screening of a large-scale compound database containing 1,930,555 molecules. The model classified 99,653 compounds as potential JAK2 inhibitors (active) and the remaining 1,830,902 as inactive.
These findings indicate that a Random Forest model trained on ECFP4 fingerprints and molecular descriptors is an effective tool for machine learning-driven virtual screening and can accelerate the discovery of JAK2 inhibitors.
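As an illustration of the two metrics reported above, the following minimal sketch computes accuracy and AUC in plain Python on a small toy example. It assumes that AUC refers to the area under the ROC curve and that predictions are thresholded at 0.5; the abstract states neither. The labels, scores, and function names are hypothetical and are not the study's data or code. The final lines only re-derive the screening counts given in the abstract.

```python
def accuracy(y_true, y_pred):
    """Fraction of predicted labels that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a randomly chosen active (label 1)
    receives a higher score than a randomly chosen inactive (label 0).
    Ties count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = active, 0 = inactive.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]          # predicted probability of "active"
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

toy_accuracy = accuracy(y_true, y_pred)
toy_auc = roc_auc(y_true, scores)
print(f"toy accuracy = {toy_accuracy:.4f}, toy AUC = {toy_auc:.4f}")

# Screening counts as reported in the abstract.
screened = 1_930_555
predicted_active = 99_653
predicted_inactive = screened - predicted_active
hit_rate = predicted_active / screened
print(f"predicted inactive = {predicted_inactive:,}, hit rate = {hit_rate:.2%}")
```

The pairwise formulation of AUC is quadratic in the number of compounds, which is fine for a small illustration; for data sets of the size described above, a rank-based implementation from an established library would be the practical choice.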