TY - JOUR
T1 - Genetic Links Between Common Lung Diseases and Lung Cancer Progression: Bioinformatics and Machine Learning Insights
AU - Hossain, Md Ali
AU - Asa, Tania Akter
AU - Mahmud, Md Zulfiker
AU - Azad, A. K. M.
AU - Rahman, Mohammad Zahidur
AU - Moni, Mohammad Ali
AU - Moustafa, Ahmed
N1 - Publisher Copyright: © 2025 by the authors. Licensee ESJ, Italy.
PY - 2025/4
Y1 - 2025/4
N2 - Lung cancer (LC) is one of the most frequently diagnosed cancers and remains the leading cause of cancer-related mortality worldwide, making it a major global health challenge. Although several common lung diseases (CLDs) are implicated in LC development, the mechanisms by which LC arises from CLDs remain poorly understood. To explore this progression, we integrated bioinformatics and machine learning, using data from the GEO and TCGA databases. We first identified differentially expressed genes (DEGs) in LC and CLDs. Our gene-disease network revealed, for the first time, DEGs shared between LC and tuberculosis (TB; 36 genes), asthma (10), pneumonia (17), chronic obstructive pulmonary disease (COPD; 18), and idiopathic pulmonary fibrosis (IPF; 78), providing insights into potential connections between LC and CLDs. A protein-protein interaction (PPI) network analysis then identified significant pathways and hub proteins (SPTBN1, KCNA4, SCN7A, KCNQ3, GRIA1, and SDC1). RNA-seq and clinical data for the shared DEGs were obtained from cBioPortal and analyzed with univariate and multivariate Cox proportional hazards models to assess the influence of these genes on LC patient survival. We also developed and deployed a predictive model based on the identified hub genes, which demonstrated high accuracy in predicting LC progression. Additionally, co-expression networks among the common genes were explored using Weighted Gene Co-expression Network Analysis (WGCNA). Finally, the hub genes were validated using the Human Protein Atlas (HPA) database and evaluated with several classification algorithms to ascertain their predictive power and diagnostic potential. The identified biomarkers and pathways hold promise for further translational research and as potential therapeutic targets, advancing understanding of LC development from CLDs.
UR - http://www.scopus.com/inward/record.url?scp=105005991497&partnerID=8YFLogxK
U2 - 10.28991/ESJ-2025-09-02-021
DO - 10.28991/ESJ-2025-09-02-021
M3 - Article
AN - SCOPUS:105005991497
SN - 2610-9182
VL - 9
SP - 916
EP - 937
JO - Emerging Science Journal
JF - Emerging Science Journal
IS - 2
ER -