基于机器学习开发鼻咽癌失巢凋亡相关诊断模型及验证的研究
Development and validation of an anoikis-related diagnostic model for nasopharyngeal carcinoma based on machine learning
Submitted: 2024-01-20  Revised: 2024-03-12
DOI:
Chinese keywords:  机器学习  失巢凋亡  鼻咽癌  CHEK2  诊断模型
English keywords:  Machine learning  Anoikis  Nasopharyngeal carcinoma  CHEK2  Diagnostic model
Funding:
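Author | Affiliation | Postal code
郑响宁 | The Affiliated Cancer Hospital of Nanjing Medical University | 210009
宗丹 | The Affiliated Cancer Hospital of Nanjing Medical University |
葛宜枝 | The Affiliated Cancer Hospital of Nanjing Medical University |
何侠* | Jiangsu Cancer Hospital | 210009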
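Chinese abstract (English translation):
      [Objective] To develop and validate an anoikis-related gene diagnostic model for nasopharyngeal carcinoma (NPC) using machine learning, providing a new approach to the early diagnosis of NPC. [Methods] Transcriptomic RNA expression profiles of NPC patients and normal controls were obtained from the GEO public database. GSE12452 and GSE61218 were designated as the training set, and GSE64634 and GSE118719 as the validation set. The "combat" method in the R package "sva" was used to merge the datasets and remove batch effects. Lasso regression was used to screen key anoikis genes, and the "limma" package was used to screen anoikis genes differentially expressed between tumor and normal samples. NPC samples were divided into subgroups according to the differentially expressed anoikis genes and analyzed by WGCNA, and the characteristic genes from each analysis were intersected. The Wilcoxon rank-sum test was used to determine differences in immune cell distribution between the two subgroups. Based on the gene set screened by WGCNA, four machine learning methods (RF, SVM, XGB and GLM) were used to screen the genes most important to disease onset, and the model was validated on the validation datasets. [Results] In the training set, 18 significantly differentially expressed anoikis-related genes were identified, mainly enriched in the anoikis, dendritic spine and serine/threonine kinase activity pathways. Lasso regression analysis identified 12 genes, most of which were negatively correlated with immune-related cells; among them, CHEK2 had the highest AUC. Cell experiments confirmed that CHEK2 expression was associated with the malignant EMT phenotype. The immune landscape differed markedly between the anoikis-related gene subgroups of NPC. WGCNA yielded 123 intersecting genes. Among the four models, the SVM-RF model identified three important biomarkers (FHL2, ITGBL1 and VIPR1), which were used to construct a diagnostic nomogram (Nomo) model for NPC. The diagnostic model had an AUC of 0.8, indicating high accuracy for NPC diagnosis. [Conclusion] Anoikis-related genes play an important role in the development of NPC. The NPC diagnostic model constructed with machine learning algorithms can effectively improve diagnostic performance and offers a new approach to NPC diagnosis.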
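The subgroup comparison described in the Methods relies on the Wilcoxon rank-sum test. The block below is a minimal, standard-library sketch of that test using the normal approximation, without tie or continuity correction. It is not the authors' code (the abstract refers to R), and the function names and the immune-cell fraction values are hypothetical, chosen only for illustration.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (U statistic for x, z score, two-sided p-value).
    No tie or continuity correction is applied in this sketch.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                 # rank sum of the first group
    u = w - n1 * (n1 + 1) / 2           # Mann-Whitney U for the first group
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, z, p


# Hypothetical immune-cell fractions in two anoikis subgroups.
subgroup_a = [0.30, 0.35, 0.40, 0.45]
subgroup_b = [0.10, 0.15, 0.20, 0.25]
u_stat, z_score, p_value = rank_sum_test(subgroup_a, subgroup_b)
print(u_stat, round(z_score, 3), round(p_value, 4))
```

With samples this small an exact test would normally be preferred; the normal approximation is used here only to keep the sketch short.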
English abstract:
      [Objective] To develop and validate an anoikis-related diagnostic model for nasopharyngeal carcinoma (NPC) based on machine learning, providing a new approach to early diagnosis. [Methods] Transcriptomic profiles of NPC and normal control samples were obtained from the GEO public database. GSE12452 and GSE61218 were used as the training set, and GSE64634 and GSE118719 as the validation set. The ComBat method in the R package sva was used to merge the datasets and remove batch effects. Lasso regression screened key anoikis genes, and the limma package screened anoikis genes differentially expressed between tumor and normal samples. NPC subgroups were defined according to the differentially expressed anoikis genes and analyzed by WGCNA, and their characteristic genes were intersected. The Wilcoxon rank-sum test was used to assess differences in immune cell distribution between the two subgroups. Based on the gene set screened by WGCNA, four machine learning methods (RF, SVM, XGB and GLM) were used to screen the genes most important to disease onset, and the model was verified on the validation datasets. [Results] A total of 18 significantly differentially expressed anoikis-related genes were identified in the training set, mainly enriched in the anoikis, dendritic spine and serine/threonine kinase activity pathways. Lasso regression analysis identified 12 genes, most of which were negatively correlated with immune-related cells, and CHEK2 had the highest AUC value. Cell experiments confirmed that CHEK2 expression was associated with the malignant EMT phenotype. The immune landscape of NPC differed significantly between the anoikis-related gene subgroups. WGCNA yielded 123 intersecting genes. Among the four models, the SVM-RF model selected three important biomarkers (FHL2, ITGBL1 and VIPR1), which were used to construct the diagnostic nomogram (Nomo) model for NPC. The AUC value of the diagnostic model was 0.8, indicating high accuracy for the diagnosis of NPC. [Conclusion] Anoikis-related genes play an important role in the development of NPC. The NPC diagnostic model constructed with machine learning algorithms can effectively improve diagnostic performance and provides a new approach to the diagnosis of NPC.
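The Results rank candidate genes by AUC. As a rough illustration of what that ranking involves, the sketch below computes a single-gene AUC as the probability that a tumor sample has a higher expression value than a normal sample, with ties counted as one half. This is an assumption about how such a ranking could be done, not the authors' pipeline; the gene names GENE_A, GENE_B and GENE_C and all expression values are hypothetical.

```python
def rank_auc(tumor, normal):
    """AUC as P(tumor value > normal value) + 0.5 * P(tie)."""
    wins = 0.0
    for t in tumor:
        for n in normal:
            if t > n:
                wins += 1.0
            elif t == n:
                wins += 0.5
    return wins / (len(tumor) * len(normal))


def discriminative_auc(values, labels):
    """Direction-agnostic AUC for one gene.

    labels: 1 for tumor, 0 for normal. A gene that is lower in tumor
    has a raw AUC below 0.5, so max(auc, 1 - auc) is reported.
    """
    tumor = [v for v, y in zip(values, labels) if y == 1]
    normal = [v for v, y in zip(values, labels) if y == 0]
    auc = rank_auc(tumor, normal)
    return max(auc, 1.0 - auc)


def best_gene_by_auc(expression, labels):
    """Return (gene, auc) for the gene with the highest direction-agnostic AUC."""
    scores = {g: discriminative_auc(v, labels) for g, v in expression.items()}
    gene = max(scores, key=scores.get)
    return gene, scores[gene]


# Hypothetical toy data: three tumor samples followed by three normal samples.
labels = [1, 1, 1, 0, 0, 0]
expression = {
    "GENE_A": [5.1, 6.0, 5.8, 4.0, 4.2, 5.5],  # mostly higher in tumor
    "GENE_B": [3.0, 3.1, 2.9, 3.0, 3.2, 2.8],  # uninformative
    "GENE_C": [1.0, 1.2, 0.9, 2.0, 2.5, 2.2],  # lower in every tumor sample
}
print(best_gene_by_auc(expression, labels))
```

The pairwise loop is quadratic in sample count, which is fine for a sketch; a rank-based formula gives the same value more efficiently on larger cohorts.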