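Yang Xiaohui, Bai Xinyu, Qiao Jianghua. Identification of Breast Tumor Based on Integrated Information Gene Selection Method[J]. China Cancer, 2019, 28(7): 557-562.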
Identification of Breast Tumor Based on Integrated Information Gene Selection Method
Received: 2019-01-21
DOI:10.11735/j.issn.1004-0242.2019.07.A014
Keywords: breast cancer; microarray gene expression data; weighted gene co-expression network; decision information factor; inverse space sparse representation
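Funding:
Author    Affiliation
Yang Xiaohui    Data Analysis Technology Laboratory, Henan University
Bai Xinyu    Data Analysis Technology Laboratory, Henan University
Qiao Jianghua    Henan Cancer Hospital / Affiliated Cancer Hospital of Zhengzhou University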
Abstract:
      [Purpose] To explore possible pathogenic genes in breast cancer and their biological significance. [Methods] Using the widely used public breast cancer benchmark dataset Breast-2 (79), an integrated decision information factor (DIF) method was proposed to select candidate pathogenic genes and to identify breast cancer. First, a weighted gene co-expression network analysis was run in R on the original gene data to identify important gene modules in the network. Second, pathway enrichment analysis was performed on these modules with the DAVID software to check whether they were statistically significant. Third, the DIF method was used to select two candidate pathogenic genes from the statistically significant modules. Finally, an inverse space sparse representation based classification model was used to identify breast cancer. [Results] The weighted gene co-expression network yielded three gene modules, two of which were statistically significant under pathway enrichment analysis. When the two candidate pathogenic genes selected by the DIF method from these two modules were used for breast cancer identification, the accuracy reached 71.07%, which was 13.93%, 11.19%, and 8.57% higher than that of the signal noise ratio (SNR), receiver operating characteristic curve (ROC), and ratio of between-groups to within-groups sum of squares (BW) methods, respectively. [Conclusion] The candidate pathogenic genes selected by the proposed integrated DIF gene selection method can effectively identify breast cancer and have clear biological significance.
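The abstract compares DIF against three filter-style gene selection baselines (SNR, ROC, BW) but does not spell out their formulas. As a rough illustration of two of them, the sketch below scores genes with a commonly used form of the signal noise ratio, |mean difference| / (sum of class standard deviations), and with the ratio of between-groups to within-groups sum of squares. These definitions, the function names (`snr_scores`, `bw_scores`, `top_k_genes`), and the toy matrix are assumptions made for illustration; they are not taken from the paper, and the DIF method itself is not reproduced because the abstract does not define it.

```python
import numpy as np


def snr_scores(X, y):
    """Per-gene signal noise ratio for two classes labelled 0 and 1.

    Assumed form: |mean_0 - mean_1| / (sd_0 + sd_1), with sample
    standard deviations. Genes that are constant in both classes give
    a zero denominator and should be filtered out beforehand.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    class0, class1 = X[y == 0], X[y == 1]
    mean_gap = np.abs(class0.mean(axis=0) - class1.mean(axis=0))
    spread = class0.std(axis=0, ddof=1) + class1.std(axis=0, ddof=1)
    return mean_gap / spread


def bw_scores(X, y):
    """Per-gene ratio of between-groups to within-groups sum of squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        group = X[y == label]
        group_mean = group.mean(axis=0)
        between += group.shape[0] * (group_mean - overall_mean) ** 2
        within += ((group - group_mean) ** 2).sum(axis=0)
    return between / within


def top_k_genes(scores, k=2):
    """Indices of the k highest-scoring genes, best first."""
    return np.argsort(scores)[::-1][:k]


if __name__ == "__main__":
    # Toy data (hypothetical): 6 samples (rows) x 3 genes (columns).
    X = np.array([
        [1, 5, 1],
        [2, 4, 5],
        [3, 6, 3],
        [7, 5, 4],
        [8, 6, 2],
        [9, 4, 6],
    ], dtype=float)
    y = np.array([0, 0, 0, 1, 1, 1])

    print("SNR scores:", snr_scores(X, y))
    print("BW scores:", bw_scores(X, y))
    print("Top 2 genes by SNR:", top_k_genes(snr_scores(X, y), k=2))
```

In this toy matrix the first gene separates the two classes cleanly, so both criteria rank it first. Picking `k=2` mirrors the two candidate genes reported in the abstract, although the paper selects them with DIF rather than with either baseline shown here.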