Objective: To introduce the basic principles, algorithm, and application value of the classification tree model for screening risk factors for malignant tumors. Methods: Using field survey data on breast cancer from Jiashan County, Zhejiang Province, as an example, a classification tree model was built with the Exhaustive CHAID method to screen risk factors from the survey results. The model was evaluated with the misclassification probability (Risk value) and the area under the ROC curve. Results: The classification tree model selected 9 risk factors from the 105 candidate variables. Occupation was the most important factor: workers, teachers, and retirees had a significantly higher probability of breast cancer than people in other occupations. The model also showed that the effect of regular physical exercise on breast cancer differed between subgroups of the population. The misclassification Risk value of the model was 0.174, and the area under the ROC curve plotted from the predicted probabilities was 0.872, which differed significantly from 0.5; the model fit was very good. Conclusion: The classification tree model can effectively identify the main influencing factors, define cut-off points for the study variables in a data-driven way, and reveal complex interactions among variables, so it has high application value in epidemiological research.
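The two evaluation measures named above can be illustrated with a minimal sketch. It assumes that each subject has an observed status (1 = case, 0 = control) and a predicted probability, such as the proportion of cases in the terminal node the subject falls into. The function names, the 0.5 classification threshold, and the toy data are illustrative assumptions, not taken from the study, and the numbers it produces are unrelated to the reported 0.174 and 0.872.

```python
def misclassification_risk(y_true, y_prob, threshold=0.5):
    """Proportion of subjects whose predicted class differs from the observed class.

    The 0.5 threshold is an assumption made for this sketch.
    """
    wrong = sum(1 for y, p in zip(y_true, y_prob) if (p >= threshold) != bool(y))
    return wrong / len(y_true)


def roc_auc(y_true, y_prob):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen case has a higher predicted probability than a randomly chosen
    control (ties count as one half)."""
    cases = [p for y, p in zip(y_true, y_prob) if y == 1]
    controls = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for pc in cases:
        for pn in controls:
            if pc > pn:
                wins += 1.0
            elif pc == pn:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Made-up toy data for illustration only (not the Jiashan survey data).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]

risk = misclassification_risk(y_true, y_prob)
auc = roc_auc(y_true, y_prob)
print(f"Risk = {risk:.3f}, AUC = {auc:.4f}")
```

An AUC of 0.5 corresponds to a model that ranks cases and controls no better than chance, which is why the abstract compares the estimated area against 0.5.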