Background: It is widely recognized that the molecular etiology of complex human diseases is highly intricate, involving a large number of genes as well as gene-gene and gene-environment interactions. Deciphering the high-order combinations of multiple susceptibility genes, or genetic barcodes, that underlie complex diseases has long been a major goal in biomedicine, with important implications for both early molecular diagnosis and personalized medicine. The aim of this study was to assess whether high-dimensional SNP data (and environmental factors) can be used to generate multi-factorial patterns that accurately partition human populations.

Materials and Methods: Two case-control datasets for two diseases were analyzed. The data for nasopharyngeal carcinoma (NPC) contained 676 SNPs and 12 environmental variables, while the data for coronary heart disease (CHD), provided by the Wellcome Trust Case Control Consortium (WTCCC), contained only genome-wide SNPs. To reduce the subsequent computational burden, chi-square association analysis of single SNP loci was first performed to reduce the SNP data. A genetic algorithm (GA) and a probabilistic neural network (PNN) were then integrated to identify multi-dimensional patterns of SNPs (and environmental factors). ROC curves were also used to assess how well these patterns partition human populations. Finally, the pathway analysis software Pathway Assist was used to explore the biological functions in which the genetic barcodes are involved.

Results: For the NPC data, a barcode composed of 14 SNPs (in 10 genes) and four environmental factors was identified, with an accuracy of 78.49% and a Youden index of 52.98% for distinguishing NPC patients from healthy subjects. For the CHD data, 6 barcodes (each with no more than 100 SNPs) were identified, all with accuracies >89% and AUC (area under the ROC curve) >0.85. Functional analysis showed that these high-dimensional barcodes have sound biological significance related to the two diseases.

Conclusions: This study suggests that the proposed integrated approach is promising for identifying gene-environment barcodes for complex human diseases.
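
The abstract does not give implementation details, so the following is only a minimal Python sketch of three building blocks it names: a per-SNP chi-square statistic for pre-filtering, a PNN-style classifier, and the accuracy and Youden index used for evaluation. The function names, the 0/1/2 genotype coding, and the kernel width `sigma` are all assumptions made for illustration, and the genetic algorithm that searches over SNP subsets is omitted.

```python
import math


def chi_square_stat(case_counts, control_counts):
    """Pearson chi-square statistic for a 2 x k genotype table.

    case_counts / control_counts hold the number of cases and controls
    carrying each genotype of one SNP (hypothetical input layout).
    """
    total_case = sum(case_counts)
    total_ctrl = sum(control_counts)
    n = total_case + total_ctrl
    stat = 0.0
    for a, b in zip(case_counts, control_counts):
        col = a + b
        if col == 0:
            continue  # genotype never observed, contributes nothing
        for observed, row_total in ((a, total_case), (b, total_ctrl)):
            expected = row_total * col / n
            stat += (observed - expected) ** 2 / expected
    return stat


def pnn_predict(train_x, train_y, x, sigma=0.5):
    """Classify x with a basic probabilistic neural network.

    Each training sample acts as a Gaussian kernel (pattern layer); the
    kernel activations are averaged per class (summation layer) and the
    class with the largest average wins. sigma is an assumed smoothing
    parameter, not a value taken from the study.
    """
    sums, counts = {}, {}
    for xi, yi in zip(train_x, train_y):
        d2 = sum((a - b) ** 2 for a, b in zip(xi, x))
        sums[yi] = sums.get(yi, 0.0) + math.exp(-d2 / (2 * sigma ** 2))
        counts[yi] = counts.get(yi, 0) + 1
    return max(sums, key=lambda c: sums[c] / counts[c])


def accuracy_and_youden(y_true, y_pred, positive=1):
    """Return (accuracy, Youden index); assumes both classes are present."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Youden index J = sensitivity + specificity - 1
    return accuracy, sensitivity + specificity - 1.0
```

In a pipeline of this kind, SNPs with a large `chi_square_stat` would be retained, a search procedure such as a GA would propose candidate subsets of the retained SNPs, and each subset would be scored by running `pnn_predict` on held-out subjects and passing the predictions to `accuracy_and_youden`.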