Biologists have identified a real and emerging need to find co-regulated gene clusters, which include both positively and negatively regulated gene clusters. However, existing pattern-based and tendency-based clustering approaches are designed only to find positively regulated gene clusters. In this paper, a new subspace clustering model called g-Cluster is proposed for gene expression data. The proposed model has the following advantages: 1) it finds both positively and negatively co-regulated genes at once, 2) it removes the restriction that co-regulated genes be related by a magnitude transformation, and 3) it guarantees the quality of clusters and the significance of regulations using a novel similarity measurement, gCode, and a user-specified regulation threshold 5, respectively. No previous work accomplishes this task. Moreover, the MDL technique is introduced to avoid generating insignificant g-Clusters. A tree structure, namely GS-tree, is also designed, and two algorithms combined with efficient pruning and optimization strategies are developed to identify all qualified g-Clusters. Extensive experiments are conducted on real and synthetic datasets. The experimental results show that 1) the algorithms find a number of co-regulated gene clusters that previous models miss and that are potentially of high biological significance, and 2) the algorithms are effective and efficient, and outperform the existing approaches.
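To make the distinction between positive and negative co-regulation concrete, the following minimal sketch may help. It is not the paper's gCode measurement, which this abstract does not define. It is an assumed, simplified tendency encoding: each step between consecutive conditions is marked as up, down, or insignificant relative to a threshold, and two genes are compared by their step codes. The names `tendency_code` and `relation`, the threshold value, and the expression values are all hypothetical.

```python
from typing import List, Sequence


def tendency_code(expr: Sequence[float], threshold: float) -> List[int]:
    """Encode each step between consecutive conditions as
    +1 (up), -1 (down), or 0 (change not larger than the threshold).
    Assumed toy encoding, not the paper's gCode."""
    code = []
    for prev, curr in zip(expr, expr[1:]):
        diff = curr - prev
        if diff > threshold:
            code.append(1)
        elif diff < -threshold:
            code.append(-1)
        else:
            code.append(0)
    return code


def relation(a: Sequence[float], b: Sequence[float], threshold: float) -> str:
    """Return 'positive' if both genes move in the same direction at every step,
    'negative' if they move in opposite directions at every step, else 'none'."""
    code_a = tendency_code(a, threshold)
    code_b = tendency_code(b, threshold)
    # Genes with no significant change carry no regulation signal.
    if not any(code_a) or not any(code_b):
        return "none"
    if code_a == code_b:
        return "positive"
    if code_a == [-s for s in code_b]:
        return "negative"
    return "none"


# Hypothetical expression levels over four conditions.
g1 = [1.0, 3.0, 2.0, 5.0]
g2 = [10.0, 30.0, 12.0, 50.0]   # same tendency as g1, very different magnitudes
g3 = [5.0, 2.0, 4.0, 0.5]       # opposite tendency to g1

print(relation(g1, g2, 0.5))
print(relation(g1, g3, 0.5))
```

In this toy setting, `g1` and `g2` are grouped despite their different magnitudes because only the direction of significant changes is compared, while `g1` and `g3` are related through opposite directions. A real model would additionally need to search over subsets of genes and conditions, which this sketch does not attempt.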