Application of a Cloud-Platform-Based Hierarchical Clustering Algorithm in the Coal Industry

Source: Coal Technology (煤炭技术) | Citations: 0


【Author】: Zhang Haijian (张海建)

【Institution】: Beijing Information Technology College (北京信息职业技术学院)

【Source】: Coal Technology (煤炭技术)

【Publication date】: 2013, Issue 12

【Keywords】: hierarchical clustering; cloud platform; coal industry; large-scale data


Excerpt from the paper

Hierarchical clustering decomposes a dataset into a hierarchy of nested clusters according to a chosen method. The number of clusters can be specified in advance, and the method is widely used across research and application fields. In the coal industry, cluster analysis of coal products is often wanted to support development and production. As coal systems collect ever more data, however, the original hierarchical clustering algorithm can no longer handle data at this scale: it must compute a similarity matrix over all pairs of records, which grows quadratically with the number of records and therefore needs a large amount of memory. For the large-scale data generated in the coal industry, this paper proposes a distributed hierarchical clustering algorithm based on a cloud computing platform. The algorithm stores and computes the similarity matrix in a distributed manner and completes the hierarchical clustering quickly and accurately. Two sets of experiments show that the algorithm is highly efficient and highly scalable.
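The excerpt does not spell out how the similarity matrix is partitioned across the cloud platform, so what follows is only a minimal Python sketch of the general idea, not the author's algorithm. It rests on several assumptions: records are numeric feature vectors compared by Euclidean distance (a dissimilarity, the counterpart of a similarity); rows of the condensed pairwise matrix are split round-robin into blocks, one per worker; a local thread pool stands in for the cloud worker nodes; and clusters are merged with single linkage until `k` remain. The function names (`block_distances`, `distributed_distance_matrix`, `single_linkage`) are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def euclidean(a, b):
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def block_distances(points, rows):
    """One worker's share: distances for every pair (i, j) with i in `rows` and j > i."""
    n = len(points)
    return {(i, j): euclidean(points[i], points[j])
            for i in rows for j in range(i + 1, n)}


def distributed_distance_matrix(points, n_workers=4):
    """Build the condensed pairwise matrix block by block.

    Rows are dealt out round-robin because row i has n - 1 - i pairs,
    so contiguous blocks would leave the first worker with most of the work.
    A thread pool stands in for cloud worker nodes (assumption).
    """
    n = len(points)
    blocks = [range(w, n, n_workers) for w in range(n_workers)]
    dist = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for part in pool.map(lambda rows: block_distances(points, rows), blocks):
            dist.update(part)  # on a real cluster the blocks would stay on their nodes
    return dist


def single_linkage(points, k, n_workers=4):
    """Agglomerative clustering: repeatedly merge the two closest clusters until k remain."""
    dist = distributed_distance_matrix(points, n_workers)

    def cluster_distance(c1, c2):
        # Single linkage (assumption): distance between the closest members of two clusters.
        return min(dist[(min(i, j), max(i, j))] for i in c1 for j in c2)

    clusters = [{i} for i in range(len(points))]
    while len(clusters) > k:
        a, b = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda pair: cluster_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a] |= clusters[b]  # a < b, so deleting index b does not shift index a
        del clusters[b]
    return clusters


# Illustrative (ash %, sulfur %) values, invented for this example, not data from the paper.
samples = [(10.0, 0.5), (10.5, 0.6), (30.0, 2.0), (31.0, 2.2), (20.0, 1.0)]
print(single_linkage(samples, k=2))
```

With these made-up samples, `single_linkage(samples, k=2)` groups samples 0, 1 and 4 together and samples 2 and 3 together. The distance dictionary holds n(n-1)/2 entries, which is the quadratic memory cost the abstract refers to. Merging every block into one dictionary, as this single-process sketch does, gives up that saving; a genuinely distributed version would keep each block on its own node and exchange only the candidate closest pairs.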
