Efficient and accurate access to target data and its associated data is a key factor in determining whether big data sharing and mining analysis can be realized. Traditional data retrieval methods cannot exploit the explicit or implicit relations among geoscience data and no longer meet the growing demand for both the quality and the quantity of retrieval results; semantic retrieval based on ontology theory and technology has therefore become a current research focus. Focusing on time, an essential attribute of geoscience data, this paper systematically studies the temporal concepts and characteristics of geoscience data and, on that basis, establishes a time ontology model for geoscience data. It discusses in depth the temporal relations, temporal coordinate systems, and other elements of the model, proposes description functions for time position and time distance, and studies how both can be expressed in the ontology. A time ontology library for geoscience data, covering geologic ages among other content, is constructed. Building on the Semantic Web development framework Jena, and through ontology parsing and the extraction and annotation of temporal information in metadata, the time ontology is applied to metadata retrieval in the Earth System Science Data Sharing Platform. The results show that the recall of geoscience data semantic retrieval based on the time ontology is about 1 times that of the keyword method, and that it also performs better in ranking retrieval results and recommending associated data, providing an effective method for promoting geoscience data sharing and association discovery.
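The abstract names temporal relations and a time distance function but does not give their definitions. As a rough illustration of the idea only, the following minimal Python sketch treats the time span of a dataset as a closed interval on one numeric axis and computes a coarse relation and a gap distance. The `Interval` class, the `time_relation` and `time_distance` functions, and the interval values (approximate ages in millions of years, written as negative numbers) are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical closed time interval on a single numeric axis.

    Here the axis is millions of years relative to the present,
    so geologic time takes negative values.
    """
    start: float
    end: float

    def __post_init__(self):
        if self.start > self.end:
            raise ValueError("start must not exceed end")


def time_relation(a: Interval, b: Interval) -> str:
    """Coarse relation of a to b (an assumed, simplified relation set)."""
    if a.end < b.start:
        return "before"
    if a.start > b.end:
        return "after"
    if a == b:
        return "equals"
    if a.start >= b.start and a.end <= b.end:
        return "during"
    if a.start <= b.start and a.end >= b.end:
        return "contains"
    return "overlaps"


def time_distance(a: Interval, b: Interval) -> float:
    """Gap between two intervals; 0.0 when they touch or overlap."""
    return max(0.0, max(a.start, b.start) - min(a.end, b.end))


# Illustrative, approximate interval boundaries (assumed values).
triassic = Interval(-251.9, -201.3)
jurassic = Interval(-201.3, -145.0)
cretaceous = Interval(-145.0, -66.0)
dataset_span = Interval(-150.0, -100.0)  # hypothetical metadata time range

print(time_relation(dataset_span, jurassic))
print(time_relation(triassic, cretaceous))
print(time_distance(triassic, cretaceous))
```

In a retrieval setting, a distance of this kind could be used to rank records whose time range lies near, but not inside, the queried period; whether the paper ranks results this way is not stated in the abstract.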