With the rapid development of the Internet and of large-capacity storage technologies such as optical disks, finding the required information quickly and effectively within a large body of information has become an urgent problem. An automatic document classification system assigns each document automatically to one or more document categories; this technology is expected to find wide use in information retrieval, mail classification, electronic conferencing, information filtering, and many other settings. This work improves, for the first time, on the traditional document content representation method VSM (vector space model) and proposes a specific classification algorithm tailored to the characteristics of the system. An initial implementation of the system has been completed and applied in "Internet Digital Library", a key research project of Jiangsu Province under the Ninth Five-Year Plan, with good results.