Real-world applications often involve many continuous numerical attributes, yet many current machine learning algorithms require the attributes they process to take discrete values. Discretization algorithms fall into two classes, supervised and unsupervised, according to whether the values of the associated class attribute are taken into account when a numerical attribute is discretized. Based on a mixture probability model, this paper proposes a theoretically rigorous unsupervised discretization algorithm. Without prior knowledge and without a class attribute, it partitions the value range of a numerical attribute into several subintervals, and then uses the Bayesian information criterion (BIC) to automatically find the best number of subintervals and the best way to partition them.
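The general idea of mixture-based discretization with BIC model selection can be illustrated with a minimal sketch. The abstract does not state the mixture family, the fitting procedure, or the rule for placing cut points, so everything below is an assumption made for illustration: Gaussian components, fitting by EM, a variance floor, and cut points placed where the most probable component changes. The names `fit_mixture`, `bic`, and `discretize` are hypothetical and are not taken from the paper.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_mixture(xs, k, iters=100):
    """Fit a k-component 1-D Gaussian mixture by EM (assumes non-constant data).

    Returns (weights, means, variances, log_likelihood).
    """
    n = len(xs)
    srt = sorted(xs)
    total_mean = sum(xs) / n
    total_var = sum((x - total_mean) ** 2 for x in xs) / n
    floor = 1e-3 * total_var  # assumed variance floor, avoids degenerate spikes
    weights = [1.0 / k] * k
    means = [srt[int((j + 0.5) * n / k)] for j in range(k)]  # evenly spaced quantiles
    variances = [total_var] * k
    for _ in range(iters):
        # E-step: posterior probability of each component for each value
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            s = max(sum(p), 1e-300)
            resp.append([pj / s for pj in p])
        # M-step: re-estimate weights, means, and variances
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, floor)
    loglik = 0.0
    for x in xs:
        s = sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
        loglik += math.log(max(s, 1e-300))
    return weights, means, variances, loglik


def bic(loglik, k, n):
    """BIC = -2 log L + p log n, with p = 3k - 1 free parameters."""
    return -2.0 * loglik + (3 * k - 1) * math.log(n)


def discretize(xs, max_k=4):
    """Pick the component count with the lowest BIC and derive cut points."""
    best = None
    for k in range(1, max_k + 1):
        w, m, v, ll = fit_mixture(xs, k)
        score = bic(ll, k, len(xs))
        if best is None or score < best[0]:
            best = (score, k, w, m, v)
    _, k, w, m, v = best

    def label(x):
        # Index of the most probable component for x
        return max(range(k), key=lambda j: w[j] * normal_pdf(x, m[j], v[j]))

    srt = sorted(xs)
    # Assumed rule: cut halfway between neighbours assigned to different components
    cuts = [(a + b) / 2.0 for a, b in zip(srt, srt[1:]) if label(a) != label(b)]
    return k, cuts


# Synthetic demo data: two well-separated groups of values
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(200)]
data += [random.gauss(10.0, 1.0) for _ in range(200)]
best_k, cuts = discretize(data)
print(best_k, cuts)
```

In this sketch no class labels are used at any point, which is what makes the procedure unsupervised, and the number of subintervals is not supplied by the user but follows from the BIC comparison across candidate component counts. When component variances differ strongly, the most probable component can change more than once along the axis, so the number of cut points may exceed the number of components minus one.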