结构与纯度结合的新型决策树分裂准则
作者:
作者单位:

1.南京航空航天大学计算机科学与技术学院,南京 211106;2.工信部模式分析和机器智能重点实验室,南京 211106

作者简介:

通讯作者:

陈松灿,男,教授,博士生导师,E-mail: s.chen@nuaa.edu.cn。

中图分类号:

TP181

基金项目:

国家自然科学基金(62076124)。


Novel Splitting Criteria for Decision Trees with Combination of Structure and Purity
Author:
Affiliation:

1.College of Computer Science and Technology, Nanjing University of Aeronautics & Astronautics, Nanjing 211106, China;2.MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, Nanjing 211106, China

Fund Project:

  • 摘要
  • |
  • 图/表
  • |
  • 访问统计
  • |
  • 参考文献
  • |
  • 相似文献
  • |
  • 引证文献
  • |
  • 资源附件
  • |
  • 文章评论
    摘要:

    决策树(Desision tree, DT)生长关键步骤的分裂或分叉准则通常根据纯度和误分类误差等实现,分裂生长分为轴平行和非轴平行方式。这些分裂准则一般与数据内在结构(如类别是否是多簇或单簇组成)无关。为了弥补这一缺失,本文提出了两种混合分裂准则,分别用加权和两步法将同类内的节点间距(Between-node margin within the same class,BNM)和同一节点内的类紧性(Within-class compactness and between-class separation in the same inner node,CSN)与纯度度量相结合。由于传统决策树以贪婪方式生长,仅能确定出当前的一个局部最优分裂点,为改善这个缺点,本文首先根据纯度确定出前k个候选分裂点,然后通过最大化BNM和最小化CSN确定最终的分裂点,不仅缓和了纯度上的局部最优性,而且引入了数据结构的全局性,因此能较大程度地改进后代节点的分裂,增强树的泛化性和可解释性。将上述两种分裂准则组合还可以进一步提升性能。在21个标准验证数据集上的比较结果表明:新准则下的决策树不仅提高了预测性能、降低了复杂性,而且相比于其他采用混合分裂准则的DTs更具竞争力。

    Abstract:

    As a critical part of desision tree (DT) growth, its nodes can be split to grow by either axis- or non-axis-aligned way based on such splitting criteria as purity and misclassification error. However, these have nothing to do with the geometric structure of data, e.g. multicentric data or single-center data. In order to compensate for this, two splitting criteria are proposed by combining the between-class margin in the same inner node (BCM) and the between-node margin within the same class (BNM) respectively with the purity measure in weighting and the two-step method. Unlike traditional greedy growth of DT which only finds the current locally optimal splitting point, the proposed method first selects the top-k purity splitting nodes, then determines the optimal one by maximizing BCM and minimizing BNM. Since not only alleviating purity-based local optimality but also considering global structures of the data, our method greatly improves the division of descendant nodes and generalization of the formed trees, while enhancing the interpretability. In addition, two aforementioned splitting criteria can be combined to further boost the performance. The comparison results on 21 benchmark datasets show an improvement in predictive performance of new trees with reduction in complexity, while also are competitive with many other DTs using hybrid splitting criteria.

    参考文献
    相似文献
    引证文献
引用本文

杜斐,陈松灿.结构与纯度结合的新型决策树分裂准则[J].南京航空航天大学学报,2023,55(3):534-543

复制
分享
文章指标
  • 点击次数:
  • 下载次数:
  • HTML阅读次数:
  • 引用次数:
历史
  • 收稿日期:2022-06-07
  • 最后修改日期:2022-10-10
  • 录用日期:
  • 在线发布日期: 2023-07-01
  • 出版日期:
您是第位访问者
网站版权 © 南京航空航天大学学报
技术支持:北京勤云科技发展有限公司