Long-Tailed Learning with In- and Out-of-Distribution Noisy Labels in the Open World
CSTR:
Author:
Affiliation:

1. MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, College of Computer Science and Technology, Nanjing University of Aeronautics & Astronautics, Nanjing 211106, China; 2. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China; 3. Jiangsu Product Quality Testing and Inspection Institute, Nanjing 210001, China; 4. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China

Author biography:

Corresponding author:

LI Shaoyuan, female, associate professor. E-mail: lisy@nuaa.edu.cn.

CLC number:

TP311.5

Fund project:

Fundamental Research Funds for the Central Universities (NS2024059); National Natural Science Foundation of China (62376126).




Abstract:

When training deep neural networks in practical application scenarios, the data often carry multiple biases, such as long-tailed category distributions, in-distribution noise, and out-of-distribution noise. Most existing methods address either category imbalance or noisy labels, but rarely both at once, especially when in- and out-of-distribution noise coexist. We propose an imbalanced noisy label calibration (INLC) method to address this challenge. To handle out-of-distribution samples, we use the model's consistent predictions to filter them out and assign them uniform labels, thereby enhancing the model's ability to detect such samples. For in-distribution samples, we use the Jensen-Shannon divergence to distinguish noisy from clean labels, reducing the misclassification of clean samples, especially in tail categories. To address category imbalance, we introduce an additional semantic classifier that mitigates the bias of pseudo-labels towards majority categories. Finally, we adopt a consistency regularization method based on strong data augmentation to further improve the model's generalization performance. We conducted extensive experiments on simulated and real-world datasets, covering category-imbalance levels from low to high and different proportions of label noise. The results show that INLC significantly alleviates the impact of label noise and category imbalance, improving classification accuracy by more than 2% over previous state-of-the-art baselines.
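Two of the sample-handling steps the abstract describes can be sketched independently of the paper's full pipeline: scoring each in-distribution sample by the Jensen-Shannon divergence between the model's predicted class distribution and its (one-hot) given label, and assigning a uniform label to samples flagged as out-of-distribution. The function names, the eps smoothing, and the toy predictions below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions.
    Symmetric, and bounded in [0, ln 2] with the natural logarithm."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def noise_scores(pred_probs, labels, num_classes):
    """Score each sample by the JS divergence between the model's
    prediction and the one-hot given label. Low score -> likely clean;
    high score -> the label likely disagrees with the model (noisy)."""
    return np.array([
        js_divergence(probs, np.eye(num_classes)[y])
        for probs, y in zip(pred_probs, labels)
    ])

def uniform_label(num_classes):
    """Uniform target distribution assigned to samples detected
    as out-of-distribution."""
    return np.full(num_classes, 1.0 / num_classes)

# A prediction that agrees with the label yields a small divergence;
# one concentrated on a different class yields a large one.
preds = np.array([[0.90, 0.05, 0.05],   # agrees with label 0
                  [0.05, 0.90, 0.05]])  # disagrees with label 0
scores = noise_scores(preds, labels=[0, 0], num_classes=3)
```

In practice a threshold (or a two-component mixture fit) over such scores would split the in-distribution set into clean and noisy subsets; the uniform target pushes the model towards maximum-entropy predictions on out-of-distribution inputs, which is what makes them separable by confidence at test time.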

Cite this article:

ZHENG Jinpeng, LI Shaoyuan, ZHU Xiaolin, HUANG Shengjun, CHEN Songcan, WANG Kangkan. Long-tailed learning with in- and out-of-distribution noisy labels in the open world[J]. Journal of Nanjing University of Aeronautics & Astronautics, 2025, 57(5): 842-851.

History:
  • Received: 2024-01-30
  • Revised: 2024-04-25
  • Published online: 2025-10-27
Website copyright © Journal of Nanjing University of Aeronautics & Astronautics