Visual-Semantic Enhanced Joint Classifier for Few-Shot Open-Set Recognition

Affiliation:

1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics; 2. Key Laboratory of Pattern Analysis and Machine Intelligence, Ministry of Industry and Information Technology

CLC number:

TP391

Fund Project:

Natural Science Foundation of Jiangsu Province under Grant BK20210292




Abstract:

This paper investigates the potential of the vision-language pretrained model CLIP in few-shot open-set recognition (FSOR). The experiments reveal that: (1) the visual-prototype classifier based on CLIP's image encoding features generally performs worse than traditional FSOR baseline methods; (2) although the semantic-prototype classifier based on CLIP's semantic encoding features significantly outperforms traditional baselines in closed-set classification, it underperforms in open-set recognition. The primary causes may be the distribution gap between CLIP's training data and the FSOR target data, together with the overly large decision boundaries that the CLIP semantic-prototype classifier assigns to known classes. To tackle these problems, a simple yet effective visual-semantic enhanced joint few-shot open-set classifier is proposed. It not only leverages the closed-set classification advantage of the CLIP-based semantic-prototype classifier, but also exploits the potential of the visual-prototype classifier built from traditional FSOR pretrained models, further improving open-set recognition accuracy through tighter decision boundaries. Experiments on four benchmark datasets demonstrate that the proposed method achieves average improvements of 2.9% in ACC and 2.6% in AUROC over the best baselines.
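The abstract describes the method only at a high level, so the sketch below is a rough illustration (not the authors' implementation) of how a visual-prototype branch and a semantic-prototype branch could be fused into a joint classifier whose maximum class probability doubles as an open-set confidence score. The function names, the fusion weight `alpha`, the temperature, the cosine-similarity scoring, and the rejection threshold are all assumptions made for illustration; the paper's precise formulation is not given here.

```python
import torch
import torch.nn.functional as F

def prototypes_from_support(features: torch.Tensor, labels: torch.Tensor,
                            num_classes: int) -> torch.Tensor:
    """Average the L2-normalized support features of each class into a prototype."""
    feats = F.normalize(features, dim=-1)
    protos = torch.stack([feats[labels == c].mean(dim=0) for c in range(num_classes)])
    return F.normalize(protos, dim=-1)

def joint_open_set_scores(query_feats: torch.Tensor,
                          visual_protos: torch.Tensor,
                          semantic_protos: torch.Tensor,
                          alpha: float = 0.5,
                          temperature: float = 0.07):
    """Fuse cosine similarities from the visual and semantic branches.

    Returns the closed-set prediction and a max-probability confidence
    that can be thresholded for open-set rejection: low confidence
    suggests the query belongs to an unknown class.
    """
    q = F.normalize(query_feats, dim=-1)
    sim_vis = q @ visual_protos.t()    # (N, C) cosine similarity, visual branch
    sim_sem = q @ semantic_protos.t()  # (N, C) cosine similarity, semantic branch
    logits = (alpha * sim_vis + (1 - alpha) * sim_sem) / temperature
    probs = logits.softmax(dim=-1)
    confidence, pred = probs.max(dim=-1)
    return pred, confidence

# Toy usage with random stand-ins for the real features: in practice the
# support/query features would come from an FSOR-pretrained backbone and
# the CLIP image encoder, and the semantic prototypes from CLIP text
# embeddings of the class names.
N_WAY, K_SHOT, DIM = 5, 5, 512
support = torch.randn(N_WAY * K_SHOT, DIM)
support_labels = torch.arange(N_WAY).repeat_interleave(K_SHOT)
visual_protos = prototypes_from_support(support, support_labels, N_WAY)
semantic_protos = F.normalize(torch.randn(N_WAY, DIM), dim=-1)

queries = torch.randn(8, DIM)
pred, conf = joint_open_set_scores(queries, visual_protos, semantic_protos)
unknown_mask = conf < 0.5  # hypothetical rejection threshold
print(pred, conf, unknown_mask)
```

Under these assumptions, tightening the decision boundary amounts to raising the rejection threshold or sharpening the fused logits, so that queries far from every known-class prototype in either branch are flagged as unknown.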

History
  • Received: 2024-08-26
  • Revised: 2024-11-25
  • Accepted: 2025-01-06