Multi-aircraft Collaborative Beyond-Visual-Range Air Combat Decision-Making Algorithm Based on Reinforcement Learning
Affiliation:

1. College of Automation Engineering, Nanjing University of Aeronautics & Astronautics, Nanjing 211106, China; 2. School of Automation, Northwestern Polytechnical University, Xi’an 710072, China

Corresponding author:

Wang Zhigang, male, researcher, E-mail: zgwang@nuaa.edu.cn

CLC number:

V249.1

    Abstract:

    The air combat situation in modern warfare is complex and changes rapidly, so fast and effective decision-making methods are essential. This paper studies cooperative confrontation among multiple unmanned aerial vehicles (UAVs) and proposes a multi-aircraft cooperative beyond-visual-range air combat decision-making algorithm based on long short-term memory (LSTM) and multi-agent deep deterministic policy gradient (MADDPG). First, a beyond-visual-range air combat environment is established, comprising a UAV motion model, a radar detection zone model, and a missile attack zone model. Second, the decision-making algorithm is built from four components: a centralized-training, distributed-execution LSTM-MADDPG architecture together with a state space for the cooperative air combat system, which handles synchronous decision-making across multiple UAVs; a learning rate decay mechanism, which improves convergence speed and stability; an LSTM-based network structure, which strengthens the extraction of tactical features; and a decay-factor-based reward function, which reinforces cooperative confrontation capability. Simulation results show that the proposed algorithm gives the UAVs cooperative attack and defense capabilities and that the algorithm itself is stable and converges well.
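The abstract names a learning rate decay mechanism but does not give its schedule. As a rough illustration only, the sketch below shows one common form, step-wise exponential decay per training episode. The function name `decayed_learning_rate`, its parameters, and the numeric values are assumptions made for this example and are not taken from the paper.

```python
def decayed_learning_rate(initial_lr, decay_rate, episode, decay_every, min_lr=0.0):
    """Step-wise exponential learning-rate decay (illustrative assumption).

    The rate is multiplied by `decay_rate` once every `decay_every` episodes
    and is never allowed to fall below `min_lr`.
    """
    if decay_every <= 0:
        raise ValueError("decay_every must be a positive integer")
    completed_steps = episode // decay_every  # number of decay steps applied so far
    return max(initial_lr * decay_rate ** completed_steps, min_lr)


if __name__ == "__main__":
    # Hypothetical settings: start at 1e-3 and halve every 100 episodes.
    for ep in (0, 100, 250, 1000):
        print(ep, decayed_learning_rate(1e-3, 0.5, ep, 100, min_lr=1e-5))
```

A larger rate early in training lets the actor and critic networks move quickly, while the smaller rate later damps oscillation; the floor `min_lr` keeps learning from stalling entirely. Whether the paper uses this schedule or another one is not stated in the abstract.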

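Cite this article

WANG Zhigang, GONG Huajun, YIN Yi, LIU Xiaoxiong. Multi-aircraft collaborative beyond-visual-range air combat decision-making algorithm based on reinforcement learning[J]. Journal of Nanjing University of Aeronautics & Astronautics, 2025, 57(5): 831-841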

History
  • Received: 2024-05-25
  • Last revised: 2024-11-26
  • Published online: 2025-10-27