Attitude Control for Emergency Recovery Based on Reinforcement Learning Method for Dual-arm Space Robots

doi:10.16356/j.1005-2615.2025.03.008

2025, Vol. 57, No. 3, pp. 467-474

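Affiliations: 1. National Key Laboratory of Aerospace Mechanism, Shanghai 201108, China; 2. Aerospace System Engineering Shanghai, Shanghai 201108, China
Corresponding author: JIN Yongqiang, male, research fellow, E-mail: jinyong_qiang@126.com.
CLC number: TP242
Funding: Joint Fund of the National Natural Science Foundation of China and China Aerospace Science and Technology Corporation (U21B6002).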

Abstract:

To address the failure of conventional attitude control in dual-arm on-orbit servicing space robots under extreme anomalies such as flywheel and engine malfunctions, an emergency attitude control algorithm based on reinforcement learning is proposed. Unlike conventional attitude control algorithms, this approach achieves limited attitude recovery of the spacecraft using only the two robotic arms mounted on it. A physics environment is built for training, and the model-free proximal policy optimization (PPO) algorithm is applied to attitude control. The reward function is designed around the kinematic constraints on manipulator motion during on-orbit operations in order to improve the accuracy of spacecraft attitude control. To validate the strategy, numerical simulations of base attitude recovery are conducted in the MuJoCo environment, and the adaptability of the algorithm is evaluated under conditions including different base masses and different end-effector payload masses. The results show that the reinforcement learning method meets the need for limited spacecraft attitude control, requires no parameter tuning, and exhibits a degree of robustness.
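The abstract says the reward function combines attitude accuracy with manipulator kinematic constraints but does not give its terms. The following is a minimal illustrative sketch of what such a reward could look like, not the authors' formulation. The function names, the weights `w_att`, `w_limit`, `w_effort`, the `margin` near the joint limits, and the action-effort term are all assumptions introduced here for illustration; quaternions are taken in (w, x, y, z) order.

```python
import math


def quat_angle_error(q_current, q_target):
    """Smallest rotation angle (rad) between two quaternions in (w, x, y, z) order."""
    norm_c = math.sqrt(sum(c * c for c in q_current))
    norm_t = math.sqrt(sum(t * t for t in q_target))
    dot = sum(c * t for c, t in zip(q_current, q_target)) / (norm_c * norm_t)
    # abs() handles the double cover: q and -q describe the same rotation.
    # min() guards against round-off pushing the value slightly above 1.
    dot = min(1.0, abs(dot))
    return 2.0 * math.acos(dot)


def attitude_reward(q_base, q_target, joint_pos, joint_lower, joint_upper, action,
                    w_att=1.0, w_limit=0.5, w_effort=0.01, margin=0.1):
    """Hypothetical shaped reward: all weights and the margin are assumed values."""
    # Attitude term: angular distance of the base from the target attitude.
    att_err = quat_angle_error(q_base, q_target)

    # Kinematic-constraint term: how far each joint intrudes into a
    # safety margin placed just inside its lower and upper limits.
    limit_pen = 0.0
    for q, lo, hi in zip(joint_pos, joint_lower, joint_upper):
        limit_pen += max(0.0, (lo + margin) - q)
        limit_pen += max(0.0, q - (hi - margin))

    # Effort term: discourages large arm commands (an assumption, not from the paper).
    effort = sum(a * a for a in action)

    return -(w_att * att_err + w_limit * limit_pen + w_effort * effort)
```

In a training setup of the kind the abstract describes, a function like this would be evaluated once per simulation step on the base quaternion and joint positions read from the simulator, and the resulting scalar passed to the PPO learner. The reward is zero only when the base is at the target attitude, all joints are clear of the margin, and the commanded action is zero.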

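Cite this article

LI Feng, LI Ning, ZOU Huaiwu, JIN Yongqiang, ZHANG Chongfeng. Attitude control for emergency recovery based on reinforcement learning method for dual-arm space robots[J]. Journal of Nanjing University of Aeronautics & Astronautics, 2025, 57(3): 467-474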

History

Received: 2025-02-28
Last revised: 2025-04-24
Published online: 2025-06-20
