Review on Meta Reinforcement Learning

doi:10.16356/j.1005-2615.2021.05.001


Authors: Tan Xiaoyang, Zhang Zhe
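Affiliations: 1. College of Computer Science and Technology/College of Artificial Intelligence, Nanjing University of Aeronautics & Astronautics, Nanjing 211106, China; 2. MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, Nanjing 211106, China
About the author: Tan Xiaoyang, male, professor, doctoral supervisor. Research interests: deep reinforcement learning, multi-agent systems.
Corresponding author: Zhang Zhe, E-mail: zhangzhe@nuaa.edu.cn.
CLC number: TP181
Funding: National Natural Science Foundation of China (61976115, 61732006); Pre-research Fund for Equipment of the All-Army Shared Information System (315025305); "Artificial Intelligence+" Research Fund of Nanjing University of Aeronautics & Astronautics (NZ2020012, 56XZA18009).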



Abstract:

Meta reinforcement learning (Meta-RL) refers to the theory and methods for automatically learning, from a set of different but related tasks, the inductive bias needed for a new reinforcement learning task. It plays an important role in improving the sample efficiency and generalization of reinforcement learning algorithms in difficult scenarios. This paper proposes a new Meta-RL framework, which points out that designing and analyzing a Meta-RL algorithm requires jointly considering three independent factors, namely learning experience (related tasks), inductive bias, and learning objective, as well as the dependencies among them. On this basis, the current state of research in the field is analyzed and summarized; in particular, a number of recent Meta-RL papers are analyzed and categorized, and the principles and characteristics of several representative algorithms are described in detail. The paper also introduces the benchmark environments and performance evaluation methods commonly used for Meta-RL, and discusses the limitations of current research and potential directions for future development.


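Cite this article

Tan Xiaoyang, Zhang Zhe. Review on Meta Reinforcement Learning[J]. Journal of Nanjing University of Aeronautics & Astronautics, 2021, 53(5): 653-663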


History

Received: 2020-10-11
Last revised: 2021-03-10
Published online: 2021-11-02
