An Overview of Multi-agent Reinforcement Learning from Game Theoretical Perspective

Abstract
1 Introduction
1.1 A Short History of RL
1.2 2019: A Booming Year for MARL
2 Single-Agent RL
2.2 Justification of Reward Maximisation
2.3 Solving Markov Decision Processes
2.3.1 Value-Based Methods
2.3.2 Policy-Based Methods
3 Multi-Agent RL
3.1 Problem Formulation: Stochastic Game
3.2 Solving Stochastic Games
3.2.1 Value-Based MARL Methods
3.2.2 Policy-Based MARL Methods
3.2.3 Solution Concept of the Nash Equilibrium

https://arxiv.org/abs/2011.00583

Abstract:

Following the remarkable success of the AlphaGO series, 2019 was a booming year that witnessed significant advances in multi-agent reinforcement learning (MARL) techniques. MARL corresponds to the learning problem in a multi-agent system in which multiple agents learn simultaneously. It is an interdisciplinary domain with a long history that includes game theory, machine learning, stochastic control, psychology, and optimisation. Although MARL has achieved considerable empirical success in solving real-world games, there is a lack of a self-contained overview in the literature that elaborates the game theoretical foundations of modern MARL methods and summarises the recent advances. In fact, the majority of existing surveys are outdated and do not fully cover the recent developments since 2010. In this work, we provide a monograph on MARL that covers both the fundamentals and the latest developments in the research frontier.

Our work is separated into two parts. From §1 to §4, we present the self-contained fundamental knowledge of MARL, including problem formulations, basic solutions, and existing challenges. Specifically, we present the MARL formulations through two representative frameworks, namely, stochastic games and extensive-form games, along with different variations of games that can be addressed. The goal of this part is to enable the readers, even those with minimal related background, to grasp the key ideas in MARL research. From §5 to §9, we present an overview of recent developments of MARL algorithms. Starting from new taxonomies for MARL methods, we conduct a survey of previous survey papers. In later sections, we highlight several modern topics in MARL research, including Q-function factorisation, multi-agent soft learning, networked multi-agent MDP, stochastic potential games, zero-sum continuous games, online MDP, turn-based stochastic games, policy space response oracle, approximation methods in general-sum games, and mean-field type learning in games with infinite agents. Within each topic, we select both the most fundamental and cutting-edge algorithms.

The goal of our monograph is to provide a self-contained assessment of the current state-of-the-art MARL techniques from a game theoretical perspective. We expect this work to serve as a stepping stone for both new researchers who are about to enter this fast-growing domain and existing domain experts who want to obtain a panoramic view and identify new directions based on recent advances.

1 Introduction

Machine learning can be considered as the process of converting data into knowledge (Shalev-Shwartz and Ben-David, 2014). The input of a learning algorithm is training data (for example, images containing cats), and the output is some knowledge (for example, rules about how to detect cats in an image).

This knowledge is usually represented as a computer program that can perform certain task(s) (for example, an automatic cat detector). In the past decade, considerable progress has been made by means of a special kind of machine learning technique: deep learning (LeCun et al., 2015). One of the critical embodiments of deep learning is the family of deep neural networks (DNNs) (Schmidhuber, 2015), which can find disentangled representations (Bengio, 2009) in high-dimensional data; this allows the software to train itself to perform new tasks rather than merely relying on the programmer to design hand-crafted rules. Countless breakthroughs in real-world AI applications have been achieved through the use of DNNs, with the domains of computer vision (Krizhevsky et al., 2012) and natural language processing (Brown et al., 2020; Devlin et al., 2018) being the greatest beneficiaries.

In addition to feature recognition from existing data, modern AI applications often require computer programs to make decisions based on acquired knowledge (see Figure 1). To illustrate the key components of decision making, let us consider the real-world example of controlling a car to drive safely through an intersection. At each time step, a robot car can move by steering, accelerating and braking. The goal is to safely exit the intersection and reach the destination (with possible decisions of going straight or turning left/right into another lane). Therefore, in addition to being able to detect objects, such as traffic lights, lane markings, and other cars (by converting data to knowledge), we aim to find a steering policy that can control the car to make a sequence of manoeuvres to achieve the goal (making decisions based on the knowledge gained). In a decision-making setting such as this, two additional challenges arise:

Figure 1: Modern AI applications are being transformed from pure feature recognition (for example, detecting a cat in an image) to decision making (driving through a traffic intersection safely), where interaction among multiple agents inevitably occurs. As a result, each agent has to behave strategically. Furthermore, the problem becomes more challenging because current
