Contents

An Overview of Multi-agent Reinforcement Learning from Game Theoretical Perspective

Abstract:

1 Introduction

1.1 A Short History of RL

1.2 2019: A Booming Year for MARL

2 Single-Agent RL

2.2 Justification of Reward Maximisation

2.3 Solving Markov Decision Processes

2.3.1 Value-Based Methods

2.3.2 Policy-Based Methods

3 Multi-Agent RL

3.1 Problem Formulation: Stochastic Game

3.2 Solving Stochastic Games

3.2.1 Value-Based MARL Methods

3.2.2 Policy-Based MARL Methods

3.2.3 Solution Concept of the Nash Equilibrium

An Overview of Multi-agent Reinforcement Learning from Game Theoretical Perspective

https://arxiv/abs/2011.00583

Abstract:

        Following the remarkable success of the AlphaGO series, 2019 was a booming year that witnessed significant advances in multi-agent reinforcement learning (MARL) techniques. MARL corresponds to the learning problem in a multi-agent system in which multiple agents learn simultaneously. It is an interdisciplinary domain with a long history that includes game theory, machine learning, stochastic control, psychology, and optimisation. Although MARL has achieved considerable empirical success in solving real-world games, there is a lack of a self-contained overview in the literature that elaborates the game theoretical foundations of modern MARL methods and summarises the recent advances. In fact, the majority of existing surveys are outdated and do not fully cover the recent developments since 2010. In this work, we provide a monograph on MARL that covers both the fundamentals and the latest developments in the research frontier.

        Our work is separated into two parts. From §1 to §4, we present the self-contained fundamental knowledge of MARL, including problem formulations, basic solutions, and existing challenges. Specifically, we present the MARL formulations through two representative frameworks, namely, stochastic games and extensive-form games, along with different variations of games that can be addressed. The goal of this part is to enable the readers, even those with minimal related background, to grasp the key ideas in MARL research. From §5 to §9, we present an overview of recent developments of MARL algorithms. Starting from new taxonomies for MARL methods, we conduct a survey of previous survey papers. In later sections, we highlight several modern topics in MARL research, including Q-function factorisation, multi-agent soft learning, networked multi-agent MDP, stochastic potential games, zero-sum continuous games, online MDP, turn-based stochastic games, policy space response oracle, approximation methods in general-sum games, and mean-field type learning in games with infinite agents. Within each topic, we select both the most fundamental and cutting-edge algorithms.

 The goal of our monograph is to provide a self-contained assessment of the current state-of-the-art MARL techniques from a game theoretical perspective. We expect this work to serve as a stepping stone for both new researchers who are about to enter this fast-growing domain and existing domain experts who want to obtain a panoramic view and identify new directions based on recent advances.

1 Introduction 

    Machine learning can be viewed as the process of converting data into knowledge (Shalev-Shwartz and Ben-David, 2014). The input of a learning algorithm is training data (for example, images containing cats), and the output is some knowledge (for example, rules about how to detect cats in an image).

        This knowledge is usually represented as a computer program that can perform certain task(s) (for example, an automatic cat detector). In the past decade, considerable progress has been made by means of a special kind of machine learning technique: deep learning (LeCun et al., 2015). One of the critical embodiments of deep learning is the family of deep neural networks (DNNs) (Schmidhuber, 2015), which can find disentangled representations (Bengio, 2009) in high-dimensional data. This allows software to train itself to perform new tasks rather than relying solely on hand-crafted rules designed by a programmer. Numerous breakthroughs in real-world AI applications have been achieved with DNNs, with computer vision (Krizhevsky et al., 2012) and natural language processing (Brown et al., 2020; Devlin et al., 2018) being the greatest beneficiaries.

        In addition to feature recognition from existing data, modern AI applications often require computer programs to make decisions based on acquired knowledge (see Figure 1). To illustrate the key components of decision making, let us consider the real-world example of controlling a car to drive safely through an intersection. At each time step, a robot car can move by steering, accelerating and braking. The goal is to safely exit the intersection and reach the destination (with possible decisions of going straight or turning left/right into another lane). Therefore, in addition to being able to detect objects, such as traffic lights, lane markings, and other cars (by converting data to knowledge), we aim to find a steering policy that can control the car to make a sequence of manoeuvres to achieve the goal (making decisions based on the knowledge gained). In a decision-making setting such as this, two additional challenges arise:

Figure 1:Modern AI applications are being transformed from pure feature recognition (for example, detecting a cat in an image) to decision making (driving through a traffic intersection safely), where interaction among multiple agents inevitably occurs. As a result, each agent has to behave strategically. Furthermore, the problem becomes more challenging because current
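The intersection example involves a state, an action chosen at each time step, a goal, and a policy that produces a sequence of manoeuvres. A minimal sketch of that loop is shown below. It rests on a deliberately toy, hypothetical model that is not taken from the paper:

- The car advances one cell per "go".
- The light is red for the first three time steps.
- The reward values (-1 per step, +10 for exiting, -100 for running the red light) are made up for illustration.

All names (`State`, `policy`, `step`, `run_episode`, `STOP_LINE`, `EXIT`, `RED_UNTIL`) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical toy model of the intersection example (not from the paper).
STOP_LINE = 2   # cell index of the stop line
EXIT = 5        # cell index at which the car has left the intersection
RED_UNTIL = 3   # the light is red for time steps 0, 1 and 2


@dataclass(frozen=True)
class State:
    t: int         # current time step
    position: int  # cell currently occupied by the car


def light_is_red(t: int) -> bool:
    return t < RED_UNTIL


def policy(s: State) -> str:
    """Hand-written policy: wait at the stop line while the light is red."""
    if s.position == STOP_LINE and light_is_red(s.t):
        return "wait"
    return "go"


def step(s: State, action: str) -> tuple[State, float, bool]:
    """Toy environment transition: returns (next_state, reward, done)."""
    new_pos = s.position + (1 if action == "go" else 0)
    nxt = State(s.t + 1, new_pos)
    if action == "go" and s.position == STOP_LINE and light_is_red(s.t):
        return nxt, -100.0, True   # crossed the stop line on red
    if new_pos >= EXIT:
        return nxt, 10.0, True     # safely exited the intersection
    return nxt, -1.0, False        # small cost for every elapsed step


def run_episode(max_steps: int = 20) -> tuple[list[str], float]:
    """Roll out the policy from the start until the episode ends or times out."""
    s = State(t=0, position=0)
    actions: list[str] = []
    total = 0.0
    for _ in range(max_steps):
        a = policy(s)
        s, r, done = step(s, a)
        actions.append(a)
        total += r
        if done:
            break
    return actions, total
```

With these toy settings, `run_episode()` waits for one step at the stop line and exits the intersection after six steps. In reinforcement learning, the hand-written `policy` is the part that would instead be learned from the reward signal.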