Utilizing Skipped Frames in Deep Reinforcement Learning via Pseudo-Actions

Taisei Hashimoto (橋本 大世); Yoshimasa Tsuruoka (鶴岡 慶雅)

https://ipsj.ixsq.nii.ac.jp/records/207655

Name / File	License
IPSJ-GPWS2020011.pdf (10.2 MB)	Copyright (c) 2020 by the Information Processing Society of Japan
Open access

Item type

Symposium(1)

Publication date

2020-11-06

Title

深層強化学習における擬似的な行動による中間フレームの有効活用

Title (English)

Utilizing Skipped Frames in Deep Reinforcement Learning via Pseudo-Actions

Language

jpn

Keywords (subject scheme: Other)

Deep reinforcement learning (深層強化学習)
Action Repeat
Skipped Frames

Resource type

conference paper

Resource type identifier

http://purl.org/coar/resource_type/c_5794

Author affiliation

Department of Information and Communication Engineering, Graduate School of Information Science and Technology, The University of Tokyo (listed for both authors)

Authors

Taisei Hashimoto (橋本 大世)
Yoshimasa Tsuruoka (鶴岡 慶雅)

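Abstract (translated from the Japanese)

In many deep reinforcement learning settings, when an agent takes an action, it typically repeats the chosen action several times and does not observe the state until the next action-decision point. This is called action repeat or frame skip. Repeating actions in this way has several advantages, but the data gathered while the action is being repeated (the intermediate frames) are effectively discarded. Because the amount of training data is inversely proportional to the length of the action repeat, this can hurt the sample efficiency of training. In this work, we propose a simple yet effective method that mitigates this problem by introducing the concept of pseudo-actions. The key point of the proposed method is that considering pseudo-actions makes the transition data between action-decision points usable for training. In continuous control tasks, a pseudo-action can be obtained as the average of an action sequence that straddles an action-decision time. In discrete control tasks, a pseudo-action can be computed from embedding representations of the actions. The method is general: it can be combined with any model-free reinforcement learning method that involves learning a Q-function. In experiments, we verified the effectiveness of the proposed method on both continuous and discrete control tasks in OpenAI Gym.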

Abstract (English)

In many deep reinforcement learning settings, when an agent takes an action, it repeats the same action a predefined number of times without observing the states until the next action-decision point. This technique of action repetition has several merits in training the agent, but the data between action-decision points (i.e., intermediate frames) are, in effect, discarded. Since the amount of training data is inversely proportional to the interval of action repeats, this can have a negative impact on the sample efficiency of training. In this paper, we propose a simple but effective approach to alleviate this problem by introducing the concept of pseudo-actions. The key idea of our method is to make the transitions between action-decision points usable as training data by considering pseudo-actions. Pseudo-actions for continuous control tasks are obtained as the average of the action sequence straddling an action-decision point. For discrete control tasks, pseudo-actions are computed from learned action embeddings. This method can be combined with any model-free reinforcement learning algorithm that involves the learning of Q-functions. We demonstrate the effectiveness of our approach on both continuous and discrete control tasks in OpenAI Gym.
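The continuous-control case described in the abstract can be illustrated with a minimal sketch. It is based on the abstract alone and is not the authors' implementation: the function names `pseudo_action` and `intermediate_transitions`, the frame-level data layout, and the use of an undiscounted reward sum over the window are all assumptions made for illustration. The idea shown is that a window of `repeat` frames starting `offset` frames after a decision point contains `repeat - offset` frames of the earlier action and `offset` frames of the later one, and the pseudo-action is the average of those per-frame actions.

```python
import numpy as np


def pseudo_action(prev_action, next_action, offset, repeat):
    """Average the per-frame actions in a window that straddles a decision point.

    The window spans `repeat` frames and starts `offset` frames after the
    decision point at which `prev_action` was chosen (hypothetical layout).
    """
    if not 0 <= offset <= repeat:
        raise ValueError("offset must lie in [0, repeat]")
    prev_action = np.asarray(prev_action, dtype=float)
    next_action = np.asarray(next_action, dtype=float)
    # Frames executed with the earlier action, then frames with the later one.
    per_frame = [prev_action] * (repeat - offset) + [next_action] * offset
    return np.mean(per_frame, axis=0)


def intermediate_transitions(frames, rewards, actions, repeat):
    """Build extra (state, pseudo_action, reward, next_state) tuples.

    frames  : observation at every environment frame, len(actions) * repeat + 1 items
    rewards : reward at every environment frame, len(actions) * repeat items
    actions : the action chosen at each decision point
    Rewards are summed without discounting, which is a simplifying assumption.
    """
    transitions = []
    for i in range(len(actions) - 1):
        for offset in range(1, repeat):
            start = i * repeat + offset
            end = start + repeat
            transitions.append((
                frames[start],
                pseudo_action(actions[i], actions[i + 1], offset, repeat),
                float(sum(rewards[start:end])),
                frames[end],
            ))
    return transitions


if __name__ == "__main__":
    # Toy data: two decisions, each repeated for two frames.
    toy_frames = [0, 1, 2, 3, 4]
    toy_rewards = [1.0, 1.0, 1.0, 1.0]
    toy_actions = [[0.0], [1.0]]
    for transition in intermediate_transitions(toy_frames, toy_rewards, toy_actions, repeat=2):
        print(transition)
```

Under these assumptions, each pair of consecutive decisions yields `repeat - 1` additional transitions that could be added to a replay buffer next to the ordinary ones. The discrete-control case, which the abstract says relies on learned action embeddings, is not covered by this sketch.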

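Bibliographic information

Proceedings of the Game Programming Workshop 2020 (ゲームプログラミングワークショップ2020論文集)

Vol. 2020, pp. 62-69, issue date 2020-11-06

Publisher

Information Processing Society of Japan (情報処理学会)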


Versions

Ver.1

2025-01-19 19:06:03.408801
