An Efficient Reinforcement Learning Method Using Prediction Errors Between Models (モデル間の予測誤差を利用した効率的な強化学習手法)

Taisei Hashimoto (橋本 大世); Yoshimasa Tsuruoka (鶴岡 慶雅)

https://ipsj.ixsq.nii.ac.jp/records/199985

Name / File	License	Action
IPSJ-GPWS2019022.pdf (1.8 MB)	Copyright (c) 2019 by the Information Processing Society of Japan
Open access

Item type

Symposium(1)

Publication date

2019-11-01

Title

モデル間の予測誤差を利用した効率的な強化学習手法

Title (English)

An Efficient Reinforcement Learning Method Using Prediction Errors Between Models

Language

jpn

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_5794

Resource type

conference paper

Author affiliation

Department of Information and Communication Engineering, The University of Tokyo (東京大学工学部電子情報工学科)

Author affiliation

Department of Information and Communication Engineering, Graduate School of Information Science and Technology, The University of Tokyo (東京大学大学院情報理工学系研究科電子情報学専攻)

Author name

Taisei Hashimoto (橋本 大世)
Yoshimasa Tsuruoka (鶴岡 慶雅)

Abstract

Description type

Other

Description

Reinforcement learning has been successful in board games such as Go and in video games such as those on the Atari 2600, but its real-world applications remain limited compared with supervised machine learning. One reason is its low sample efficiency. Moreover, rewards tend to be sparse in realistic tasks, and efficient learning is especially difficult in such environments. In this study, we propose a reinforcement learning method that can learn efficiently even with sparse rewards. Specifically, we make environment exploration and policy learning more efficient by combining model-based reinforcement learning with intrinsic rewards. We also devise a method for learning a policy from a small number of image observations by randomly encoding them. We conducted experiments on MountainCar and Freeway in OpenAI Gym and showed that efficient learning is possible even from raw images, as long as the images are simple.

Bibliographic information

Proceedings of the Game Programming Workshop 2019 (ゲームプログラミングワークショップ2019論文集)

Vol. 2019, pp. 136-143, issue date 2019-11-01

Publisher

Information Processing Society of Japan (情報処理学会)
