Learning Search Control by Policy Gradient Algorithm (方策勾配法による探索制御の一考察)

Harukazu Igarashi (五十嵐 治一); Yuichi Morioka (森岡 祐一); Kazumasa Yamamoto (山本 一将)

https://ipsj.ixsq.nii.ac.jp/records/106503

Name / File	License	Action
IPSJ-GPWS2014013.pdf (1.2 MB)	Copyright (c) 2014 by the Information Processing Society of Japan
Open access

Item type

Symposium(1)

Publication date

2014-10-31

Title

方策勾配法による探索制御の一考察

Title (English)

Learning Search Control by Policy Gradient Algorithm

Language

jpn

Resource type identifier

http://purl.org/coar/resource_type/c_5794

Resource type

conference paper

Author affiliations

芝浦工業大学工学部情報工学科 (Shibaura Institute of Technology)
(株) コスモ・ウェブ (Cosmoweb Co., Ltd.)

Authors

Harukazu Igarashi (五十嵐 治一)
Yuichi Morioka (森岡 祐一)
Kazumasa Yamamoto (山本 一将)

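Abstract (translated from the Japanese)

Description type

Other

Description

We propose a search-control method for computer shogi in which the decision to grow a branch of the search tree is based on the cumulative selection probability of the moves along the search path leading to that branch. Moves are selected by a simulation policy that incorporates heuristics about shogi moves. We consider two cases: growing branches deterministically and growing them stochastically. In addition, the parameters of the simulation policy are learned by the policy gradient method, a reinforcement learning technique.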
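The branch-growing rule described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the Boltzmann (softmax) form of the simulation policy, the fixed `threshold` used in the deterministic case, the Bernoulli draw used in the stochastic case, and the toy tree with hand-picked heuristic scores.

```python
import math
import random


def softmax_policy(scores):
    """Map heuristic move scores to selection probabilities (assumed softmax form)."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def should_extend(arc_prob, threshold, stochastic, rng):
    """Assumed extension rule: compare with a threshold, or sample with probability arc_prob."""
    if stochastic:
        return rng.random() < arc_prob
    return arc_prob >= threshold


def search(tree, path_prob=1.0, path=(), threshold=0.25, stochastic=False, rng=random):
    """Return the extended arcs as (path, accumulated probability) pairs.

    `tree` maps a move name to (heuristic score, subtree); this layout is hypothetical.
    """
    extended = []
    if not tree:
        return extended
    moves = list(tree)
    probs = softmax_policy([tree[m][0] for m in moves])
    for move, p in zip(moves, probs):
        # Accumulate the selection probability along the path from the root.
        arc_prob = path_prob * p
        if should_extend(arc_prob, threshold, stochastic, rng):
            new_path = path + (move,)
            extended.append((new_path, arc_prob))
            extended += search(tree[move][1], arc_prob, new_path,
                               threshold, stochastic, rng)
    return extended


# Toy tree with made-up scores: move "a" looks much better than move "b".
TOY_TREE = {
    "a": (2.0, {"c": (1.0, {}), "d": (0.0, {})}),
    "b": (0.0, {}),
}
```

Calling `search(TOY_TREE)` applies the deterministic rule, while `search(TOY_TREE, stochastic=True)` samples each extension. Because the accumulated probability can only shrink with depth, both variants naturally search likely lines more deeply than unlikely ones.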

Abstract (English)

Description type

Other

Description

This paper proposes a search-control method for computer shogi based on the policy gradient learning algorithm. Whether each arc in the search tree is extended is determined by the accumulated move-selection probability along the path from the root node to that arc. Moves are selected by a simulation policy that incorporates heuristics for evaluating shogi moves. We consider two types of arc extension: deterministic and stochastic. In both cases, the parameters of the simulation policy can be learned by the policy gradient algorithm, a reinforcement learning method.
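To make the learning step concrete, the following Python sketch shows a generic REINFORCE-style policy gradient update for a softmax move-selection policy with linear features. It is an illustration under stated assumptions, not the authors' formulation: the linear scoring `theta · phi`, the feature vectors, the scalar `reward`, and the learning rate `lr` are all hypothetical.

```python
import math


def move_probabilities(theta, features):
    """Softmax over linear scores theta . phi(move); the linear form is an assumption."""
    scores = [sum(t * f for t, f in zip(theta, phi)) for phi in features]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def grad_log_prob(theta, features, chosen):
    """Gradient of log pi(chosen): phi(chosen) minus the expected feature vector."""
    probs = move_probabilities(theta, features)
    dim = len(theta)
    expected = [sum(p * phi[i] for p, phi in zip(probs, features)) for i in range(dim)]
    return [features[chosen][i] - expected[i] for i in range(dim)]


def policy_gradient_step(theta, features, chosen, reward, lr=0.1):
    """One REINFORCE-style ascent step: theta += lr * reward * grad log pi(chosen)."""
    grad = grad_log_prob(theta, features, chosen)
    return [t + lr * reward * g for t, g in zip(theta, grad)]
```

With a positive reward, a step raises the probability of the chosen move and lowers the others; a negative reward does the opposite. How the reward is defined for search control is specified in the paper itself and is not reproduced here.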

Bibliographic information

Proceedings of the Game Programming Workshop 2014 (ゲームプログラミングワークショップ2014論文集)

Vol. 2014, pp. 90-94, issued 2014-10-31

Publisher

Information Processing Society of Japan (情報処理学会)
