Bilingual Segmented Topic Model

Tamura, Akihiro (田村 晃裕); Sumita, Eiichiro (隅田 英一郎)


https://ipsj.ixsq.nii.ac.jp/records/185108

File: IPSJ-JNL5812026.pdf (1.7 MB)
License: Copyright (c) 2017 by the Information Processing Society of Japan
Access: Open access

Item type

Journal(1)

Publication date

2017-12-15

Title (Japanese)

セグメント構造を持つバイリンガルトピックモデル

Title (English)

Bilingual Segmented Topic Model

Language

jpn

Keywords

Subject scheme

Other

Subject

[Regular Paper (Specially Selected Paper)] multilingual topic model, hierarchical model, translation pair extraction

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_6501

Resource type

journal article

Author affiliations

National Institute of Information and Communications Technology / Presently with Ehime University

National Institute of Information and Communications Technology

Authors

Tamura, Akihiro (田村 晃裕)
Sumita, Eiichiro (隅田 英一郎)

Abstract (translated from Japanese)

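This paper proposes the Bilingual Segmented Topic Model (BiSTM), a new multilingual topic model that represents each document as a document-segment-word hierarchy, where a segment is, for example, a paragraph or a section. Conventional multilingual topic models, such as Bilingual Latent Dirichlet Allocation (BiLDA), capture the correspondence between documents in different languages by sharing a topic distribution between aligned documents. BiSTM additionally shares topic distributions between aligned segments, so it also captures the correspondence between segments in different languages. To make the model applicable when segments are not given, this paper also proposes a variant that incorporates the unsupervised topic segmentation method of Du et al. (2013) into BiSTM and jointly infers latent topics and segment boundaries. Evaluation experiments on Japanese-English and French-English multilingual corpora show that the proposed model outperforms BiLDA in terms of perplexity and also improves translation pair extraction performance.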
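To make the document-segment-word hierarchy concrete, here is a toy generative process for one aligned document pair. It is a minimal sketch, not the authors' model. Assumptions: the segment-level prior is simplified to a Dirichlet centered on the document's topic distribution with a hypothetical concentration `b`; the function names, the `topic_word` layout, and all parameter values are illustrative; and segment boundaries are given, so the joint boundary inference of the second proposed model is not covered. What it shows is that one topic distribution is drawn per aligned segment pair and reused for both language sides.

```python
import random


def dirichlet(alphas, rng):
    """Sample from Dirichlet(alphas) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def generate_aligned_pair(num_segments, words_per_segment, topic_word, alpha, b, rng):
    """Toy generative process for one aligned document pair (hypothetical).

    topic_word: {language: [ {word: prob}, ... one dict per topic ]}
    alpha:      Dirichlet prior for the shared document-level distribution.
    b:          concentration of the simplified segment-level prior
                Dir(b * doc_theta); an assumption, not the paper's prior.
    """
    langs = list(topic_word)
    topics = range(len(alpha))
    # One document-level topic distribution shared by both language sides.
    doc_theta = dirichlet(alpha, rng)
    docs = {lang: [] for lang in langs}
    for _ in range(num_segments):
        # One segment-level distribution per aligned segment pair,
        # shared across languages and tied to the document distribution.
        seg_theta = dirichlet([b * p for p in doc_theta], rng)
        for lang in langs:
            words = []
            for _ in range(words_per_segment):
                # Topic from the shared segment distribution, word from the
                # language-specific topic-word distribution.
                z = rng.choices(topics, weights=seg_theta)[0]
                vocab, probs = zip(*topic_word[lang][z].items())
                words.append(rng.choices(vocab, weights=probs)[0])
            docs[lang].append(words)
    return doc_theta, docs


rng = random.Random(0)
topic_word = {
    "fr": [{"chat": 0.9, "chien": 0.1}, {"bourse": 0.5, "yen": 0.5}],
    "en": [{"cat": 0.9, "dog": 0.1}, {"stock": 0.5, "yen": 0.5}],
}
doc_theta, docs = generate_aligned_pair(3, 5, topic_word, [0.5, 0.5], 10.0, rng)
print(docs["fr"][0], docs["en"][0])
```

Because `seg_theta` is shared, aligned segments tend to draw the same topics even though their words come from language-specific distributions. Dropping the per-segment draw and sampling topics directly from `doc_theta` reduces the sketch to BiLDA-style document-level sharing.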

Abstract (English)

This paper proposes the bilingual segmented topic model (BiSTM), which models documents hierarchically by treating each document as a set of segments, e.g., sections. While previous bilingual topic models, such as bilingual latent Dirichlet allocation (BiLDA), consider only cross-lingual alignments between entire documents, the proposed model also considers cross-lingual alignments between segments and assigns the same topic distribution to aligned segments. This paper also presents a method that incorporates unsupervised topic segmentation into BiSTM to infer latent topics and segment boundaries simultaneously. Experiments using Japanese-English and French-English Wikipedia corpora show that the proposed model significantly outperforms BiLDA in terms of perplexity and improves performance in translation pair extraction.
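The abstracts compare the models by perplexity. As a reminder of what that number measures, the sketch below computes the standard per-word perplexity, exp(-(1/N) Σ log p(w)), with p(w | d) = Σ_k θ_{d,k} φ_{k,w}. It assumes the topic proportions θ and topic-word distributions φ have already been estimated. This record does not say how the paper handles held-out documents or whether segment-level distributions replace θ_d for BiSTM, so those choices are left out; the function and argument names are hypothetical.

```python
import math


def perplexity(docs, doc_topic, topic_word):
    """Per-word perplexity of a corpus under a mixture-of-topics model.

    docs:       list of documents, each a list of integer word ids.
    doc_topic:  doc_topic[d][k] = theta_{d,k}, topic proportions of document d.
    topic_word: topic_word[k][w] = phi_{k,w}, probability of word w in topic k.
    Lower is better; a model that is uniform over V words scores V.
    """
    log_likelihood = 0.0
    num_tokens = 0
    for theta, words in zip(doc_topic, docs):
        for w in words:
            # Marginalize the latent topic: p(w | d) = sum_k theta_k * phi_{k,w}.
            p_w = sum(theta_k * topic_word[k][w] for k, theta_k in enumerate(theta))
            log_likelihood += math.log(p_w)
            num_tokens += 1
    return math.exp(-log_likelihood / num_tokens)


# Uniform toy model over 4 words: every token has probability 0.25.
print(perplexity([[0, 1, 2, 3]], [[1.0]], [[0.25, 0.25, 0.25, 0.25]]))
```

This is only the textbook definition; the paper's evaluation protocol may differ in details such as which tokens are held out.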

Bibliographic record ID

Source identifier type

NCID

Source identifier

AN00116647

Bibliographic information

情報処理学会論文誌 (IPSJ Journal)

Vol. 58, No. 12, pp. 2080-2092, published 2017-12-15

ISSN

Source identifier type

ISSN

Source identifier

1882-7764


Versions

Ver.1

2025-01-20 03:05:16.645722


Cite as

Tamura, Akihiro; Sumita, Eiichiro, 2017: pp. 2080–2092.
