Fast Extraction of Time Series Variation for Topic Popularity in Microblogs (マイクロブログにおけるトピック出現量推移の高速な抽出)

Satoshi Fukuyama (福山 怜史); Kei Wakabayashi (若林 啓)

https://ipsj.ixsq.nii.ac.jp/records/199757

Name / File	License	Action
IPSJ-TOD1204004.pdf (981.4 kB)	Copyright (c) 2019 by the Information Processing Society of Japan
Open access

Item type

Trans(1)

Publication date

2019-10-23

Title

マイクロブログにおけるトピック出現量推移の高速な抽出

Title (English)

Fast Extraction of Time Series Variation for Topic Popularity in Microblogs

Language

jpn

Keywords

Subject scheme

Other

Subject

[Research paper] microblog, topic popularity, Biterm topic model, topic model, minibatch training

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_6501

Resource type

journal article

Author affiliation

Graduate School of Library, Information and Media Studies, University of Tsukuba (筑波大学大学院図書館情報メディア研究科)

Author affiliation

Faculty of Library, Information and Media Science, University of Tsukuba (筑波大学図書館情報メディア系)

Author name

福山, 怜史
若林, 啓

Author name (English)

Fukuyama, Satoshi
Wakabayashi, Kei

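Abstract (translated from Japanese)

Description type

Other

Description

In recent years, many media outlets have been reporting on topics for which the volume of related tweets rises sharply over time, and the time-series variation of topic volume has attracted attention in the analysis of topics on Twitter. Because only a fraction of tweets carry hashtags, it is not easy to observe comprehensively how the topics contained in all tweets change over time. An effective approach to this problem is to estimate topics with the Biterm topic model (BTM) and use the popularity of the estimated topics. However, because an enormous number of tweets are posted on Twitter in real time, both topic estimation and the computation of topic popularity must be time-efficient. This study proposes a method for tweet data that learns topics quickly and efficiently computes the popularity of each topic per unit time. The proposed method applies minibatch training to BTM to speed up topic learning. For the computation of topic popularity, it performs an approximate computation using only part of the data, which yields a practical speedup. Experiments confirmed that the proposed method achieves better generalization performance than the existing method while reducing training time. We also present several methods for approximating topic popularity and compare and examine them in terms of the size of the approximation error and the reduction in processing time.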

Abstract (English)

Description type

Other

Description

Recently, Twitter has attracted attention as a medium that reflects popular topics in real time. In particular, many media outlets report on topics for which the number of related tweets suddenly increases. However, because most tweets are not classified by tags, it is hard to observe the time-series variation of topics across all tweets. To solve this problem, a method using a topic model, which estimates topics from documents, has been proposed. However, since an enormous number of tweets are posted in real time, efficient methods are needed for estimating topics and calculating topic popularity. We propose an efficient method for estimating topics and calculating the time-series variation of topic popularity for tweets. To speed up topic estimation, we extend the Biterm topic model, which is effective for short texts, with minibatch training. In addition, we propose a method to efficiently calculate approximate topic popularity from partial data. Our experiments suggest that the proposed method has higher generalization ability and shorter training time than the baseline. We also compare several methods for calculating topic popularity in terms of efficiency and approximation loss.
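
The idea of approximating topic popularity from partial data can be illustrated with a minimal sketch. This is not the authors' algorithm: the record does not specify which approximations the paper compares, so the code below assumes simple uniform subsampling with rescaling. The function name `topic_popularity`, its parameters, and the input format (a timestamp in seconds paired with a per-tweet topic distribution already inferred by some topic model) are all hypothetical.

```python
import random
from collections import defaultdict


def topic_popularity(tweets, num_topics, bin_seconds, sample_rate=1.0, seed=0):
    """Estimate per-time-bin topic popularity (hypothetical helper).

    tweets: iterable of (timestamp_seconds, topic_probs) pairs, where
            topic_probs is a list of length num_topics holding P(topic | tweet).
    sample_rate: fraction of tweets to use; 1.0 gives the exact sums.
    Returns {bin_index: [popularity of topic 0, topic 1, ...]}.
    """
    if not 0.0 < sample_rate <= 1.0:
        raise ValueError("sample_rate must be in (0, 1]")
    rng = random.Random(seed)
    # Assumption: each kept tweet stands in for 1 / sample_rate tweets.
    scale = 1.0 / sample_rate
    totals = defaultdict(lambda: [0.0] * num_topics)
    for timestamp, topic_probs in tweets:
        # Uniform subsampling: skip a tweet with probability 1 - sample_rate.
        if sample_rate < 1.0 and rng.random() >= sample_rate:
            continue
        bin_index = int(timestamp // bin_seconds)
        row = totals[bin_index]
        for k in range(num_topics):
            row[k] += scale * topic_probs[k]
    return dict(totals)


# Exact computation over three tweets, two topics, 60-second bins.
example = [(0, [1.0, 0.0]), (30, [0.5, 0.5]), (70, [0.0, 1.0])]
exact = topic_popularity(example, num_topics=2, bin_seconds=60)
```

With `sample_rate=1.0` the result is the exact per-bin sum of topic probabilities. A lower rate reduces the number of tweets processed at the cost of sampling error, which is the kind of error-versus-time trade-off the abstract describes.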

Bibliographic record ID

Source identifier type

NCID

Source identifier

AA11464847

Bibliographic information

IPSJ Transactions on Databases (TOD)

Vol. 12, No. 4, p. 15-26, issued 2019-10-23

ISSN

Source identifier type

ISSN

Source identifier

1882-7799

Publisher

Information Processing Society of Japan (情報処理学会)
