A Word Unit Problem in Japanese Named Entity Extraction (日本語固有表現抽出におけるわかち書き問題の解決)

Masayuki Asahara (浅原 正幸); Yuji Matsumoto (松本 裕治)

https://ipsj.ixsq.nii.ac.jp/records/10909

File: IPSJ-JNL4505023.pdf (170.9 kB)
License: Copyright (c) 2004 by the Information Processing Society of Japan
Access: Open access

Item type

Journal(1)

Publication date

2004-05-15

Title

日本語固有表現抽出におけるわかち書き問題の解決

Title (English)

A Word Unit Problem in Japanese Named Entity Extraction

Language

jpn

Keywords

Subject scheme

Other

Subject

Paper

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_6501

Resource type

journal article

Other title

Natural language

Author affiliation

Graduate School of Information Science, Nara Institute of Science and Technology (奈良先端科学技術大学院大学情報科学研究科), both authors

Authors

Masayuki Asahara (浅原 正幸); Yuji Matsumoto (松本 裕治)

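Abstract (translated from Japanese)

Description type

Other

Description

Methods proposed for Japanese named entity extraction generally combine morphological analysis with chunking. If the output of the morphological analyzer is fed directly to the chunker, it is difficult to extract named entities that are smaller than the units produced by the morphological analyzer. We therefore propose a method that performs chunking at the character level. First, a statistical morphological analyzer analyzes the input sentence redundantly. Next, the input sentence is split into characters, and each character is annotated with the character itself, its character type, and information such as the part-of-speech tags of up to the n-th best morphological analysis. Finally, using these annotations as features, a chunker based on support vector machines deterministically estimates the word boundaries of the named entities. In an evaluation on the CRL named entity data (5-fold cross-validation), the method achieved a high F-measure of 0.87.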
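The per-character annotation step described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names (`char_type`, `positional_pos`, `annotate`), the coarse character-type classes, the S/B/I/E position prefixes, the `NOUN` tag, and the two hand-written segmentations that stand in for the n-best output of a morphological analyzer.

```python
import unicodedata


def char_type(ch):
    """Coarse character type from the Unicode name (illustrative classes only)."""
    name = unicodedata.name(ch, "")
    if "HIRAGANA" in name:
        return "HIRAGANA"
    if "KATAKANA" in name:
        return "KATAKANA"
    if "CJK UNIFIED" in name:
        return "KANJI"
    if ch.isdigit():
        return "DIGIT"
    if ch.isalpha():
        return "ALPHA"
    return "OTHER"


def positional_pos(analysis):
    """Expand one segmentation [(surface, pos), ...] into per-character tags.

    Each character gets the POS of its morpheme plus its position in it:
    S (single-character morpheme), B (begin), I (inside), E (end).
    """
    tags = []
    for surface, pos in analysis:
        n = len(surface)
        for i in range(n):
            if n == 1:
                prefix = "S"
            elif i == 0:
                prefix = "B"
            elif i == n - 1:
                prefix = "E"
            else:
                prefix = "I"
            tags.append(f"{prefix}-{pos}")
    return tags


def annotate(sentence, nbest):
    """Attach character, character type, and n-best POS tags to each character."""
    per_analysis = [positional_pos(a) for a in nbest]
    for tags in per_analysis:
        if len(tags) != len(sentence):
            raise ValueError("analysis does not cover the sentence")
    return [
        {"char": ch, "type": char_type(ch), "pos": [t[i] for t in per_analysis]}
        for i, ch in enumerate(sentence)
    ]


# Hypothetical hand-written 2-best segmentations, not real analyzer output.
sentence = "日米首脳"
nbest = [
    [("日米", "NOUN"), ("首脳", "NOUN")],
    [("日", "NOUN"), ("米", "NOUN"), ("首脳", "NOUN")],
]
features = annotate(sentence, nbest)
for f in features:
    print(f["char"], f["type"], f["pos"])
```

In this toy input the first analysis keeps 日米 as one unit while the second splits it, so the feature vector for each character carries both views. A chunker working on characters can then place an entity boundary between 日 and 米 even when the best analysis does not.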

Abstract (English)

Description type

Other

Description

Named entity (NE) extraction is the task of extracting proper nouns and numerical information from text. NE extraction in Japanese usually cascades morphological analysis and chunking. However, such a method cannot extract NEs that are smaller than the units output by the morphological analyzer. To cope with this unit problem, we propose a character-based chunking method. First, input sentences are redundantly analyzed by a statistical morphological analyzer. Second, the input sentences are segmented into characters, and each character is annotated with its character type and the POS tags of the top n-best answers given by the analyzer. Finally, we perform chunking deterministically based on support vector machines. We apply our method to the IREX NE task using the CRL Named Entities data. A cross-validation F-value of 0.87 shows the effectiveness of the method.
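To make the reported F-value concrete, the following sketch shows one common way such a score is computed from per-character chunk tags. It assumes BIO-style tags, exact-match scoring over entity spans, and the balanced F-measure; the tag scheme, the `LOCATION` label, the function names, and the toy tag sequences are illustrative assumptions, and the scoring rules of the IREX task may differ in detail.

```python
def spans_from_tags(tags):
    """Decode per-character BIO tags into (start, end, type) spans, end exclusive."""
    spans, start, kind = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        # Close the open span unless this tag continues it.
        if start is not None and tag != f"I-{kind}":
            spans.append((start, i, kind))
            start, kind = None, None
        if tag.startswith("B-"):
            start, kind = i, tag[2:]
    return spans


def f_measure(gold, predicted):
    """Balanced F-measure over exactly matching (start, end, type) spans."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: two single-character entities in the gold annotation,
# only the first of which is found by the hypothetical system output.
gold_tags = ["B-LOCATION", "B-LOCATION", "O", "O"]
pred_tags = ["B-LOCATION", "O", "O", "O"]

gold_spans = spans_from_tags(gold_tags)
pred_spans = spans_from_tags(pred_tags)
score = f_measure(gold_spans, pred_spans)
print(gold_spans, pred_spans, round(score, 3))
```

Here precision is 1/1 and recall is 1/2, so the F-measure is 2/3. Under exact-match scoring, a prediction that merged the two characters into one span would count as wrong for both gold entities.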

Bibliographic record ID (NCID)

AN00116647

Bibliographic information

情報処理学会論文誌 (IPSJ Journal), Vol. 45, No. 5, pp. 1442-1450, issued 2004-05-15

ISSN

1882-7764
