XML Documents Searching Combining Structure and Keywords Similarities

Apichaya, Auvattanasombat; Yousuke, Watanabe; Haruo, Yokota

https://ipsj.ixsq.nii.ac.jp/records/94309

File	License
IPSJ-DBS13157014.pdf (799.2 kB)	Copyright (c) 2013 by the Information Processing Society of Japan
Open access

Item type

SIG Technical Reports

Publication date

2013-07-15

Title

XML Documents Searching Combining Structure and Keywords Similarities

Language

eng

Keywords

Subject scheme

Other

Subject

Information retrieval

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_18gh

Resource type

technical report

Author affiliation

Tokyo Institute of Technology / Chulalongkorn University

Author affiliation

Tokyo Institute of Technology

Author affiliation

Tokyo Institute of Technology

Authors

Apichaya, Auvattanasombat Yousuke, Watanabe Haruo, Yokota

Abstract

Description type

Other

Description

In recent years, XML has increasingly become a standard and is widely used in many applications. For example, office documents, which are increasingly popular, are stored as archives containing multiple XML parts. The structure and the content of an XML file play different roles depending on the kind of document, so similarity search over XML files should be based on both structure and content. In previous work, LAX+ was proposed as an algorithm for computing a similarity value from the structure and content of XML files in office documents. However, because LAX+ uses exact matching between corresponding leaves, similar words in leaf nodes are treated as different. To solve this problem, we propose combining LAX+ with keyword similarity in leaf nodes. We use the docx, xlsx, and pptx file formats as the experimental data set. The evaluation shows that our approach can improve precision and recall.
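The abstract contrasts exact matching of corresponding leaf values, as in LAX+, with a similarity measure on the keywords in leaf nodes. The Python sketch below only illustrates that contrast; it is not LAX+ or the authors' keyword similarity, neither of which is specified in this record. The helper names (`leaf_texts`, `exact_match`, `dice`, `leaf_similarity`), the choice to pair leaves by element path, and the character-bigram Dice coefficient used as a stand-in keyword similarity are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def leaf_texts(xml_string):
    """Map each leaf element, keyed by its tag path with sibling indices, to its text.

    Assumption (hypothetical): leaves of two documents correspond when their paths match.
    """
    root = ET.fromstring(xml_string)
    leaves = {}

    def walk(elem, path):
        children = list(elem)
        if not children:
            leaves[path] = (elem.text or "").strip()
        for i, child in enumerate(children):
            walk(child, f"{path}/{child.tag}[{i}]")

    walk(root, root.tag)
    return leaves


def exact_match(a, b):
    """Exact matching: any difference in the leaf text scores 0."""
    return 1.0 if a == b else 0.0


def bigrams(text):
    """Set of lowercase character bigrams of a string."""
    t = text.lower()
    return {t[i:i + 2] for i in range(len(t) - 1)}


def dice(a, b):
    """Stand-in keyword similarity (assumption): Dice coefficient over character bigrams."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        # Both strings are shorter than two characters: fall back to case-insensitive equality.
        return 1.0 if a.lower() == b.lower() else 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def leaf_similarity(doc_a, doc_b, measure):
    """Average `measure` over the leaf paths present in both documents."""
    a, b = leaf_texts(doc_a), leaf_texts(doc_b)
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    return sum(measure(a[p], b[p]) for p in shared) / len(shared)


DOC_A = "<doc><title>Colour printers</title><body>Cheap laser printer</body></doc>"
DOC_B = "<doc><title>Color printers</title><body>Cheap laser printers</body></doc>"

# Exact matching gives 0.0 because no pair of corresponding leaves is identical.
print("exact:", leaf_similarity(DOC_A, DOC_B, exact_match))
# The bigram measure gives partial credit for the near-identical wording.
print("bigram Dice:", round(leaf_similarity(DOC_A, DOC_B, dice), 3))
```

Pairing leaves by path is the simplest possible correspondence; a faithful implementation would substitute the leaf alignment LAX+ actually uses and whatever keyword similarity the report proposes.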

Bibliographic record ID

Source identifier type

NCID

Source identifier

AN10112482

Bibliographic information

Database Systems (DBS) SIG Technical Reports (研究報告データベースシステム)

Vol. 2013-DBS-157, No. 14, pp. 1-6, issued 2013-07-15

Notice

SIG Technical Reports are non-refereed and hence may later appear in journals, conferences, symposia, etc.

Publisher

Information Processing Society of Japan (情報処理学会)
