XML Documents Searching Combining Structure and Keywords Similarities
https://ipsj.ixsq.nii.ac.jp/records/94309
Name / File | License | Action |
---|---|---|
Copyright (c) 2013 by the Information Processing Society of Japan
|
|
Open access |
Item type | SIG Technical Reports(1) | |||||||
---|---|---|---|---|---|---|---|---|
Publication date | 2013-07-15 | |||||||
Title | ||||||||
Title | XML Documents Searching Combining Structure and Keywords Similarities | |||||||
Language | en | |||||||
Language | ||||||||
Language | eng | |||||||
Keywords | ||||||||
Subject scheme | Other | |||||||
Subject | Information retrieval | |||||||
Resource type | ||||||||
Resource type identifier | http://purl.org/coar/resource_type/c_18gh | |||||||
Resource type | technical report | |||||||
Author affiliation | ||||||||
Tokyo Institute of Technology / Chulalongkorn University | ||||||||
Author affiliation | ||||||||
Tokyo Institute of Technology | ||||||||
Author affiliation | ||||||||
Tokyo Institute of Technology | ||||||||
Author name |
Apichaya, Auvattanasombat
Yousuke, Watanabe
Haruo, Yokota
|
|||||||
Abstract | ||||||||
Description type | Other | |||||||
Description | In recent years, XML has increasingly become a standard that is widely used in many applications. For example, office documents, which are becoming more and more popular, are stored as archives of multiple XML parts. The structure and content of XML files are known to play different roles depending on the kind of document. Similarity search over XML files should therefore be based on both structure and content. In previous work, LAX+ was proposed as an algorithm for computing a similarity value from the structure and content of the XML files in office documents. However, because LAX+ uses exact matching between corresponding leaves, similar words in leaf nodes are treated as different. To solve this problem, we propose combining LAX+ with keyword similarity in leaf nodes. We use docx, xlsx, and pptx files as the experimental data set. The evaluation shows that our approach can improve precision and recall. | |||||||
Bibliographic record ID | ||||||||
Source identifier type | NCID | |||||||
Source identifier | AN10112482 | |||||||
Bibliographic information |
SIG Technical Reports: Database System (DBS), Vol. 2013-DBS-157, No. 14, pp. 1-6, issued 2013-07-15 |
|||||||
Notice | ||||||||
SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc. | ||||||||
Publisher | ||||||||
Language | ja | |||||||
Publisher | Information Processing Society of Japan (情報処理学会) |