決定性有限オートマトンによる正規表現の貪欲な部分照合と部分式による捕獲

奥居, 哲; 増田, 拓也; 鈴木, 大郎; Satoshi, Okui; Takuya, Masuda; Taro, Suzuki

WEKO3

インデックスツリー

RootNode

アイテム

決定性有限オートマトンによる正規表現の貪欲な部分照合と部分式による捕獲

https://ipsj.ixsq.nii.ac.jp/records/176440

名前 / ファイル	ライセンス	アクション
IPSJ-JNL5712028.pdf (1.2 MB)	Copyright (c) 2016 by the Information Processing Society of Japan
オープンアクセス

Item type

Journal(1)

公開日

2016-12-15

タイトル

決定性有限オートマトンによる正規表現の貪欲な部分照合と部分式による捕獲

タイトル

言語

タイトル

A DFA-based Approach to Greedy Partial Regular Expression Matching with Subterm Addressing

言語

jpn

キーワード

主題Scheme

Other

主題

[一般論文(特選論文)] 正規表現，貪欲照合，決定性有限オートマトン，KMP文字列照合

資源タイプ

資源タイプ識別子

http://purl.org/coar/resource_type/c_6501

資源タイプ

journal article

著者所属

中部大学

著者所属

会津大学

著者所属(英)

Chubu University

著者所属(英)

The University of Aizu

著者名

奥居, 哲
増田, 拓也
鈴木, 大郎

著者名(英)

Satoshi, Okui
Takuya, Masuda
Taro, Suzuki

論文抄録

内容記述タイプ

Other

内容記述

決定性有限オートマトン（DFA）を用いて正規表現の貪欲な部分照合を実現する手法について述べる．通常のDFAの状態がNFAの状態の部分集合からなるのに対し，本手法におけるDFAの状態は重複のないNFA状態の列とすでに終状態に達したか否かを表すフラグからなる．このDFAが貪欲な部分照合と部分式による捕獲を正しく実現することを証明する．バックトラックによる試行を行う多くの実装では照合に要する計算コストが最悪の場合，指数的に増大するのに対し，本手法の照合に要する最悪の時間計算量は，DFAを漸進的に構築する場合ではO(nm)，DFAをあらかじめ構築する場合ではO(m)に抑えられる．ここでnは正規表現の部分式の個数，mは照合文字列の長さである．一方，照合結果から部分式による捕獲を計算するのに要する時間計算量はO(mk)である．ここでkは捕獲に用いる部分式の個数である．本手法の試験実装に基づき照合に要する計算時間を計測した結果，Googleの正規表現ライブラリRE2と比較してより良好な結果が得られた．古典的な文字列照合手法であるKMPアルゴリズムあるいはAho-Corasickオートマトンとの関連にも言及する．

論文抄録(英)

内容記述タイプ

Other

内容記述

We present a DFA-based approach which enables greedy partial regular expression matching and subterm addressing without relying on any backtrack search. Each state of our DFA consists, compared to the standard construction regarding subsets of NFA-states as DFA-states, of an non-duplicating sequence of NFA-states and a flag indicating whether the final state has been found or not. The worst case computational cost for matching is O(nm) in the case of on-the-fly construction of DFA, and O(m) if DFA has been constructed in advance where n is the number of subexpressions in the regular expression and m the length of the input string, while the cost for subterm addressing requires O(mk) where k is the number of subexpression to be addressed. We prove the correctness of our algorithm with respect to a formalization of matching semantics. Our experimental implementation shows certain improvement to Google RE2 library for the test case they have given. We also mention a relation with a classical string matching algorithm given by Knuth, Morris and Pratt, or Aho-Corasick automata.

書誌レコードID

収録物識別子タイプ

NCID

収録物識別子

AN00116647

書誌情報

情報処理学会論文誌

巻 57, 号 12, p. 2769-2783, 発行日 2016-12-15

ISSN

収録物識別子タイプ

ISSN

収録物識別子

1882-7764

戻る

views

See details

	Views

Versions

Ver.1

2025-01-20 05:52:12.685512

Show All versions

Cite as

奥居, 哲, 増田, 拓也, 鈴木, 大郎, 2016: 2769–2783 p.

エクスポート

OAI-PMH

JPCOAR
DublinCore
DDI

Other Formats

JSON
BIBTEX

インデックスリンク

インデックスツリー

アイテム

決定性有限オートマトンによる正規表現の貪欲な部分照合と部分式による捕獲

× 奥居, 哲

× 増田, 拓也

× 鈴木, 大郎

× Satoshi, Okui

× Takuya, Masuda

× Taro, Suzuki

Versions

Share

Cite as

エクスポート