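Investigation on data augmentation using non-linear bandwidth extension for automatic speaker verification (話者照合のための非線形帯域拡張法を用いたデータ拡張の検討)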

Miyamoto, Haruna; Shiota, Sayaka; Kiya, Hitoshi

https://ipsj.ixsq.nii.ac.jp/records/197813

Name / File	License	Action
IPSJ-MUS19123028.pdf (767.4 kB)	Copyright (c) 2019 by the Information Processing Society of Japan
Open access

Item type

SIG Technical Reports(1)

Publication date

2019-06-15

Title

話者照合のための非線形帯域拡張法を用いたデータ拡張の検討

Title (English)

Investigation on data augmentation using non-linear bandwidth extension for automatic speaker verification

Language

jpn

Keywords

Subject scheme

Other

Subject

Poster Session 1

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_18gh

Resource type

technical report

Author affiliation

Presently with Tokyo Metropolitan University, Graduate School of Systems Design, Department of Computer Science (listed identically for each of the three authors)

Authors

Miyamoto, Haruna (宮本, 春奈)
Shiota, Sayaka (塩田, さやか)
Kiya, Hitoshi (貴家, 仁志)

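Abstract (translated from the Japanese)

Description type

Other

Description

This paper focuses on data augmentation with wideband speech generated by bandwidth extension in x-vector-based speaker verification systems. For such systems, reported data augmentation approaches include not only adding various kinds of noise but also training on upsampled narrowband speech, or on a mixture of upsampled data and bandwidth-extended data; data augmentation using DNN-based bandwidth extension has also been reported. Meanwhile, non-linear bandwidth extension (N-BWE) has recently been proposed as one bandwidth extension method. N-BWE was proposed as a method that requires no model training and has a very low computational cost. Although N-BWE consists only of a simple non-linear function and filters, it has been reported to perform well in terms of both the equal error rate (EER) of speaker verification and the root mean square log-spectral distortion. In this paper, we therefore conducted experiments in which speech processed with N-BWE was used as augmentation data when building an x-vector-based speaker verification system. In the experiments, training with both upsampled speech and N-BWE bandwidth-extended speech added as augmentation data achieved a 24.5% reduction in EER relative to the system that used only upsampled speech as augmentation data.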
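The abstract describes N-BWE only as a learning-free method built from a simple non-linear function and filters; this record does not say which function or which filters. The following is a minimal Python sketch of one generic way such a scheme could look, not the authors' method. Everything in it is an assumption made for illustration: the function names, the linear-interpolation upsampler, full-wave rectification as the non-linearity, the 63-tap windowed-sinc high-pass filter, and the mixing gain of 0.5.

```python
import math


def upsample_2x(x):
    """Double the sampling rate by linear interpolation (a crude stand-in for a real resampler)."""
    out = []
    for i, s in enumerate(x):
        nxt = x[i + 1] if i + 1 < len(x) else s
        out.extend([s, 0.5 * (s + nxt)])
    return out


def highpass_fir(cutoff, num_taps=63):
    """Windowed-sinc high-pass FIR taps; cutoff is a fraction of the sampling rate (0 to 0.5)."""
    mid = (num_taps - 1) // 2  # num_taps is assumed odd so the filter has a centre tap
    lowpass = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2 * cutoff if k == 0 else math.sin(2 * math.pi * cutoff * k) / (math.pi * k)
        hamming = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        lowpass.append(ideal * hamming)
    total = sum(lowpass)
    taps = [-t / total for t in lowpass]  # unit-DC-gain low-pass, negated
    taps[mid] += 1.0                      # spectral inversion: delta minus low-pass
    return taps


def fir_filter(x, taps):
    """FIR filtering with the group delay removed, so output sample i is centred on input sample i."""
    mid = (len(taps) - 1) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, t in enumerate(taps):
            idx = i + mid - j
            if 0 <= idx < len(x):  # samples outside the signal are treated as zero
                acc += t * x[idx]
        out.append(acc)
    return out


def nonlinear_bwe(narrowband, gain=0.5):
    """Hypothetical learning-free BWE: upsample, rectify, high-pass, add back."""
    wide = upsample_2x(narrowband)
    rectified = [abs(s) for s in wide]                # assumed non-linearity: full-wave rectification
    high = fir_filter(rectified, highpass_fir(0.25))  # keep content above the old Nyquist frequency
    return [w + gain * h for w, h in zip(wide, high)]
```

In this sketch, a cutoff of 0.25 of the new sampling rate corresponds to the Nyquist frequency of the original narrowband signal (for example 4 kHz when going from 8 kHz to 16 kHz), so only the newly generated harmonics are added to the upsampled speech. A practical system would more likely use a library resampler and filter design routine than these hand-written loops.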

Abstract (English)

Description type

Other

Description

This paper focuses on the performance of x-vector-based automatic speaker verification (ASV) systems that use bandwidth extension (BWE) methods for data augmentation. Several data augmentation methods have been reported for x-vector-based ASV systems. These reports consider using large amounts of narrowband data, applying upsampling and BWE methods to expand the training data. In addition, a deep neural network-based BWE method has been used for data augmentation. Meanwhile, the non-linear bandwidth extension (N-BWE) method has been proposed as one BWE method. N-BWE was proposed as a method that requires no learning and has a light computational cost. Although N-BWE consists only of a simple non-linear function and filters, it has been reported to obtain a low equal error rate (EER) and small root mean square log-spectral distance values in some ASV systems. To compare the performance of x-vector-based ASV systems, experiments were carried out under several data augmentation conditions, including ones that use the N-BWE method. In the experimental results, the method that used both upsampled and N-BWE speech as additional training data achieved a 24.5% error reduction in EER compared with the system that used only upsampled speech as additional training data.
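To make the reported metric concrete, here is a small Python sketch of how an EER and a relative error reduction are commonly computed. It assumes that the 24.5% figure is a relative reduction in EER, which is how the Japanese abstract reads. The function names, the toy scores, and the EER values passed in are all hypothetical and are not taken from the paper.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by scanning every observed score as a decision threshold."""
    best_gap, eer = float("inf"), 1.0
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        # False rejection: a genuine (target) trial scores below the threshold.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        # False acceptance: an impostor (non-target) trial scores at or above it.
        far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


def relative_error_reduction(baseline_eer, improved_eer):
    """Fraction of the baseline EER that the improved system removes."""
    return (baseline_eer - improved_eer) / baseline_eer


# Toy scores, invented for illustration only.
targets = [0.9, 0.8, 0.7, 0.4]
nontargets = [0.6, 0.3, 0.2, 0.1]
toy_eer = equal_error_rate(targets, nontargets)

# Placeholder EER values (in percent), NOT the paper's results.
reduction = relative_error_reduction(10.0, 7.55)
```

With these placeholder inputs, the toy EER is 0.25 and the relative reduction is about 0.245, which is the form a "24.5% error reduction" takes under the relative-reduction assumption. The record does not give the absolute EER values behind the reported figure.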

Bibliographic record ID

Source identifier type

NCID

Source identifier

AN10438388

Bibliographic information

研究報告音楽情報科学（MUS） (IPSJ SIG Technical Reports, Music Information Science (MUS))

Vol. 2019-MUS-123, No. 28, pp. 1-5, issued 2019-06-15

ISSN

Source identifier type

ISSN

Source identifier

2188-8752

Notice

SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc.

Publisher

情報処理学会 (Information Processing Society of Japan)

Versions

Ver.1

2025-01-19 22:11:55.851446
